Paperless Home
Goal: getting rid of incoming paper asap, continuing with electronic documents
Here’s what I learned when setting up my Paperless Home. For this project I have ordered a Fujitsu Scanner (click for more technical details!).
Software: Scansnap Manager 5.0L24 (Windows XP)
Goal (high WAF, Wife Acceptance Factor)
- Open scanner (which then auto-connects to the scanning SW)
- Put in a piece of paper and hit the scan button.
- Close scanner and put the paper in a big box.
Ideal would be if..
- Tagging and renaming PDF files based on content (some of this could be done by A-pdf renamer)
  - KPN phone bill: kpn, 2011, march, bill, phone
    folder: \phone\kpn\bills\2011
  - American Express letter: americanexpress, 2011, january, other
    folder: \bank\creditcard\amex\other\2011
  - Bank statement DJ: ingbank, 2011, march, statement, DJ
    folder: \bank\creditcard\amex\statement\2011
  - Bank statement Wife: ingbank, 2011, march, statement, Wife
    folder: \bank\ing\statement\2011
- Store files based on tags (not mandatory! See examples above)
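To make the wish list above a bit more concrete, here is a minimal sketch of what "tag and file based on content" could look like. It is not how A-PDF Rename or any other product mentioned here works. The rule table, the `classify` function and the keywords are made up for illustration, and the year and month are passed in by hand instead of being detected from the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    keyword: str   # lowercase text to look for in the OCR output
    tags: tuple    # (sender tag, remaining tags...) as in the examples above
    folder: str    # folder template; {year} is filled in per document


# Hypothetical rules modelled on the examples in this post.
RULES = [
    Rule("kpn", ("kpn", "bill", "phone"), r"\phone\kpn\bills\{year}"),
    Rule("american express", ("americanexpress", "other"),
         r"\bank\creditcard\amex\other\{year}"),
    Rule("ing bank", ("ingbank", "statement"), r"\bank\ing\statement\{year}"),
]


def classify(ocr_text, year, month):
    """Return (tags, folder) for the first rule whose keyword occurs in the text.

    Returns ([], None) when no rule matches, so the file can stay in an inbox.
    """
    text = ocr_text.lower()
    for rule in RULES:
        if rule.keyword in text:
            # Tag order follows the examples: sender, year, month, the rest.
            tags = [rule.tags[0], str(year), month, *rule.tags[1:]]
            return tags, rule.folder.format(year=year)
    return [], None


tags, folder = classify("KPN invoice for your phone line", 2011, "march")
print(tags)    # ['kpn', '2011', 'march', 'bill', 'phone']
print(folder)  # \phone\kpn\bills\2011
```

A plain substring match like this is crude (a short keyword can also show up inside other words), so real rules would probably need to be stricter.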
Software
This is the software I ran into when researching this project.
- Irislink ReadIris Corporate ($471, Windows)
  does not store files in location based on content and is quite expensive
- A-PDF Rename ($27, Windows)
  very interesting, renames PDFs based on keywords or text location. Not automated
  That means it can use keywords to rename a file but you have to select them, every time.
- Neatworks scanners & software
  Note: software only works with own scanner (Win) or limited number of scanners (Mac)
- OpenKM document management system (Open Source)
- Foxit PDF (products changed?) (free, Windows) PDF file manager
  Has potential. This might be the first application that I’m going to test-drive.
- DevonThink ($80, Pro: $150, Mac only) Interesting, automatically recognize/organize documents
  Cannot rename automatically or recognize documents based on the contents
  TIP: how to create a “watch folder”
- EagleFiler (€ 32, OSX) manage files, assign tags and provide smart filters (interesting!)
- iDocument ($ 49, OSX) manage documents, assign tags, automate, batch and more
- Yojimbo ($ 35, OSX) manage, tag and find documents
Connect scanner
Using the Belkin LAN USB hub. Check out the Scanner post for more information. The result is that I can place the scanner anywhere in the house because it connects to its software over WiFi.
Important
- Hardware: Get a good scanner, as small as possible, low power (or good stand-by power consumption)
- Software: Once you have scanned a piece of paper, it needs to be organized in a way that you can find it.
- Security: With all of your mail stored as files, make sure only you have access to these files
- Backup: When your collection grows it becomes more and more important to have a good backup. Check out online backup services like Backblaze and others.
- Accessibility: With all of this information stored, your wife should be able to find a document. Browser? Folder structure? Search? iPad/iPhone?
Default Software
The scanner comes with the Scansnap Manager. This software is needed to control your scanner as it isn’t TWAIN compliant. This software is not too bad.
More info on the Scansnap software is available in the Community. There is also a great article on how to install Scansnap Manager for OSX (S1500M) if you bought the $100 cheaper (but identical) Windows version (S1500).
I also tested the Scansnap Organizer software. It’s a HUGE install (1.1 GB) and, looking at the functionality, I have NO idea why it’s that big; 40 MB max, I would say. Looks like a typical case of bad programming relying on nothing but .NET crap. Don’t bother. There are even free applications that are better.
Software chosen
After testing a lot of applications it looks like I will stick with DevonThink Pro. What does it do compared to what I want?
- Automatic importing (with script)
- Manual moving to folders (categories) but the system will suggest a destination folder based on the contents. This really works great.
- NOT: renaming. You have to do that manually
You can only modify the names if you import all the scans into the DevonThink database. When indexing existing files, no changes will be made, nor will they be placed in some kind of organized folder structure.
SOLUTION: Use the database and do a regular EXPORT of all folders. This will place the files in a folder structure just like the categories in DT.
- NOT: auto-tagging based on content is not possible.
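The "automatic importing (with script)" idea boils down to a watch folder: the scanner software drops PDFs in one folder and a script hands them to whatever application does the organizing. The sketch below is a generic illustration of that idea, not the DevonThink import script. The function name `sweep` and both folder paths are assumptions.

```python
import shutil
from pathlib import Path


def sweep(scan_dir, inbox_dir):
    """Move every PDF from scan_dir into inbox_dir; return the moved file names."""
    scan_dir, inbox_dir = Path(scan_dir), Path(inbox_dir)
    inbox_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(scan_dir.glob("*.pdf")):
        target = inbox_dir / pdf.name
        if target.exists():
            continue  # never overwrite an earlier scan with the same name
        shutil.move(str(pdf), str(target))
        moved.append(pdf.name)
    return moved
```

Calling `sweep("scans", "inbox")` every minute or so (from a scheduler, or a loop with `time.sleep`) gives a poor man's watch folder. A real version should also make sure the scanner software has finished writing a file before moving it.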
Good reading
Paperless office myths
Your Paperless Office (good online book about document management. Very practical)
How to manage your collection of PDF files
Scansnap and Hazel a paperless match in heaven
Overview of paperless office applications
How to make your office paperless?
Multi-page / duplex document separation
The main challenges I ran into are linked to the Scansnap Manager software. In my case, the software is running in a Windows XP Virtual Machine (VMWare Server 2.0). There are a couple of different scanning scenarios that are required for a high Wife Acceptance Factor (WAF).
Put in 2 single-page bank statements and 1 four-page letter. Now you have the choice:
- 1. Scan the entire batch into 1 PDF (searchable, duplex) that you have to manually split afterwards
  Disadvantage: takes more time to split PDF documents afterwards
- 2. Scan each instance separately (that means: 1 bank statement – scan, 1 bank statement – scan, 4-page letter – scan).
  Disadvantage: takes a LOT more time
Currently I have many profiles
– duplex 1 page (2 pages: front & back)
– duplex 2 page (4 pages: 2x front & 2x back)
– duplex MULTI page (scan until it ends) <– CHOICE!
– simplex 1 page (1 page: front only)
– simplex 2 page (2 pages: 2x front)
– simplex MULTI page (scan until it ends)
The problem is that to select a matching profile you need to go into the Scansnap Manager. Guess what, this is running in a virtual machine and I don’t want to access computers when I’m scanning documents.
Theoretically you don’t have to use different profiles for simplex and duplex scans as the software detects blank pages and removes them. But some documents have stuff on the back you don’t care about so you have to remove these manually.
CHOICE: Nr.1 Scan everything Duplex into 1 big PDF. Let the splitting happen afterwards
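Splitting afterwards is mostly bookkeeping: if you know how many sheets each document had, you know which pages of the big PDF belong to it. A small sketch of that arithmetic follows. It assumes blank-page removal is switched off (otherwise the page counts shift) and that the letter in the example is four sheets. The function name `page_ranges` is made up for illustration.

```python
def page_ranges(sheets_per_document, duplex=True):
    """Return 1-based (first_page, last_page) for each document in one big scan.

    Assumes every sheet yields 2 pages in duplex mode and 1 page in simplex
    mode, i.e. no blank pages were dropped by the scanner software.
    """
    sides = 2 if duplex else 1
    ranges = []
    first = 1
    for sheets in sheets_per_document:
        last = first + sheets * sides - 1
        ranges.append((first, last))
        first = last + 1
    return ranges


# Two single-sheet bank statements followed by a four-sheet letter:
print(page_ranges([1, 1, 4]))                # [(1, 2), (3, 4), (5, 12)]
print(page_ranges([1, 1, 4], duplex=False))  # [(1, 1), (2, 2), (3, 6)]
```

These ranges can then be fed to whatever PDF tool does the actual splitting.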
Scan quality
I took a bank statement and scanned it with different settings. I compared the results.
Settings: Scan-to-PDF (searchable), simplex
– 150 dpi / auto-color-detection / compression 3
  295 KB, OCR: great (preference!)
– 150 dpi / auto-color-detection / compression 4
  217 KB, OCR: some problems (small print)
– 150 dpi / auto-color-detection / compression 4 / text-only
  230 KB, OCR: more problems (small print + no colors)
– 150 dpi / auto-color-detection / compression 5
  135 KB, OCR: more problems (small print)
Changing auto-color-detection to ‘Color’ created a 297 KB (instead of 295 KB) file where the colors were not nearly as vibrant as with the auto-color setting. OCR is also terrible.
Changing the auto-color-detection to ‘Color high compression’ created an 83 KB (instead of 295 KB) file which had major OCR issues, looked horrible and had the worst colors.
PREFERENCE: Using “Auto color detection”, “Compression: 3” and resolution “Normal: 150 dpi”
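To put the file sizes in perspective, the snippet below expresses each setting as a saving relative to the preferred 295 KB scan. The numbers are the ones measured above. The dictionary keys and the helper name `savings_vs` are just labels chosen for this sketch.

```python
# File sizes in KB for the same bank statement, taken from the test above.
SIZES_KB = {
    "compression 3": 295,
    "compression 4": 217,
    "compression 4, text-only": 230,
    "compression 5": 135,
    "color high compression": 83,
}


def savings_vs(baseline, sizes=SIZES_KB):
    """Return the percentage saved by each setting compared to the baseline."""
    base = sizes[baseline]
    return {
        name: round(100 * (base - kb) / base)
        for name, kb in sizes.items()
        if name != baseline
    }


for name, pct in savings_vs("compression 3").items():
    print(f"{name}: {pct}% smaller")
```

Going from compression 3 to 4 saves roughly a quarter of the file size, which for a few hundred KB per document is hardly worth the OCR problems.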
NEXT
Try to lower the amount of incoming paper by
- Requesting electronic bills
- Requesting electronic bank statements
- Setting up ‘machtigingen’ (direct debit authorizations) for automatic payment
(less chance to forget a payment and getting reminders)
Can you post a follow-up, because I’d like to know:
– Are you still satisfied with your solution (hardware/software)?
– Did you switch to other software/hardware and what are your experiences?
– Did you meet your other goals in the meantime?
– Are you keeping all your hard copies or are you throwing them out (I’m referring to any legal issues when you supply a printed copy for, for example, a warranty claim)?
I’m very eager to hear your experiences.
Greetings,
Jeffrey
– Still using the same Hardware/SW.
– Like the Fujitsu scanner but it would be easier to have a device that is connected through ethernet/wifi. Those scanners are $$$ so not an option.
– I’m keeping the hardcopies but I don’t have to worry about how and where to store them. Everything arriving in 2012 goes into a box called…. “2012”. No more sorting or organizing.
– It is difficult to consistently scan all incoming documents directly, so I stack them and once per month I scan them and process them.
Thanks for your great article. Makes decisions a lot easier!
> NOT: auto-tagging based on content is not possible
For this there is a little workaround:
You can mark some text on the paper with a highlighter; the ScanSnap software then sets the marked text as tags on the PDF.
If you don’t pass the file on automatically but use the “Scan to folder” function and then use “Import” in DevonThink, the tags generated from the marked text are set on the PDF as “keywords” in DT. After that you can use the DT script “Convert keywords to tags” and – voilà – you have the marked text as tags.
Sounds more difficult than it is in real life… :-)
> it would be easier to have a device that is connected through ethernet/wifi
This could be done with the Belkin F5L009 USB-Ethernet-Hub (cheap on eBay)
Some links:
https://www.documentsnap.com/using-a-scansnap-with-a-network-usb-hub-your-experience/
Bye…
One more thing… ;-)
This article also covers text marking, tagging and renaming:
https://www.documentsnap.com/scansnap-and-hazel-is-a-match-made-in-paperless-heaven/
Bye…
DevonThink would be my choice also if on a Mac. I’m very surprised that they don’t expand to PC – it would seem in the best interest of a file organizer to be multiplatform.
Anyway, Benubird did sound interesting but it’s discontinued and I’m not sure about the WAF or how easy it would be to sync across systems. Evernote is great if you’re the kind of person who likes to walk around naked with your social security # tattooed on your chest, but personally I like my data to have a bit more security than that. I haven’t bought my scanner yet (I’m not sure the time is really right / the right software exists yet). I want a plan with a very high WAF that can be synced across multiple computers. I’m looking at tagspaces (tagspaces.com) as an option. Have you (or anyone here) tried / considered this? It still has a lot of growing up to do, but the project seems alive and perhaps still better than other options I’ve seen so far.
As an aside, I’m curious why you desired a wireless option? This has not been a criterion in my search since it seems a USB3 connection would be faster / less of a headache. The files will all be shared with sync software anyway. Am I missing something about the allure of the wireless option?
Wifi: because I wanted to put the scanner in a closet in the living room, without any computers nearby. Pull out the scanner, feed the paper and magic would happen somewhere in the network.
Currently: I don’t care about the wifi anymore. I scan my papers about once per month and that appears to be enough.
Speed? USB2 or USB3 doesn’t make a difference, because in the end it’s the processing of the scanned documents (the OCR step) that takes the time.
Tagspaces: Took a quick look and it seems like this is more about handling files and organising them.
What I like about DevonThink is that all scanned documents in the inbox can be placed in the right folder with a single click. Why? Because DevonThink recognises the document/content and suggests “should I put this in the ‘car-insurance-folder’?” and I only have to confirm.