
It was surprisingly easy to set up the Paperless Project prototype workflow we dreamed up last post. ScanSnap scans go to a folder watched by Acrobat. Acrobat opens the files, OCRs them and outputs to a folder watched by kip. kip then loads the documents into its library and uses the OCR text to create keywords. The screenshot above shows Acrobat and kip chugging away on all the scans to date.
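For the curious, here is a rough idea of how keywords could be pulled out of OCR text. This is only a sketch in Python: how kip actually builds its keywords is not documented here, and the `suggest_keywords` function and the `STOP_WORDS` list are made up for illustration. It simply counts the most frequent words that are not common filler.

```python
import re
from collections import Counter

# Hypothetical filler-word list; a real tool would use a much longer one.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "from", "you", "your", "are", "was", "have"}


def suggest_keywords(ocr_text, limit=5):
    """Return up to `limit` of the most frequent non-filler words in the text."""
    # Lowercase the text and keep only alphabetic words of 3+ letters.
    words = re.findall(r"[a-z]{3,}", ocr_text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    sample = "Receipt: coffee and coffee beans. Receipt total for the coffee."
    print(suggest_keywords(sample, limit=2))
```

A frequency count like this would not learn from earlier tagging, so it is at best a starting point for the kind of suggestions described here.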
It takes a little editing to make sure every scan has a useful tag like ‘receipt’, but it’s easy to batch tag files. I also note that as I tag more files the program is becoming more successful at tagging similar incoming files. This is important because I want to spend as little time as possible scanning and editing. The kip browser makes it much easier to find a file or see what’s in a PDF than looking on the filesystem with the Finder browser. Acrobat’s OCR does an amazing job of capturing text on things like wrinkled cash register receipts.
So I think we’ll declare this phase of the project a success. Next phase will involve finding a better way to back up the scans and kip info on the half-terabyte array (we’re using Backup at the moment - its incremental saves are big). I’m getting better at recycling the paper after it’s been scanned…
{ 10 comments }
Some background for the Paperless Project:
The half-terabyte array: http://www.gulker.com/2006/04/30.html#a2730
kip mini-review: http://www.gulker.com/wp/2006/07/27/kip-it-simple/
I REALLY want to do this.
I bought PaperPort 9.0 for Windoze to do this (which at first glance it seems to do very well) - but I am a Mac user at heart.
What version of Acrobat is needed to do this?
From http://www.adobe.com/products/acrobat/pdfs/acro7_matrix.pdf it looks like Acrobat Professional is the one: “Scan documents into Adobe PDF and automatically recognize text with optical character recognition (OCR)”. That is $450. Ouch.
Re: Acrobat
Unfortunately, Acrobat Pro is the minimum product level to get OCR. I might note that Pro does an awful lot more than just OCR - the batch processing tools can be applied to many of the dozens of tools that Acrobat Pro provides. However, I appreciate that Pro is overkill for many home users.
I’m not really up on lower priced Mac OCR products. Might try Version Tracker or a Google search…
Experience with Paperless so far is good, btw… the PDFs, or printouts of same, have worked for most situations… and I’m much more likely to find things on my computer than in shoe boxes, drawers or the various stacks that used to grow around here.
I’m interested in doing the above.
I’m looking into OCR apps for the Mac, and another, cheaper option (compared to Acrobat) would be ReadIris (http://www.irislink.com/).
Although I’m having a helluva time trying to get a trial version to download.
Mac OCR options seem to be a bit limited, outside Acrobat and Iris. I can vouch for Adobe Acrobat 7 (please note: I work for Adobe), which is probably what you want for a professional environment, like a law practice.
Acrobat OCR works amazingly well on my G4 mini… it picks up my name from those 2-inch wide dot-matrix restaurant receipts I sometimes scan in. It also picked up my lawyer’s name, and my tax guy’s name from faxes, so I can find correspondence from both relatively easily using kip or Spotlight on Mac.
Easy for me to say, but the cost of Acrobat Pro, amortized over hundreds or thousands of scanned docs, is probably reasonable vs. the cost of dealing with paper. Alternately, use Windows or Linux products…
You or your readers may be interested in a new eBook I have just published called Paperless Office, from Myth to Reality. It is a great guide for people searching for advice on converting to electronic document management.
You get time-saving suggestions and a quick, easy read regarding electronic document management concepts. This ebook is especially suited for personal record keeping, small paper-intensive offices such as attorneys, accountants, health care professionals, and others who now keep loads of manila folders.
You also get advice on how to select the right computer hardware that prevents you from spending too much and getting the wrong equipment. And you will learn about scanner-bundled FREE software that meets most electronic record keeping needs.
In addition, you get suggestions on setting up an effective electronic record classification system that virtually eliminates the need to painstakingly name image files, helping you avoid lost files.
The author is an expert in both paper-based and electronic record keeping systems and has an established base of over 2400 clients, primarily physicians, dentists, attorneys, accountants and city/county government.
After reading this ebook, you will be able to quickly set up your electronic document records while keeping your investment to a minimum.
So if you or your readers are considering going paperless and want to get up to speed in a hurry, consider reading Paperless Office, from Myth to Reality.
A sample of the eBook is available FREE via a download at http://www.documentmanagementbook.com . And a 90 day money back guarantee is provided.
Peter Harnack, Author
P.S. I forgot to mention that I devote one chapter of the book to setting up an effective filing system using PaperPort (mentioned above). You can get PaperPort FREE as part of the software bundled with certain personal scanners that I also cover in the book.
Pete Harnack
Could you please let me know how you set things up so that a folder is watched by Adobe Acrobat and also by kip?
I presume that you do it using folder actions. I have made the batch in Acrobat Professional… can you be more specific about what script you are using to launch the Acrobat batch when a file enters a folder?
Thanks,
Marcos
Marcos-
I have been using Acrobat batch processing - I have to manually launch from Acrobat’s Advanced>Batch Processing… menu. I set the output files to go to kip’s watched folder.
I have been wondering if I could automate that with a script or Automator… if you figure it out first, please let me know!
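As a starting point, the "watched folder" half of this could be sketched in Python with a simple polling loop. To be clear about the assumptions: this does not launch Acrobat's batch processing (I have not worked out how to script that), the folder names are placeholders, and `hand_off` is a made-up stand-in that just moves each PDF into the next folder where an OCR step would otherwise run.

```python
import shutil
import time
from pathlib import Path


def find_pdfs(watch_dir):
    """Return the PDF files currently sitting in watch_dir, sorted by name."""
    return sorted(Path(watch_dir).glob("*.pdf"))


def hand_off(pdf_path, out_dir):
    """Placeholder step: move the file into the next watched folder.

    In a real setup an OCR step would run here before the move.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / pdf_path.name
    shutil.move(str(pdf_path), str(target))
    return target


def watch(watch_dir, out_dir, interval=5.0, max_polls=None):
    """Poll watch_dir every `interval` seconds and hand off any PDFs found.

    max_polls=None means run until interrupted.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for pdf in find_pdfs(watch_dir):
            print("handed off", hand_off(pdf, out_dir))
        polls += 1
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical folder names; substitute the real scan and kip folders.
    watch("scans-in", "kip-watched", interval=5.0)
```

Two caveats with a loop like this: it may pick up a file that the scanner is still writing, and a file with the same name as one already in the output folder may overwrite it, so a real script would want checks for both.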
Acrobat Pro does have the batch feature, but Acrobat Standard will do OCR.
Why do I mention this?
Because ScanSnap *comes with* Acrobat Standard.
I recently found ScanSnap on sale for something like $320 after a $50 rebate (incl shipping). Given that Acrobat Standard is usually like $200-300 it was a huge deal.
I scan a bunch of documents at once, then go through every few days and run OCR on the important ones. It’s not as seamless as it would be with Pro, but it’s a lot cheaper.