After four days we’re starting to get our arms around all of the issues surrounding the move to WordPress. The neat ‘TinyMCE’ WYSIWYG editor that runs in WordPress’ post window (in Firefox, anyway) has a few bugs (most notably a penchant for swapping paragraph and line break tags), and the HTML editor has challenges of its own: WordPress tries really hard to clean up the HTML, which is not always a good thing.
The bigger mystery surrounds WordPress’ redirects. We’ve added an Apache mod_rewrite rule to our .htaccess file to funnel incoming home page requests to index.php, and have installed Feed Director to further redirect incoming RSS and other feed requests.
The index redirect is working, but the feed redirects are behaving oddly, as were the 404 (not found) pages. We have fixed the 404s, but the feeds issue persists.
The to-do list:
- Fix feed redirects
- Fix 404s (done)
- Fix/work around the post editor
There don’t seem to be a lot of docs on the web about debugging Feed Director…
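With docs this thin, one way to see what the feed redirects are actually doing is to walk the redirect chain one hop at a time instead of letting a browser follow it silently. Here is a minimal Python sketch of that idea. The function names and the example.com URLs are my own placeholders, not anything from Feed Director or WordPress.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url):
    """Issue a single GET without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Follow redirects by hand and return a list of (status, url) hops.

    `fetch` is injectable so the logic can be tried without a network.
    A ("loop", url) entry is appended if the chain revisits a URL.
    """
    hops = []
    seen = set()
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in REDIRECT_CODES or not location:
            break
        seen.add(url)
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in seen:
            hops.append(("loop", url))
            break
    return hops


# Example (placeholder URL, needs network access):
# for status, hop in trace_redirects("http://example.com/index.xml"):
#     print(status, hop)
```

Printing each hop shows whether a feed request is bounced by the .htaccess rule, by the plugin, or by WordPress itself, and whether it ends in a 200, a 404 or a loop.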

It was surprisingly easy to set up the Paperless Project prototype workflow we dreamed up in the last post. ScanSnap scans go to a folder watched by Acrobat. Acrobat opens the files, OCRs them and outputs to a folder watched by kip. kip then loads the documents into its library, and uses the OCR text to create keywords. The screenshot above shows Acrobat and kip chugging away on all the scans to date.
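For anyone curious about the watched-folder pattern itself, here is a rough Python sketch of the idea: poll a folder, notice PDFs that weren't there last time, and hand each one to the next stage. This is only an illustration of the pattern. I don't know how Acrobat or kip implement their watched folders, and the function names and the folder path in the example are made up.

```python
import time
from pathlib import Path


def new_pdfs(folder, seen):
    """Return PDFs in `folder` that are not yet in `seen`, oldest first.

    `seen` is a set of file names and is updated in place.
    """
    found = sorted(
        (p for p in Path(folder).glob("*.pdf") if p.name not in seen),
        key=lambda p: p.stat().st_mtime,
    )
    seen.update(p.name for p in found)
    return found


def watch(folder, handle, interval=5.0, rounds=None):
    """Poll `folder` and call `handle(path)` once for each new PDF.

    Runs forever when `rounds` is None, otherwise polls `rounds` times.
    """
    seen = set()
    polls = 0
    while rounds is None or polls < rounds:
        for pdf in new_pdfs(folder, seen):
            handle(pdf)
        polls += 1
        if rounds is None or polls < rounds:
            time.sleep(interval)


# Example (hypothetical folder name):
# watch("/Volumes/array/scans/incoming", print)
```

A real version would also need to wait until a scan has finished writing before passing it on, which this sketch doesn't attempt.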
It takes a little editing to make sure every scan has a useful tag like ‘receipt’, but it’s easy to batch tag files. I also note that as I tag more files the program is becoming more successful at tagging similar incoming files. This is important because I want to spend as little time as possible scanning and editing. The kip browser makes it much easier to find a file or see what’s in a PDF than looking on the filesystem with the Finder browser. Acrobat’s OCR does an amazing job of capturing text on things like wrinkled cash register receipts.
So I think we’ll declare this phase of the project a success. Next phase will involve finding a better way to back up the scans and kip info on the half-terabyte array (we’re using Backup at the moment – its incremental saves are big). I’m getting better at recycling the paper after it’s been scanned…

This is kip, a PDF browser for Mac OS X described as ‘iPhoto for PDF documents.’ I stumbled across it yesterday while scanning for news about my* favorite PDF application Adobe Acrobat. kip appears to use Mac OS X system services like PDF rendering and components of Spotlight search to make a fast, lightweight and very useful document browser.
kip attaches keywords called tags to your imported PDFs, which it generates algorithmically from the text. You can also apply tags of your own and otherwise edit them. The panel at the left shows the kip Cloud, a Flickr-like display where larger text means a more frequently used tag.
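The cloud idea is simple enough to sketch in a few lines of Python: count how many documents carry each tag, then scale the counts onto a range of font sizes. This is my guess at the general technique, not kip's actual code. The function name, the linear scaling and the point sizes are all assumptions.

```python
from collections import Counter


def cloud_sizes(tags_per_doc, min_pt=10, max_pt=28):
    """Map each tag to a font size: more documents, bigger text.

    `tags_per_doc` is a list of tag lists, one per document.
    Sizes are scaled linearly between `min_pt` and `max_pt`.
    """
    # Count each tag once per document, even if it is listed twice.
    counts = Counter(tag for tags in tags_per_doc for tag in set(tags))
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    if span == 0:
        # Every tag is equally common, so draw them all the same size.
        return {tag: min_pt for tag in counts}
    return {
        tag: round(min_pt + (n - lo) * (max_pt - min_pt) / span)
        for tag, n in counts.items()
    }


docs = [["receipt", "utility"], ["receipt", "tax"], ["receipt", "tax"]]
sizes = cloud_sizes(docs)
# 'receipt' is on all three documents, so it gets the largest size.
```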
Anyway, we have been proceeding with our attempt at a fully paperless life – most of the bills come in electronically nowadays, and I scan everything else to PDF using a nifty motorized Fujitsu ScanSnap. Adobe Acrobat 7 OCRs the scanned files, making the text searchable, and then they go into a directory of folders on the half-terabyte array, one of our other summer projects.
So far, a file system directory has been the only navigation for the scans and other PDFs. I haven’t yet figured out the best way to do full-text indexing and search on the array’s CIFS volumes. kip provides an interesting alternative: index everything in kip using keywords, and use kip as the front end for the paperless project. Think we’ll do a quick prototype workflow and see how this works out…