Works great at OCR of existing PDFs
I’m not using this to scan to PDF, but rather to take existing PDFs that were scanned in a while back and run OCR on them.
Performance is good. It takes advantage of all the CPUs on my box according to Activity Monitor. It also doesn’t use a huge amount of memory: a few GB at peak. So, it seems well tuned. Throughput is good too, churning through the pages quite quickly. A 145-page, 8MB document with embedded graphics throughout took about 3 minutes (tied with Elucidate for the fastest OCR app I’ve tried today). A 233-page, 20MB document (heavier on graphics) took 4.5 minutes, which was the fastest of all the OCR tools I’ve tried on that document today! So, no qualms at all from the performance perspective.
Output is spot-on with one caveat (missing table of contents navigation). After saving the document and opening it in Preview, I am able to search all of the text that had been in images, and the highlight locations exactly match the locations in the formerly straight-image PDF pages. Even the pages that were scanned in at about a 15-degree skew were properly OCR’d, and the highlights appear in the correct spots (the text in the graphic angles up while the highlight for each line goes straight across, so the match isn’t great at the right edge, but it is completely workable when searching for text).
Missing:
* After saving as a PDF again, the table of contents from the original PDF is missing. That is, if you show the “Table of Contents” in the Preview sidebar, where before there might have been chapters and subchapters, now there is just the name of the file, indicating no saved table of contents. Not sure why, but this seems endemic to the PDF processing tools I’ve used today.
* Progress indicator while running OCR on a long document would be nice
* Single-click “OCR this whole document” button (at first I used the “Recognize Text (OCR)” menu item and was astounded at how fast it was; that was because it only ran the first page through OCR) [Edit: I found an “Automatically start OCR when opening existing files” option in preferences, which completely negates this quibble]
* No batch mode
Definitely wish this had been the first PDF OCR tool I had purchased today!
Tom Dibble, about PDFScanner - Scanning and OCR, v1.10.3