[SOLVED] OCR on PDFs?
by business_kid from LinuxQuestions.org on (#58VK8)
This may have been covered here, for which I apologise.
I am lumbered with many poorly scanned documents (images converted to PDF, probably low-res). They're doubly hard to read because the pages look filthy; they are typed, not handwritten. The pictures are in B/W, produced by guys who are not clued in with PCs. In their own field they are actually great, so it's worth the effort.
I would install tesseract and OCR the pages into an editor, but does tesseract operate on PDF pages? I don't think so. Otherwise I'll need to extract the page images and pipe them to tesseract. These PDFs are not fancy in any way.
Any suggestions?
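For what it's worth, the "pipe the pics to tesseract" route can be scripted. Below is a minimal sketch, assuming the poppler-utils `pdftoppm` tool and the `tesseract` CLI are both installed. It rasterises each PDF page to a PNG, runs tesseract on each image and writes the combined text to a file. The function names (`render_cmd`, `ocr_cmd`, `page_number`, `ocr_pdf`) are just illustrative, not part of either tool.

```python
#!/usr/bin/env python3
"""Sketch: OCR a scanned PDF by rendering each page to PNG and running tesseract on it.

Assumes pdftoppm (poppler-utils) and tesseract are on the PATH.
"""
from __future__ import annotations

import re
import subprocess
import sys
import tempfile
from pathlib import Path


def render_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page: <prefix>-1.png, <prefix>-2.png, ...
    # (page numbers are zero-padded when the PDF has 10 or more pages)
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def ocr_cmd(image: str) -> list[str]:
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file next to the image.
    return ["tesseract", image, "stdout"]


def page_number(path: Path) -> int:
    # Extract the trailing page number from names like "page-07.png",
    # so pages sort numerically rather than alphabetically.
    match = re.search(r"-(\d+)\.png$", path.name)
    return int(match.group(1)) if match else 0


def ocr_pdf(pdf: str, dpi: int = 300) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(render_cmd(pdf, prefix, dpi), check=True)
        pages = sorted(Path(tmp).glob("page-*.png"), key=page_number)
        texts = []
        for page in pages:
            result = subprocess.run(
                ocr_cmd(str(page)), check=True, capture_output=True, text=True
            )
            texts.append(result.stdout)
        # Separate the pages' text with a newline
        return "\n".join(texts)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} input.pdf output.txt")
    Path(sys.argv[2]).write_text(ocr_pdf(sys.argv[1]), encoding="utf-8")
```

Usage would be something like `python3 pdf_ocr.py scan.pdf scan.txt`. With low-res scans, raising the render resolution (the `dpi` argument, passed to `pdftoppm -r`) is the first knob worth trying, though it cannot recover detail the original scan never captured.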
