pdfgrep vertical text

fdq09eca

from LinuxQuestions.org on 2020-06-16 09:41 (#54NDV)

I have a bunch of PDFs that may not share a consistent format (downloaded at random from Google Scholar). I am trying to check their data sources, so I used pdfgrep for the task.
Code:
pdfgrep --color always -ni "data|sources|Table" pdfs/*.pdf
This worked until I found that some tables are laid out in landscape on pages that are stored as portrait, so their text runs vertically. I tried rotating such a page before grepping it:
Code:
pdftk pdfs/25770596.pdf cat 8east output pdfs/25770596_r.pdf
but it made no difference.

Then I tried converting the rotated file to text:

Code:
pdftotext -f 1 -l 8 pdfs/25770596_r.pdf pdfs/25770596r_txt.txt
This actually served my purpose: it extracted some of the caption lines on the landscape page. However, the table is scrambled: the numbers come out in the wrong order, and some of them are missing.
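For reference, here is a rough sketch of the steps above as one script. The helper names (rotated_name, text_name, search_rotated_page) and the output file naming are made up for illustration. The -layout option of pdftotext is an assumption on my part: it asks pdftotext to keep the physical layout of the page, and I have not verified that it fixes this particular table. The grep call assumes GNU grep.

```shell
#!/bin/sh
# Sketch only: rotate one page, extract its text, then search the text.
# All function names below are hypothetical helpers, not part of any tool.

# pdfs/25770596.pdf -> pdfs/25770596_r.pdf
rotated_name() {
    printf '%s_r.pdf\n' "${1%.pdf}"
}

# pdfs/25770596.pdf -> pdfs/25770596_r.txt
text_name() {
    printf '%s_r.txt\n' "${1%.pdf}"
}

# usage: search_rotated_page FILE.pdf PAGE
search_rotated_page() {
    src=$1
    page=$2
    rotated=$(rotated_name "$src")
    txt=$(text_name "$src")

    # Write the chosen page, with its rotation set to east (90 degrees),
    # to a new one-page PDF.
    pdftk "$src" cat "${page}east" output "$rotated" || return 1

    # Assumption: -layout (keep the physical layout) may keep table
    # columns together; untested on this file.
    pdftotext -layout "$rotated" "$txt" || return 1

    # Same pattern as the pdfgrep call, applied to the extracted text.
    grep -niE --color=always "data|sources|Table" "$txt"
}
```

It would be called as: search_rotated_page pdfs/25770596.pdf 8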

I would like to know if there is a more elegant way to do this.

The .pdf is here.

Thank you.

