Batch download linked assets from PDF files in website
by ericlindellnyc from LinuxQuestions.org on (#5R1T4)
I have downloaded a website that has many PDFs on it for download. I used the following to get the PDFs:
Code:
wget -r -l 2 -N -t1 --mirror --convert-links --adjust-extension --page-requisites --no-parent --no-check-certificate http://mileswmathis.com/
I thought that adding -l 2 would also retrieve the media assets linked from the PDFs, but it did not. wget will follow links from an HTML file, but not from a PDF.
There are great PDFs on this site, but I would also like the assets they link to, such as image or video files -- or anything else, for that matter.
I don't know how to batch sift through the PDFs to find the links and download linked content.
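Something like the sketch below is what I have in mind (assuming pdftotext from poppler-utils is installed and the wget run above leaves the mirror under ./mileswmathis.com), but I haven't verified it:
Code:
# Sketch: pull URLs out of every mirrored PDF and hand them to wget.
# Assumes pdftotext (poppler-utils) and a mirror directory ./mileswmathis.com.
find mileswmathis.com -iname '*.pdf' -print0 | while IFS= read -r -d '' pdf; do
    # Dump each PDF as plain text and grab anything that looks like a URL.
    pdftotext "$pdf" - | grep -Eo 'https?://[^[:space:]")>]+'
done | sort -u > pdf-links.txt

# Fetch everything the PDFs link to.
wget -N -t1 --no-check-certificate -i pdf-links.txt
One caveat: pdftotext only sees URLs that appear as visible text, so links stored purely as clickable PDF annotations would be missed and would need a different extraction tool.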
Ideas?
Thank you.