Extract Main Article from HTML files in Directory Structure

by ericlindellnyc from LinuxQuestions.org on (#5Q2C3)
I've successfully used this to extract main article from one local HTML file:
Code:
cat JamesWatt.html | trafilatura >> JamesWatt.txt
To do this recursively for a whole nested directory, I tried
Code:
find . -name '*.html' -exec cat '{}' | trafilatura >> '{}.txt' \;
producing these errors
Code:
find: -exec: no terminating ";" or "+"
Code:
trafilatura: error: unrecognized arguments: ;
html2text removes tags, but leaves in menus and other non-article verbiage.

I've posted to two forums and for some reason haven't received any reply.
Assistance would be greatly appreciated!
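
One likely reading of the two errors: the shell splits the command line at | and >> before find ever runs. find is then left with -exec cat {} and no terminator, and trafilatura is started with ; as its only argument, which matches both messages. A minimal sketch of one way around this is to let find start a small sh -c for each file, so the redirections happen inside that inner shell. The name extract_all is a hypothetical helper, not part of find or trafilatura. The sketch assumes trafilatura reads HTML on standard input, as in the single-file command above, writes name.txt next to name.html, and overwrites with > rather than appending with >>.

```shell
#!/bin/sh
# extract_all DIR [EXTRACTOR ...]
# Hypothetical helper: run EXTRACTOR (default: trafilatura) on every *.html
# file below DIR, feeding the file on stdin and writing NAME.txt beside it.
extract_all() {
    dir=$1
    shift
    # Fall back to trafilatura when no extractor command is given.
    [ "$#" -gt 0 ] || set -- trafilatura
    # find starts one inner shell per file; the < and > redirections are
    # parsed by that inner shell, not by the shell that launched find.
    find "$dir" -type f -name '*.html' -exec sh -c '
        src=$1
        shift
        "$@" < "$src" > "${src%.html}.txt"
    ' sh {} "$@" \;
}
```

Calling extract_all . from the top of the directory tree would process every HTML file below it. A different filter can be passed after the directory, which makes it easy to try the loop with a harmless command such as cat before running the real extraction.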