Article 5H8YX Compare and Detect Similar Text Files in a Directory

by salmanahmed from LinuxQuestions.org on (#5H8YX)
Hi
I have around 100 text files in a directory. Some of these files are similar to each other (they contain similar text and sentences), but they are not identical:
1. Some are 20-25% similar to one or more other files
2. Some are 40-50% similar to one or more other files
and so on...
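The post does not say how these percentages are measured. One common way to put a number like "25% similar" on a pair of texts (an assumption here, not the poster's method) is Jaccard similarity over word shingles: the shared fraction of overlapping 3-word sequences. A minimal Python sketch; the function names `shingles` and `jaccard_percent` and the shingle size k=3 are illustrative choices:

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    # Lower-case and keep only word characters so case and punctuation don't matter.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short text: treat all of its words as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_percent(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two shingle sets, in percent."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 100.0  # two empty texts are treated as identical
    return 100.0 * len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox sleeps under the lazy dog"
    print(f"{jaccard_percent(a, b):.1f}%")  # prints 27.3%
```

The two sample sentences share 3 of 11 distinct shingles, hence 27.3%. A smaller k is more forgiving of reworded sentences; a larger k rewards longer verbatim passages.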
I want to find out which of these files contain text similar to which other files. Since checking all the files manually would be very difficult and time-consuming, my question is: is there a tool in Linux that can measure this similarity between files? Or do I have to write a script (or function) to do it? If so, how?
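A script using only the Python standard library is one way to get a first pass over the whole directory. The sketch below compares every pair of files with `difflib.SequenceMatcher` and prints the pairs at or above a threshold, most similar first. The names `similarity_percent` and `similar_pairs` are hypothetical, and it assumes the files end in `.txt`, are readable as UTF-8, and that 20% is a useful cut-off. `SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matching words and T the total number of words in both files.

```python
import difflib
import itertools
import sys
from pathlib import Path


def similarity_percent(text_a: str, text_b: str) -> float:
    """Similarity of two texts in percent, via difflib.SequenceMatcher on words."""
    words_a, words_b = text_a.split(), text_b.split()
    # autojunk=False stops difflib from ignoring very common words in long files.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    return 100.0 * matcher.ratio()


def similar_pairs(directory: str, threshold: float = 20.0, pattern: str = "*.txt"):
    """Return (score, name_a, name_b) for each pair at or above threshold, best first."""
    files = sorted(Path(directory).glob(pattern))
    # Read each file once; undecodable bytes are replaced instead of raising.
    texts = {f: f.read_text(encoding="utf-8", errors="replace") for f in files}
    results = []
    # Each unordered pair exactly once: n*(n-1)/2 comparisons (4950 for 100 files).
    for a, b in itertools.combinations(files, 2):
        score = similarity_percent(texts[a], texts[b])
        if score >= threshold:
            results.append((score, a.name, b.name))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for score, name_a, name_b in similar_pairs(sys.argv[1]):
            print(f"{score:5.1f}%  {name_a}  {name_b}")
    else:
        print(f"usage: {sys.argv[0]} DIRECTORY")
```

Run it with the directory as the only argument. For files of a few pages, 4950 comparisons usually finish quickly; for much larger files `difflib` can become slow, and a set-based measure such as Jaccard similarity over word sets may be a cheaper alternative.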
PS: I know about 'diff', but so far I have only used it to compare two files. I don't know whether 'diff' can be used in a situation like mine.
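`diff` itself compares only two files at a time and reports changed lines rather than a percentage, but the same line-based idea can be turned into a score and then applied to every pair of files. A minimal sketch using Python's `difflib`; the function name `unchanged_line_percent` is hypothetical, and "similarity" is assumed here to mean the share of lines that a line-based diff leaves untouched:

```python
import difflib
import sys
from pathlib import Path


def unchanged_line_percent(lines_a: list, lines_b: list) -> float:
    """Percentage of lines a line-based diff would leave untouched."""
    # Drop line endings so "a\n" and "a" compare equal.
    a = [line.rstrip("\r\n") for line in lines_a]
    b = [line.rstrip("\r\n") for line in lines_b]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() lists runs of identical lines; the last one has size 0.
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100.0 * unchanged / longest


if __name__ == "__main__":
    if len(sys.argv) == 3:
        path_a, path_b = sys.argv[1], sys.argv[2]
        lines_a = Path(path_a).read_text(errors="replace").splitlines()
        lines_b = Path(path_b).read_text(errors="replace").splitlines()
        # A unified diff in the style of `diff -u` (lineterm="" because lines lack "\n").
        for line in difflib.unified_diff(lines_a, lines_b, path_a, path_b, lineterm=""):
            print(line)
        print(f"{unchanged_line_percent(lines_a, lines_b):.1f}% of lines unchanged")
    else:
        print(f"usage: {sys.argv[0]} FILE_A FILE_B")
```

Because this works line by line like `diff`, one reworded sentence marks its whole line as changed; for prose with long lines, a word-level comparison usually gives more meaningful percentages.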
External Content
Source RSS or Atom Feed
Feed Location https://feeds.feedburner.com/linuxquestions/latest
Feed Title LinuxQuestions.org
Feed Link https://www.linuxquestions.org/questions/