how diff are two files
by Skaperen from LinuxQuestions.org on (#5P60F)
i have two files to compare. the goal is to get a good idea if the files are similar (one is a close variant of the other) or radically different (just not the same purpose, not similar, like having picked two random files).
is the size of output from the diff command a reasonable measure of this, relative to the sizes of the files? i would also count the number of lines and see how that is different.
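One way to make "size of the diff relative to the sizes of the files" concrete is a line-based similarity ratio. The sketch below is only an illustration, not a claim about what `diff` itself computes: it uses Python's standard `difflib.SequenceMatcher`, whose `ratio()` is 2*M/T, where M is the number of matching lines and T is the total number of lines in both files. The function name `similarity` is made up for this example.

```python
import difflib


def similarity(lines_a, lines_b):
    """Return a score in [0, 1] for two lists of lines.

    1.0 means the line sequences are identical; 0.0 means no line
    matches. autojunk=False turns off difflib's heuristic that treats
    very frequent lines as junk in long inputs.
    """
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    old = ["a", "b", "c", "d"]
    new = ["a", "b", "x", "d"]
    # Three of the four lines match, so the score is 2*3/8.
    print(similarity(old, new))
```

A close variant of a file should score near 1.0 and two unrelated files near 0.0. Where to put the cutoff is a judgment call that probably needs tuning against real files.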
the use case: collect the base names of all files in a tree, each with a reference to its full path, then compare every instance of each base name (the same base name hints at the same purpose) and report the names that have similar, but not identical, files. this would catch leftovers that did not get replaced when the files should have been updated. say you edit a file that needs a copy in every directory and cannot be a symlink, then you force-copy the change to each place. over a year and hundreds of such files, a few might be missed. this is to find them. but i do not want it to report files that are just not the same thing.
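The scan described above could be sketched roughly as follows. This is a minimal illustration under several assumptions: the files are text, each one fits in memory, and a line-similarity ratio from `difflib` is an acceptable stand-in for "size of the diff". The names `find_stale_copies` and `read_lines` and the default `threshold` of 0.6 are hypothetical, chosen only for this example.

```python
import difflib
import filecmp
import itertools
import os
from collections import defaultdict


def read_lines(path):
    # Undecodable bytes are replaced so odd encodings do not abort the scan.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.readlines()


def find_stale_copies(root, threshold=0.6):
    """Report same-named files that are similar but not identical.

    Returns a list of (base_name, path_a, path_b, ratio) tuples.
    threshold is an assumed cutoff: pairs scoring below it are treated
    as unrelated files that merely share a name.
    """
    # Group full paths by base name.
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            by_name[name].append(os.path.join(dirpath, name))

    report = []
    for name, paths in sorted(by_name.items()):
        for a, b in itertools.combinations(sorted(paths), 2):
            # Byte-identical copies are up to date, so skip them.
            if filecmp.cmp(a, b, shallow=False):
                continue
            ratio = difflib.SequenceMatcher(
                None, read_lines(a), read_lines(b), autojunk=False
            ).ratio()
            if ratio >= threshold:
                report.append((name, a, b, ratio))
    return report


if __name__ == "__main__":
    for name, a, b, ratio in find_stale_copies("."):
        print(f"{name}: {a} vs {b} ({ratio:.2f})")
```

Comparing every pair is quadratic in the number of copies per base name. If that gets slow, one option is to group byte-identical copies first and compare only one representative from each group.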