Fine-grained file differences
The diff utility compares files by lines, which is often what you'd like it to do. But sometimes you'd like more granularity.
For example, suppose we want to compare two versions of Psalm 23. Here are the first three verses in the King James version:
The Lord is my shepherd; I shall not want.
He maketh me to lie down in green pastures:
he leadeth me beside the still waters.
He restoreth my soul:
he leadeth me in the paths of righteousness
for his name's sake.
And here are the corresponding lines from a more contemporary translation, the English Standard Version:
The Lord is my shepherd; I shall not want.
He makes me lie down in green pastures.
He leads me beside still waters.
He restores my soul.
He leads me in paths of righteousness
for his name's sake.
Save these in two files, ps23.kjv and ps23.esv. If we run
diff ps23.kjv ps23.esv
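we get
2,5c2,5
< He maketh me to lie down in green pastures:
< he leadeth me beside the still waters.
< He restoreth my soul:
< he leadeth me in the paths of righteousness
---
> He makes me lie down in green pastures.
> He leads me beside still waters.
> He restores my soul.
> He leads me in paths of righteousness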
This says that the two files differ in lines 2 through 5; the first and last lines are identical. The output shows lines 2 through 5 from each file but doesn't show how they differ.
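To make the line-level view concrete, here is a minimal Python sketch that compares the two texts as sequences of whole lines using the standard library's difflib. It illustrates line-based comparison in general and is not how diff itself is implemented; the variable names kjv, esv, and changed are my own.

```python
import difflib

kjv = """The Lord is my shepherd; I shall not want.
He maketh me to lie down in green pastures:
he leadeth me beside the still waters.
He restoreth my soul:
he leadeth me in the paths of righteousness
for his name's sake.""".splitlines()

esv = """The Lord is my shepherd; I shall not want.
He makes me lie down in green pastures.
He leads me beside still waters.
He restores my soul.
He leads me in paths of righteousness
for his name's sake.""".splitlines()

# Compare the texts as sequences of whole lines.
matcher = difflib.SequenceMatcher(None, kjv, esv, autojunk=False)

# Collect 1-based, inclusive line ranges for every block that is not identical.
changed = [
    (tag, i1 + 1, i2, j1 + 1, j2)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]
print(changed)
```

This prints [('replace', 2, 5, 2, 5)]: lines 2 through 5 are replaced as a single block, and nothing in the result says which words inside those lines changed.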
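To see more fine-grained differences, such as changing maketh to makes, we can run the version of diff that comes with git. If the two files sit inside a git working tree, add the --no-index option so that git compares the two paths directly.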
If we run
git diff --word-diff ps23.kjv ps23.esv
we can compare the files by words rather than by lines. This produces
The colors help make the text more readable, assuming you can see the difference between red and green. The color scheme is configurable through git's color.diff.old and color.diff.new settings. But the text is readable without the color highlighting. For example, in the first changed line we have
[-maketh-]{+makes+}
which means we remove the word maketh and add the word makes.
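Because the markers are plain text, they can be picked apart by a script. The following is a rough Python sketch that pulls removed and added words out of one line of output. The function name parse_word_diff and the sample line are hypothetical, and the regular expressions assume the file contents never contain the marker sequences themselves.

```python
import re

# Plain word-diff markers: [-removed-] and {+added+}.
REMOVED = re.compile(r"\[-(.*?)-\]")
ADDED = re.compile(r"\{\+(.*?)\+\}")

def parse_word_diff(line):
    """Return (removed, added) lists of the marked text found in one line."""
    return REMOVED.findall(line), ADDED.findall(line)

# Sample line built around the marker pair shown above; the other words are illustrative.
sample = "He [-maketh-]{+makes+} me"
removed, added = parse_word_diff(sample)
print(removed, added)
```

For serious scripting, git also offers --word-diff=porcelain, a line-based format intended for programs to read, which avoids the ambiguity of markers that might also appear in the text being compared.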
We can compare the files on an even finer level, comparing by characters rather than words. For example, rather than saying we need to change maketh to makes the software can say we need to change the th ending to s. We can do this by running
git diff --word-diff-regex=. ps23.kjv ps23.esv
The option --word-diff-regex=. says to use the regular expression . to define what counts as a word. Since the dot matches any single character, each character is treated as a word of its own.
As before we have square brackets to indicate what to remove and curly braces to indicate what to add, but now we're removing and adding letters rather than words.
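To illustrate what a character-level comparison looks like, here is a small Python sketch that marks changes between two words in the same bracket-and-brace notation. It uses difflib from the standard library rather than git's own algorithm, so on longer inputs git may group the changes differently; char_diff is a name I made up.

```python
import difflib

def char_diff(old, new):
    """Mark character-level changes with [-...-] and {+...+}."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(old[i1:i2])
            continue
        if i2 > i1:  # characters present only in the old text
            pieces.append("[-" + old[i1:i2] + "-]")
        if j2 > j1:  # characters present only in the new text
            pieces.append("{+" + new[j1:j2] + "+}")
    return "".join(pieces)

print(char_diff("maketh", "makes"))  # make[-th-]{+s+}
```

The shared prefix make is left alone, and only the th ending is marked for removal and the s for addition.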
We can get a more compact display of the differences if we rely on color alone, by adding the --word-diff=color option.
git diff --word-diff=color --word-diff-regex=. ps23.kjv ps23.esv
produces the following.
Equivalently, we can combine the two options
--word-diff=color --word-diff-regex=.
into the one option
--color-words=.
which passes the word-defining regular expression as an argument to --color-words.
This may be the most convenient way to see the differences, provided you can distinguish the colors and don't need to use the plain text programmatically. For example, the change from maketh to makes is displayed as makeths, with th in one color and s in the other. Without the colors this is simply makeths, and we can no longer be sure what changed.
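The loss of information is easy to demonstrate. The Python sketch below builds a colored string and then strips the color codes, as happens when the output is pasted somewhere that drops them. The specific escape sequences (red for removed, green for added) are my assumption about a typical terminal color scheme, not something taken from git's output.

```python
import re

# Assumed ANSI sequences: red for removed text, green for added text, then reset.
RED, GREEN, RESET = "\x1b[31m", "\x1b[32m", "\x1b[m"

colored = "make" + RED + "th" + RESET + GREEN + "s" + RESET

# Strip the color escape sequences, leaving only the visible characters.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")
plain = ANSI_SGR.sub("", colored)
print(plain)  # makeths
```

Once the escape sequences are gone, nothing distinguishes the removed th from the added s, which is why the bracket-and-brace form is the safer choice for scripts.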
The post Fine-grained file differences first appeared on John D. Cook.