Fine-grained file differences

by John, from John D. Cook (#6H3Y7)

The diff utility compares files by lines, which is often what you'd like it to do. But sometimes you'd like more granularity.

For example, suppose we want to compare two versions of Psalm 23. Here are the first three verses in the King James Version:

The Lord is my shepherd; I shall not want.
He maketh me to lie down in green pastures:
he leadeth me beside the still waters.
He restoreth my soul:
he leadeth me in the paths of righteousness
for his name's sake.

And here are the corresponding lines from a more contemporary translation, the English Standard Version:

The Lord is my shepherd; I shall not want.
He makes me lie down in green pastures.
He leads me beside still waters.
He restores my soul.
He leads me in paths of righteousness
for his name's sake.

Save these in two files, ps23.kjv and ps23.esv. If we run

diff ps23.kjv ps23.esv

we get

diffshot1.png

This says that the two files differ in lines 2 through 5; the first and last lines are identical. The output shows lines 2 through 5 from each file but doesn't show how they differ.
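To make the line-level granularity concrete, here is a minimal sketch in Python using the standard difflib module. It is not how diff itself is implemented, and the function name changed_line_ranges is made up for this illustration. It treats each line as an indivisible unit, so all it can report is which ranges of lines differ.

```python
import difflib

kjv = [
    "The Lord is my shepherd; I shall not want.",
    "He maketh me to lie down in green pastures:",
    "he leadeth me beside the still waters.",
    "He restoreth my soul:",
    "he leadeth me in the paths of righteousness",
    "for his name's sake.",
]
esv = [
    "The Lord is my shepherd; I shall not want.",
    "He makes me lie down in green pastures.",
    "He leads me beside still waters.",
    "He restores my soul.",
    "He leads me in paths of righteousness",
    "for his name's sake.",
]

def changed_line_ranges(a, b):
    """Return (tag, a_first, a_last, b_first, b_last) for each block of
    lines that is not identical, using 1-based inclusive line numbers.
    (Hypothetical helper for illustration.)"""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, i1 + 1, i2, j1 + 1, j2)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Whole lines are compared, so a one-word change marks the entire line.
print(changed_line_ranges(kjv, esv))
```

The single block it reports covers lines 2 through 5 of both files, with no hint of what changed inside those lines.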

To see more fine-grained differences, such as changing maketh to makes, we can run the version of diff that comes with git.

If we run

git diff --word-diff ps23.kjv ps23.esv

we can compare the files by words rather than by lines. This produces

diffshot2.png

The colors help make the text more readable, assuming you can see the difference between red and green. The color scheme is configurable through git's color.diff.old and color.diff.new settings. But the text is readable without the color highlighting. For example, in the first changed line we have

[-maketh-]{+makes+}

which means we remove the word maketh and add the word makes.
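The idea behind a word-level comparison can be sketched in a few lines of Python. This is an illustration built on difflib, not git's algorithm, and word_diff is a name invented here. It splits each line on whitespace, aligns the two word lists, and wraps the differences in the same bracket and brace markers.

```python
import difflib

def word_diff(old_line, new_line):
    """Mark word-level changes as [-removed-]{+added+}, imitating the
    plain-text markup described above. (Hypothetical helper.)"""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(" ".join(old_words[i1:i2]))
            continue
        removed = " ".join(old_words[i1:i2])
        added = " ".join(new_words[j1:j2])
        marked = ""
        if removed:
            marked += "[-" + removed + "-]"
        if added:
            marked += "{+" + added + "+}"
        pieces.append(marked)
    return " ".join(pieces)

print(word_diff("He maketh me to lie down in green pastures:",
                "He makes me lie down in green pastures."))
# He [-maketh-]{+makes+} me [-to-] lie down in green [-pastures:-]{+pastures.+}
```

Note that punctuation attached to a word counts as part of the word here, so pastures: and pastures. are reported as different words.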

We can compare the files at an even finer level, comparing by characters rather than words. For example, rather than saying we need to change maketh to makes, the software can say we need to change the th ending to s. We can do this by running

git diff --word-diff-regex=. ps23.kjv ps23.esv

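The option --word-diff-regex=. says to use the regular expression . to decide what counts as a word: every match of the expression is treated as one word. Since the dot matches any single character, this says to chop the lines into individual characters. Specifying the regular expression also turns on --word-diff, so we don't need to pass that option separately.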

diffshot3.png

As before, we have square brackets to indicate what to remove and curly braces to indicate what to add, but now we're removing and adding letters rather than words.
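The role of the regular expression can be sketched in Python as well. In this illustration (again difflib rather than git's own algorithm, with regex_token_diff a hypothetical name), every match of the pattern becomes one token, so the pattern . turns each character, spaces included, into its own token.

```python
import difflib
import re

def regex_token_diff(old, new, token_regex="."):
    """Diff two strings token by token, where each regex match is a token.
    With the default pattern "." every character is a token.
    (Hypothetical helper for illustration.)"""
    old_tokens = re.findall(token_regex, old)
    new_tokens = re.findall(token_regex, new)
    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = "".join(old_tokens[i1:i2])
        added = "".join(new_tokens[j1:j2])
        if tag == "equal":
            out.append(removed)
        else:
            if removed:
                out.append("[-" + removed + "-]")
            if added:
                out.append("{+" + added + "+}")
    return "".join(out)

print(regex_token_diff("maketh", "makes"))
# make[-th-]{+s+}
print(regex_token_diff("He restoreth my soul:", "He restores my soul."))
# He restore[-th-]{+s+} my soul[-:-]{+.+}
```

With single characters as tokens, the shared stem make is recognized as unchanged and only the ending is marked.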

We can get a more compact display of the differences if we rely on color alone, by adding the --word-diff=color option.

git diff --word-diff=color --word-diff-regex=. ps23.kjv ps23.esv

produces the following.

diffshot4.png

Equivalently, we can combine the two options

--word-diff=color --word-diff-regex=.

into the one option

--color-words=.

that specifies the word separation regular expression as an option to --color-words.

This may be the most convenient way to see the differences, provided you can distinguish the colors and don't need to process the output programmatically. Without the colors, the word make followed by a red th and a green s, for example, becomes simply makeths, and we can no longer be sure what changed.
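If you do want to process the differences programmatically, the bracket and brace markup is the form to work with. Here is a minimal sketch of pulling the changes out of one line of that markup with a regular expression. The sample line is written by hand in the style shown earlier, the name extract_changes is made up for this example, and the pattern assumes the text itself never contains the marker sequences. For serious scripting, git also offers --word-diff=porcelain, a line-based format intended for scripts.

```python
import re

# Matches either [-removed text-] or {+added text+}, non-greedily.
MARKUP = re.compile(r"\[-(.*?)-\]|\{\+(.*?)\+\}")

def extract_changes(line):
    """Return a list of ("removed" | "added", text) pairs found in a line
    of plain word-diff markup. (Hypothetical helper for illustration.)"""
    changes = []
    for match in MARKUP.finditer(line):
        removed, added = match.group(1), match.group(2)
        if removed is not None:
            changes.append(("removed", removed))
        else:
            changes.append(("added", added))
    return changes

# Hand-written sample in the word-diff style, not captured tool output.
line = "He [-maketh-]{+makes+} me [-to-] lie down in green [-pastures:-]{+pastures.+}"
for kind, text in extract_changes(line):
    print(kind, text)
```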

The post Fine-grained file differences first appeared on John D. Cook.