Set intersection and difference at the command line

by John, from John D. Cook (#74E69)

A few years ago I wrote about comm, a utility that lets you do set theory at the command line. It's a really useful little program, but it has two drawbacks: the syntax is hard to remember, and the input files must be sorted.

If A and B are two sorted lists,

 comm A B

prints A - B, B - A, and A ∩ B in three columns. You usually don't want all three, so comm lets you filter the output. It's a little quirky in that you specify what you don't want instead of what you do. And you have to remember that 1, 2, and 3 correspond to A - B, B - A, and A ∩ B respectively.
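Here is a small sketch of the filtering flags in action. The file names and contents are made up for illustration, and the files are already sorted. Each digit in the flag suppresses the column with that number, so -23 leaves only column 1.

```shell
# Hypothetical sample files; comm expects each one to be sorted.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/A"
printf '%s\n' banana cherry date > "$dir/B"

only_a=$(comm -23 "$dir/A" "$dir/B")   # hide columns 2 and 3: lines only in A
only_b=$(comm -13 "$dir/A" "$dir/B")   # hide columns 1 and 3: lines only in B
both=$(comm -12 "$dir/A" "$dir/B")     # hide columns 1 and 2: lines in both

echo "only in A: $only_a"
echo "only in B: $only_b"
echo "in both:"
echo "$both"

rm -r "$dir"
```

With these sample files, apple is the only line unique to A, date is the only line unique to B, and banana and cherry are common to both.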

[Figure: comm_venn.png]

A couple little scripts can hide the quirks. I have a script intersect

 comm -12 <(sort "$1") <(sort "$2")

and another script setminus

 comm -23 <(sort "$1") <(sort "$2")

that sort the input files on the fly and eliminate the need to remember comm's filtering syntax.

The setminus script computes A - B. To find B - A call the script with the arguments reversed.
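To try this out in one place, the two scripts can be sketched as bash functions instead of separate files. That packaging is an assumption made for the example, not necessarily how the scripts are installed. The sample files are hypothetical and deliberately unsorted. Note that the <( ... ) process substitution needs a shell such as bash, zsh, or ksh. Plain POSIX sh does not support it.

```shell
#!/usr/bin/env bash
# Sketch: the intersect and setminus scripts from the text, as bash functions.

intersect() { comm -12 <(sort "$1") <(sort "$2"); }
setminus()  { comm -23 <(sort "$1") <(sort "$2"); }

# Hypothetical, deliberately unsorted sample files.
dir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$dir/A"
printf '%s\n' date cherry banana > "$dir/B"

common=$(intersect "$dir/A" "$dir/B")   # lines in both files
a_only=$(setminus "$dir/A" "$dir/B")    # A - B
b_only=$(setminus "$dir/B" "$dir/A")    # B - A: same function, arguments reversed

echo "A - B: $a_only"
echo "B - A: $b_only"
echo "common:"
echo "$common"

rm -r "$dir"
```

Because sorting happens inside the functions, the order of lines in the input files does not matter. The output comes back in sorted order.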

The post Set intersection and difference at the command line first appeared on John D. Cook.