Comparing a reference word to file contents
by danielbmartin from LinuxQuestions.org on (#5DNGW)
Have: a reference word. This is not (necessarily) an English word.
It is merely a string of characters.
Have: a file of words, one word per line. Again, these may or may not be English words.
Want: the input file with an indication of which letters in each word are NOT in the reference word.
Example: reference word = etaoinshrldu
Example: the file contains this ...
Code:
roosevelt
truman
eisenhower
kennedy
johnson
nixon
ford
All three of these solutions ...
Code:
RW='etaoinshrldu'   # RW = Reference Word
# Method 1. tr takes a plain character list; "[$RW]" would also delete literal [ and ].
tr -d "$RW" <"$InFile" \
|paste -d' ' "$InFile" - \
> "$OutFile"
# Method 2.
sed "s/[$RW]//g" <"$InFile" \
|paste -d' ' "$InFile" - \
> "$OutFile"
# Method 3. (gensub is a GNU awk extension)
awk -v rw="$RW" '{a=gensub("[" rw "]","","g",$0);
print $0,a}' "$InFile" >"$OutFile"
... produce the desired OutFile ...
Code:
roosevelt v
truman m
eisenhower w
kennedy ky
johnson j
nixon x
ford f
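The three methods assume that InFile and OutFile are already set. A minimal self-contained sketch of Method 2 follows. The function name not_in_ref and the temporary file are hypothetical and used only for illustration, and the sketch assumes the reference word holds plain letters only.

```shell
#!/bin/sh
# Hypothetical helper: print each word of file $2 followed by the
# letters of that word that do not occur in reference word $1.
not_in_ref() {
    rw=$1 infile=$2
    # Assumption: rw holds only plain letters. Characters such as
    # ], ^ or - are special inside a sed bracket expression.
    sed "s/[$rw]//g" "$infile" | paste -d' ' "$infile" -
}

# Demo with the sample words from the post, written to a temporary file.
InFile=$(mktemp)
printf '%s\n' roosevelt truman eisenhower kennedy johnson nixon ford >"$InFile"
not_in_ref etaoinshrldu "$InFile"
rm -f "$InFile"
```

Passing the reference word as an argument keeps it out of the global environment, so the same function can be reused with different reference words.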
Now consider the mirror-image problem....
Have: a reference word. This is not (necessarily) an English word.
It is merely a string of characters.
Have: a file of words, one word per line. Again, these may or may not be English words.
Want: the input file with an indication of which letters in the reference word are NOT in each input word.
This solution ...
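Code:
# Method 6.
# split() with an empty separator yields one character per element
# in gawk and mawk; POSIX leaves that behavior unspecified.
# index() is a literal substring test; match() would treat a[j] as a regex.
awk -v rw="$RW" 'BEGIN{n=split(rw,a,"")}
{NoMatch=""
for (j=1;j<=n;j++)
if (!index($0,a[j])) NoMatch=NoMatch a[j]
print $0,NoMatch}' \
"$InFile" >"$OutFile"
... produces the desired OutFile ...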
Code:
roosevelt ainhdu
truman eoishld
eisenhower taldu
kennedy taoishrlu
johnson etairldu
nixon etashrldu
ford etainshlu
As a matter of personal coding style I strive for concise solutions without explicit loops. Gurus, please offer constructive criticism and better solutions.
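One possible loop-free variant for the mirror-image problem, offered as a sketch rather than a definitive answer: treat each input word as a bracket expression and delete those characters from a copy of the reference word. The function name missing_from_word is hypothetical, and the sketch assumes every input word consists of plain letters only.

```shell
#!/bin/sh
# Hypothetical helper: print each word of file $2 followed by the
# letters of reference word $1 that do not occur in that word.
missing_from_word() {
    # Assumption: words hold plain letters only. Characters such as
    # ], ^ or - are special inside a bracket expression.
    # The NF guard skips empty lines, which would otherwise build
    # the invalid regex "[]".
    awk -v rw="$1" 'NF {r=rw; gsub("[" $0 "]","",r); print $0,r}' "$2"
}

# Demo with the sample words from the post, written to a temporary file.
InFile=$(mktemp)
printf '%s\n' roosevelt truman eisenhower kennedy johnson nixon ford >"$InFile"
missing_from_word etaoinshrldu "$InFile"
rm -f "$InFile"
```

Because gsub() removes characters from the reference word in place, the surviving letters keep their reference-word order, the same order the explicit loop in Method 6 produces.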
Daniel B. Martin