Article 51AA0 speed up grepping a file for a long list of needles

by masavini from LinuxQuestions.org on (#51AA0)

hi,
i have a long list of needles. i need to know which ones are present inside a file.

e.g.:
Code:
$ wc -l needles.txt
3589 needles.txt

$ head -3 needles.txt
this_string_is_present
this_is_not
and_so_on

$ wc -l hay.txt
756 hay.txt

$ head -3 hay.txt
this file contains a lot of strings: this_string_is_present
some needles are present
and some are not

a simple (and SLOW) solution could be:
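Code:
hay=$(< hay.txt) # store hay.txt in a variable to avoid reading the disk thousands of times
needles=()

# IFS= and -r keep each needle exactly as written (no trimming, no backslash processing)
while IFS= read -r needle; do
    # -F matches the needle as a fixed string, -- protects needles that start with "-"
    if grep -qF -- "${needle}" <<< "${hay}"; then
        needles+=( "${needle} - verified" )
    else
        needles+=( "${needle}" )
    fi
done < needles.txt

another (still pretty slow) solution could be using grep -Fo and comm: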
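Code:
# comm needs both inputs sorted; -u also drops duplicate lines
sort -u needles.txt > sorted-needles.txt
# -f reads the patterns from needles.txt (without it, "needles.txt" itself is the pattern)
grep -Fof needles.txt hay.txt | sort -u > verified-needles.txt

# -23 keeps the lines found only in sorted-needles.txt, i.e. the needles that never matched
comm -23 sorted-needles.txt verified-needles.txt > unverified-needles.txt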
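
a third option, given here only as an untested-on-the-real-data sketch, would be to let a single awk process do every lookup, so that no grep is started per needle. it keeps the hay in memory and checks each needle with awk's index(), which does a plain substring search (no regex). the function name check_needles is made up for this example, and the " - verified" suffix just mirrors the loop above:

```shell
# check_needles NEEDLES_FILE HAY_FILE   (hypothetical helper name)
# Prints every needle in input order, appending " - verified" when the needle
# occurs somewhere in the hay as a fixed (non-regex) substring.
check_needles() {
    awk '
        # Lines of the first file are the needles: remember the non-empty ones.
        FILENAME == ARGV[1] { if ($0 != "") needle[++n] = $0; next }
        # Lines of the second file are the hay: join them into one string.
        { hay = hay $0 "\n" }
        END {
            for (i = 1; i <= n; i++) {
                # index() returns 0 when the needle is not a substring of hay.
                suffix = index(hay, needle[i]) ? " - verified" : ""
                print needle[i] suffix
            }
        }
    ' "$1" "$2"
}
```

it would be called as check_needles needles.txt hay.txt. unlike grep -o, this reports a needle even when it only appears as part of a longer needle, but it is still one substring search per needle, so it may not scale to much bigger lists.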
can you suggest a better solution?
thanks!