Extracting columns from one file to merge in another file based on a common single column
by Aemm from LinuxQuestions.org on (#56QQ4)
Hi all, I have two files with the following format.
File1
Code:rs_number reference_allele other_allele eaf beta se beta_95L beta_95U z p-value _-log10_p-value q_statistic q_p-value i2 n_studies n_samples effects
rs143225517 C T 0.123274 0.012831 0.025258 -0.036675 0.062337 0.507997 0.611478 0.213619 0.000000 1.000000 -nan 1 144303 +?
rs3094315 A G 0.845492 -0.018285 0.021551 -0.060525 0.023955 -0.848453 0.396186 0.402101 0.000000 1.000000 -nan 1 149758 -?File2
Code:22 16050001 rs1285450127 G C . . dbSNP_153;TSA=SNV;E_Freq;E_TOPMed
22 16050005 rs1442173437 T C . . dbSNP_153;TSA=SNV;E_Freq;E_TOPMed
I want to add first two columns from file 2 to file 1 where ever column 1 of file 1 matches column 3 of file 2 (rs followed by number).
I have been trying some different code but could not be successful.
Code:awk 'NR==FNR {FILE1[$1]=$0; a[$1]=$1; next} \
($1 in FILE1) {print $0, FILE1[$1]}' \
file1.txt file2.txt\
> output_fileCan anyone please help me with this?


File1
Code:rs_number reference_allele other_allele eaf beta se beta_95L beta_95U z p-value _-log10_p-value q_statistic q_p-value i2 n_studies n_samples effects
rs143225517 C T 0.123274 0.012831 0.025258 -0.036675 0.062337 0.507997 0.611478 0.213619 0.000000 1.000000 -nan 1 144303 +?
rs3094315 A G 0.845492 -0.018285 0.021551 -0.060525 0.023955 -0.848453 0.396186 0.402101 0.000000 1.000000 -nan 1 149758 -?File2
Code:22 16050001 rs1285450127 G C . . dbSNP_153;TSA=SNV;E_Freq;E_TOPMed
22 16050005 rs1442173437 T C . . dbSNP_153;TSA=SNV;E_Freq;E_TOPMed
I want to add first two columns from file 2 to file 1 where ever column 1 of file 1 matches column 3 of file 2 (rs followed by number).
I have been trying some different code but could not be successful.
Code:awk 'NR==FNR {FILE1[$1]=$0; a[$1]=$1; next} \
($1 in FILE1) {print $0, FILE1[$1]}' \
file1.txt file2.txt\
> output_fileCan anyone please help me with this?