Adding column after comparing two files

I have a file with more than 19,000 rows and the following structure:

$ head -10 a_vt
9999.77,-83.03,-7.71771771772,276.97,-7.71771771772
9999.48,-83.57,-7.23723723724,276.43,-7.23723723724
9999.08,-83.99,-7.2972972973,276.01,-7.2972972973
9998.75,-81.71,-6.996996997,278.29,-6.996996997
9998.75,-81.65,-6.996996997,278.35,-6.996996997
9998.69,-80.87,-8.7987987988,279.13,-8.7987987988
9998.34,-81.05,-8.43843843844,278.95,-8.43843843844
9997.89,-83.99,-6.21621621622,276.01,-6.21621621622
9997.77,-77.27,-16.1261261261,282.73,-16.1261261261
9997.54,-82.43,-4.29429429429,277.57,-4.29429429429
...
...

Using a second file like this one (such files usually have a variable number of rows):

$ cat b_vm
22850,39.78686TN,39.78686TN,-75.6259,-14.9867,284.374,-14.9867
22901.9,9.90099TN,9.90099TN,-75.649,-14.9636,284.351,-14.9636
27742.2,160.0TN,160.0TN,-75.5999,-14.9922,284.4,-14.9922
22901.9,110.0TN,110.0TN,-75.6648,-14.9526,284.335,-14.9526
9998.69,110.0TN,110.0TN,-75.6551,-14.9496,284.345,-14.9496
9998.34,100.0TN,100.0TN,-75.62949999999998,-14.9573,284.37,-14.9573
27742.2,90.0TN,90.0TN,-75.60129999999998,-14.9973,284.399,-14.9973
27685.3,90.0TN,90.0TN,-75.6024,-14.9626,284.398,-14.9626
27742.2,80.0TN,80.0TN,-75.6014,-15.0006,284.399,-15.0006
22901.9,80.0TN,80.0TN,-75.6597,-14.9626,284.34,-14.9626

the 19k-row file is filtered by matching the values of its first column against those of the second file, which gives:

$ awk 'NR==FNR { a[$1]; next }( ($1 in a) ) { print }' FS="," b_vm a_vt 
9998.69,-80.87,-8.7987987988,279.13,-8.7987987988
9998.34,-81.05,-8.43843843844,278.95,-8.43843843844

I am not an awk expert, but I understand that this one-liner stores the first-column values of b_vm in an array and then prints the rows of a_vt whose first column matches one of them. The problem is how to also print the second column of b_vm, as follows:

9998.69,-80.87,-8.7987987988,279.13,-8.7987987988,110.0TN
9998.34,-81.05,-8.43843843844,278.95,-8.43843843844,100.0TN

Any hints are very welcome.

Solution:

You may use this awk command:

awk -F, 'NR==FNR { a[$1] = $2; next } $1 in a {print $0 "," a[$1]}' b_vm a_vt

9998.69,-80.87,-8.7987987988,279.13,-8.7987987988,110.0TN
9998.34,-81.05,-8.43843843844,278.95,-8.43843843844,100.0TN

Here, a[$1] = $2 stores $2 in the array a under the index $1 while b_vm is read. Then, while a_vt is read, each record whose $1 is an index in a is printed in full, followed by a comma and the stored value a[$1].
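
To try this out in isolation, here is a minimal sketch of the same command, split over several lines and commented. It assumes a few rows copied from the two files in the question, written to a temporary directory; the names tmp and result are only illustrative and not part of the original question.

```shell
# Sketch: rebuild small versions of the two files in a temporary directory.
tmp=$(mktemp -d)

# A few rows taken from a_vt in the question.
cat > "$tmp/a_vt" <<'EOF'
9998.75,-81.65,-6.996996997,278.35,-6.996996997
9998.69,-80.87,-8.7987987988,279.13,-8.7987987988
9998.34,-81.05,-8.43843843844,278.95,-8.43843843844
9997.89,-83.99,-6.21621621622,276.01,-6.21621621622
EOF

# A few rows taken from b_vm in the question (note the repeated key 22901.9).
cat > "$tmp/b_vm" <<'EOF'
22901.9,110.0TN,110.0TN,-75.6648,-14.9526,284.335,-14.9526
9998.69,110.0TN,110.0TN,-75.6551,-14.9496,284.345,-14.9496
9998.34,100.0TN,100.0TN,-75.62949999999998,-14.9573,284.37,-14.9573
22901.9,80.0TN,80.0TN,-75.6597,-14.9626,284.34,-14.9626
EOF

result=$(awk -F, '
    # First file (b_vm): NR == FNR is true only while it is being read.
    NR == FNR { a[$1] = $2; next }    # index: column 1, value: column 2
    # Second file (a_vt): keep rows whose column 1 is an index in a.
    $1 in a   { print $0 "," a[$1] }  # whole row, comma, stored value
' "$tmp/b_vm" "$tmp/a_vt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With these sample rows, the two lines for 9998.69 and 9998.34 are printed with 110.0TN and 100.0TN appended, in the order in which they appear in a_vt. Two things to keep in mind: the first column is compared as text, so 9998.690 would not match 9998.69; and if a key occurs several times in b_vm (as 22901.9 does), the last value assigned is the one kept. To keep the first value instead, the first rule could be written as NR == FNR { if (!($1 in a)) a[$1] = $2; next }.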
