How do you grep/awk from a column in a file?

I have a file of IDs called IDs_list.txt that I want to use to extract information from a second file, which has hundreds of IDs, many of which are not in IDs_list.txt.

I’ve tried combinations of if and grep but my results keep coming up empty.

Here is an example of what I’m trying to do and what I’ve done.

cat IDs_list.txt | head -n 4
24
43
56
69

cat sample1.txt | head -n 4
NODE_1_length_148512_cov_24.5066,gi|573017271|gb|CP006568.1|,148512,4513140,8,7289,86.545,0.0,13461,24,madeup species 1
NODE_2_length_122550_cov_25.719,gi|84778498|dbj|AP008232.1|,122550,4171146,13,12690,93.693,0.0,23435,244,madeup species 2
NODE_3_length_103385_cov_25.9802,gi|84778498|dbj|AP008232.1|,103385,4171146,6,4243,88.782,0.0,7836,43,madeup species 3
NODE_4_length_101672_cov_25.6536,gi|84778498|dbj|AP008232.1|,101672,4171146,7,4139,86.799,0.0,7644,955,long name here

The IDs are in the 10th column.

I need to pull out every line whose ID appears in IDs_list.txt.

So my output should be:

NODE_1_length_148512_cov_24.5066,gi|573017271|gb|CP006568.1|,148512,4513140,8,7289,86.545,0.0,13461,24,madeup species 1
NODE_3_length_103385_cov_25.9802,gi|84778498|dbj|AP008232.1|,103385,4171146,6,4243,88.782,0.0,7836,43,madeup species 3

I’ve tried:

for file in sample?.txt; do awk 'FNR==NR{arr[$0];next} ($10 in arr)' IDs_list.txt $file; done

Nothing comes out. I took this example from another Stack Overflow question.

for i in $(cat IDs_list.txt); do awk -F"," '$10 == $i' sample1.txt; done

But this prints the same output line many times, because I am iterating over IDs_list.txt line by line, which is not what I want. Since IDs_list.txt has hundreds of IDs, I may get the first output line hundreds of times.

Then I tried grep with awk but that didn’t work either. My syntax is off.

for file in sample?.txt; do for i in $(cat IDs_list.txt); do grep -w '$i' $file; done; done

Nothing is output here. My logic is that, for each sample file, I want to grep the lines that contain an ID found in IDs_list.txt. However, I would rather target the 10th column specifically, because the IDs can sometimes show up in other columns that are not actually IDs.

Is there an elegant way of doing this in a for loop with grep, awk, or both?

Solution:

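Your first attempt is almost right; it only lacks the field separator. Without -F, awk splits on whitespace, so $10 is empty for these lines and nothing matches. You may use this awk:

awk -F, 'NR==FNR {ids[$1]; next} $10 in ids' IDs_list.txt sample1.txt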

NODE_1_length_148512_cov_24.5066,gi|573017271|gb|CP006568.1|,148512,4513140,8,7289,86.545,0.0,13461,24,madeup species 1
NODE_3_length_103385_cov_25.9802,gi|84778498|dbj|AP008232.1|,103385,4171146,6,4243,88.782,0.0,7836,43,madeup species 3
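
Since the question loops over several sample files, here is a minimal sketch of the same idea applied to all of them in a single awk call, with no shell loop. It builds throwaway copies of the example data in a temporary directory so it can be run as-is. The node names are shortened, and sample2.txt and the variable names workdir and matches are made up for illustration. It assumes IDs_list.txt is not empty, because the NR==FNR test relies on the first file contributing at least one line.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The four example IDs from the question.
printf '%s\n' 24 43 56 69 > IDs_list.txt

# Shortened copy of the question's sample1.txt (same column layout).
cat > sample1.txt <<'EOF'
NODE_1,gi|573017271|gb|CP006568.1|,148512,4513140,8,7289,86.545,0.0,13461,24,madeup species 1
NODE_2,gi|84778498|dbj|AP008232.1|,122550,4171146,13,12690,93.693,0.0,23435,244,madeup species 2
NODE_3,gi|84778498|dbj|AP008232.1|,103385,4171146,6,4243,88.782,0.0,7836,43,madeup species 3
NODE_4,gi|84778498|dbj|AP008232.1|,101672,4171146,7,4139,86.799,0.0,7644,955,long name here
EOF

# Invented second file: the value 43 appears in column 6 of NODE_10,
# but its 10th column (7) is not a listed ID, so it must not be printed.
cat > sample2.txt <<'EOF'
NODE_9,gi|1|gb|X.1|,100,200,1,24,90.0,0.0,50,56,madeup species 4
NODE_10,gi|1|gb|X.1|,100,200,1,43,90.0,0.0,50,7,madeup species 5
EOF

# First file (NR==FNR): remember every ID as an array key.
# Remaining files: print lines whose 10th comma-separated field is a key.
matches=$(awk -F, 'NR==FNR {ids[$1]; next} $10 in ids' IDs_list.txt sample?.txt)
printf '%s\n' "$matches"
```

This works because NR keeps counting across all input files while FNR restarts at 1 for each file, so NR==FNR is only true while awk is reading the first file. Every later file is then filtered on column 10 alone, which avoids the false matches that grep -w could produce when an ID value happens to appear in another column.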