Extract specific rows with numbers over N

I have a dataframe like this:

1  3  MAPQ=0;CT=3to5;SRMAPQ=60
2  34  MAPQ=60;CT=3to5;SRMAPQ=67
4  56  MAPQ=67;CT=3to5;SRMAPQ=50
5  7  MAPQ=44;CT=3to5;SRMAPQ=61

Using awk (or another tool), I want to extract only the rows where SRMAPQ is over 60.

So the expected output is:

2  34  MAPQ=60;CT=3to5;SRMAPQ=67
5  7  MAPQ=44;CT=3to5;SRMAPQ=61

Update: the SRMAPQ=N field can be anywhere in the line, for example:

MAPQ=44;CT=3to5;SRMAPQ=61;DT=3to5

>Solution :

You don't have to extract the SRMAPQ value in a separate step before comparing it. If the format is fixed as shown above, with SRMAPQ always last on the line, use = as the field separator and read the last field with $NF:

awk -F= '$NF > 60' file
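
To see the one-liner at work, here is a minimal sketch that wraps it in a shell function and feeds it the sample rows from the question. The function name last_field_over and its threshold argument are assumptions made for illustration; they are not part of the original answer.

```shell
# Hypothetical helper: keep lines whose last "="-separated field exceeds $1.
# Assumes SRMAPQ=<number> is the final key=value pair on every line.
last_field_over() {
  # "+ 0" forces a numeric comparison on both sides.
  awk -F= -v min="$1" '$NF + 0 > min + 0'
}

# Sample rows from the question, piped in on stdin.
printf '%s\n' \
  '1  3  MAPQ=0;CT=3to5;SRMAPQ=60' \
  '2  34  MAPQ=60;CT=3to5;SRMAPQ=67' \
  '4  56  MAPQ=67;CT=3to5;SRMAPQ=50' \
  '5  7  MAPQ=44;CT=3to5;SRMAPQ=61' | last_field_over 60
```

This prints the rows starting with 2 and 5. It only works while SRMAPQ is the last field: on a line ending in ;DT=3to5, $NF is 3to5 rather than the SRMAPQ value, so the row is dropped.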

Or, if SRMAPQ can occur anywhere in the line (as in the update), use a more generic approach:

awk 'match($0, /SRMAPQ=[0-9]+/) { l = length("SRMAPQ="); v = substr($0, RSTART+l, RLENGTH-l) + 0; if (v > 60) print }' file
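
As a sketch of the match-based approach, the function below takes the threshold as an argument and does the extraction and the comparison inside the same action, so a line without SRMAPQ is never printed. The name srmapq_over is hypothetical, and the second and third input rows are made up for illustration; only the first follows the example in the update.

```shell
# Hypothetical helper: print lines where SRMAPQ=<digits> exceeds the threshold in $1.
srmapq_over() {
  awk -v min="$1" '
    match($0, /SRMAPQ=[0-9]+/) {
      # RSTART and RLENGTH describe the whole match, so skip the
      # 7-character "SRMAPQ=" prefix to get the digits.
      # Adding 0 converts the substring to a number before comparing.
      value = substr($0, RSTART + 7, RLENGTH - 7) + 0
      if (value > min + 0) print
    }
  '
}

# Illustrative input: SRMAPQ in the middle, missing, and at the start.
printf '%s\n' \
  '5  7  MAPQ=44;CT=3to5;SRMAPQ=61;DT=3to5' \
  '6  9  MAPQ=70;CT=3to5' \
  '7  2  SRMAPQ=12;MAPQ=99;CT=3to5' | srmapq_over 60
```

Only the first row is printed. The numeric conversion matters because substr returns a string, and awk compares a string against a number as text, which would rank 100 below 60.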