Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Counting Pattern Per Column From Line X to Y Using AWK

I have a file that looks like this:

Sample_ID Population CP026243.1_309     CP026243.1_318  CP026243.1_427  CP026243.1_449  CP026243.1_515
Turbot-BalticSea_01     BalticSea       0       0       0
Turbot-BalticSea_02     BalticSea       -1      -1      -1
Turbot-BalticSea_03     BalticSea       -1      -1      0
Turbot-BalticSea_04     BalticSea       0       0       -1
Turbot-BalticSea_05     BalticSea       -1      -1      0
Turbot-BalticSea_06     BalticSea       -1      0       0
Turbot-BalticSea_07     BalticSea       -1      -1      -1
Turbot-BalticSea_08     BalticSea       0       0       0
Turbot-BalticSea_09     BalticSea       0       0       1
Turbot-BalticSea_10     BalticSea       0       0       0
Turbot-BalticSea_11     BalticSea       -1      -1      1
Turbot-BalticSea_12     BalticSea       0       0       0
Turbot-BalticSea_13     BalticSea       0       0       0
Turbot-BalticSea_14     BalticSea       0       1       -1
Turbot-BalticSea_15     BalticSea       0       0       0
Turbot-BalticSea_16     BalticSea       0       0       0
Turbot-BalticSea_17     BalticSea       0       0       -1
Turbot-Kattegat_01      Kattegat        0       -1      0
Turbot-Kattegat_02      Kattegat        -1      -1      0
Turbot-Kattegat_03      Kattegat        -1      -1      -1
Turbot-Kattegat_04      Kattegat        0       0       0
Turbot-Kattegat_05      Kattegat        -1      -1      -1
Turbot-Kattegat_06      Kattegat        -1      -1      -1

I would like to know how I could count the number of "-1" in each column (starting from the 3rd column and from the second row) per population (BalticSea / Kattegat).

I guess AWK would be good for doing this?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

cat TEMP.tsv | awk 'NR>1{...}'

Thanks a lot in advance, George.

>Solution :

With awk you could do something like this:

tail -n +2 TEMP.tsv | awk '$3=="-1" { count[$2]++ } END {for (word in count) print word, count[word]}'

which counts the occurrences of -1 in the 3rd column grouped by 2nd column value.

Note that tail command removes the header.

$3 in the awk command is checking for third column. Use $4, $5 for the fourth, fifth column and so on..

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading