I have a file that looks like this:
Sample_ID Population CP026243.1_309 CP026243.1_318 CP026243.1_427 CP026243.1_449 CP026243.1_515
Turbot-BalticSea_01 BalticSea 0 0 0
Turbot-BalticSea_02 BalticSea -1 -1 -1
Turbot-BalticSea_03 BalticSea -1 -1 0
Turbot-BalticSea_04 BalticSea 0 0 -1
Turbot-BalticSea_05 BalticSea -1 -1 0
Turbot-BalticSea_06 BalticSea -1 0 0
Turbot-BalticSea_07 BalticSea -1 -1 -1
Turbot-BalticSea_08 BalticSea 0 0 0
Turbot-BalticSea_09 BalticSea 0 0 1
Turbot-BalticSea_10 BalticSea 0 0 0
Turbot-BalticSea_11 BalticSea -1 -1 1
Turbot-BalticSea_12 BalticSea 0 0 0
Turbot-BalticSea_13 BalticSea 0 0 0
Turbot-BalticSea_14 BalticSea 0 1 -1
Turbot-BalticSea_15 BalticSea 0 0 0
Turbot-BalticSea_16 BalticSea 0 0 0
Turbot-BalticSea_17 BalticSea 0 0 -1
Turbot-Kattegat_01 Kattegat 0 -1 0
Turbot-Kattegat_02 Kattegat -1 -1 0
Turbot-Kattegat_03 Kattegat -1 -1 -1
Turbot-Kattegat_04 Kattegat 0 0 0
Turbot-Kattegat_05 Kattegat -1 -1 -1
Turbot-Kattegat_06 Kattegat -1 -1 -1
I would like to know how I could count the number of "-1" values in each column (starting from the 3rd column and the 2nd row) per population (BalticSea / Kattegat).
I guess AWK would be good for doing this?
cat TEMP.tsv | awk 'NR>1{...}'
Thanks a lot in advance, George.
>Solution :
With awk you could do something like this:
tail -n +2 TEMP.tsv | awk '$3=="-1" { count[$2]++ } END {for (word in count) print word, count[word]}'
which counts the occurrences of -1 in the 3rd column, grouped by the value of the 2nd column (the population).
Note that the tail command strips the header line, so awk only sees the data rows.
$3 in the awk command refers to the third column; use $4, $5, etc. to run the same count on the fourth, fifth, and subsequent columns.
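If you would rather not rerun the command once per column, a single awk pass can count all SNP columns at the same time. This is only a sketch: it assumes the file is whitespace-separated and that the header names line up one-to-one with the data columns (in your pasted sample the header shows more SNP names than there are data columns, so extra names would simply report a count of 0).

```shell
# One-pass count of -1 per SNP column, per population (sketch).
awk '
NR == 1 { for (i = 3; i <= NF; i++) name[i] = $i; ncol = NF; next }  # remember header names
{
    pops[$2] = 1                                  # track populations seen
    for (i = 3; i <= NF; i++)
        if ($i == "-1") count[$2, i]++            # count -1 per (population, column)
}
END {
    for (p in pops)
        for (i = 3; i <= ncol; i++)
            print p, name[i], count[p, i] + 0     # "+ 0" prints 0 for combos never seen
}' TEMP.tsv
```

The output order of the END loop over populations is unspecified in awk, so pipe the result through sort if you want a stable listing.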