I have a file that looks like this:
Sample_ID Population CP026243.1_309 CP026243.1_318 CP026243.1_427 CP026243.1_449 CP026243.1_515
Turbot-BalticSea_01 BalticSea 0 0 0
Turbot-BalticSea_02 BalticSea -1 -1 -1
Turbot-BalticSea_03 BalticSea -1 -1 0
Turbot-BalticSea_04 BalticSea 0 0 -1
Turbot-BalticSea_05 BalticSea -1 -1 0
Turbot-BalticSea_06 BalticSea -1 0 0
Turbot-BalticSea_07 BalticSea -1 -1 -1
Turbot-BalticSea_08 BalticSea 0 0 0
Turbot-BalticSea_09 BalticSea 0 0 1
Turbot-BalticSea_10 BalticSea 0 0 0
Turbot-BalticSea_11 BalticSea -1 -1 1
Turbot-BalticSea_12 BalticSea 0 0 0
Turbot-BalticSea_13 BalticSea 0 0 0
Turbot-BalticSea_14 BalticSea 0 1 -1
Turbot-BalticSea_15 BalticSea 0 0 0
Turbot-BalticSea_16 BalticSea 0 0 0
Turbot-BalticSea_17 BalticSea 0 0 -1
Turbot-Kattegat_01 Kattegat 0 -1 0
Turbot-Kattegat_02 Kattegat -1 -1 0
Turbot-Kattegat_03 Kattegat -1 -1 -1
Turbot-Kattegat_04 Kattegat 0 0 0
Turbot-Kattegat_05 Kattegat -1 -1 -1
Turbot-Kattegat_06 Kattegat -1 -1 -1
I would like to know how I could count the number of "-1" values in each column (starting from the 3rd column and the 2nd row) per population (BalticSea / Kattegat).
I guess AWK would be good for doing this?
cat TEMP.tsv | awk 'NR>1{...}'
Thanks a lot in advance, George.
>Solution :
With awk you could do something like this:
tail -n +2 TEMP.tsv | awk '$3=="-1" { count[$2]++ } END {for (word in count) print word, count[word]}'
which counts the occurrences of -1 in the 3rd column, grouped by the value of the 2nd column (the population).
Note that the tail command strips the header line, so awk only sees the data rows.
$3 in the awk command refers to the third column; use $4, $5, etc. to run the same count on the fourth, fifth, and subsequent columns.
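If you would rather not rerun the command once per column, a single awk pass can count all SNP columns at the same time. This is only a sketch: it assumes the file is whitespace-separated and that the header names line up one-to-one with the data columns (in your pasted sample the header shows more SNP names than there are data columns, so extra names would simply report a count of 0).

```shell
# One-pass count of -1 per SNP column, per population (sketch).
awk '
NR == 1 { for (i = 3; i <= NF; i++) name[i] = $i; ncol = NF; next }  # remember header names
{
    pops[$2] = 1                                  # track populations seen
    for (i = 3; i <= NF; i++)
        if ($i == "-1") count[$2, i]++            # count -1 per (population, column)
}
END {
    for (p in pops)
        for (i = 3; i <= ncol; i++)
            print p, name[i], count[p, i] + 0     # "+ 0" prints 0 for combos never seen
}' TEMP.tsv
```

The output order of the END loop over populations is unspecified in awk, so pipe the result through sort if you want a stable listing.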