Print rows which do not match a pattern more than once, in a column of concatenated strings

August 24, 2022

I would like to create a new file containing only the rows whose second column fails to match a specific pattern more than once. That column is made up of different strings joined by ";", so the idea is to keep the rows where more than one of those strings differs from PASS.

ID1 PASS;mq;bq
ID2 bq
ID3 PASS
ID4 mq;hj;cigar
ID5 mq;PASS;PASS;PASS
ID6 bq;hj;PASS;PASS

I was trying something like this:

awk '! /PASS/ {print $1,$2}' myfile.tsv

However, I also want to print rows that contain PASS if they contain at least two other elements that differ from it (in my real file, some rows have more than 15 strings in column 2). In addition, I am not sure how to express the "more than once" condition. My expected output is:

ID1 PASS;mq;bq
ID4 mq;hj;cigar
ID6 bq;hj;PASS;PASS

Do you know how I can achieve this?

Solution:

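With your shown samples, please try the following awk code. It splits the second column on ";" and prints a row when at least two of its entries differ from PASS.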

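awk '
{
  count=0                        # entries in column 2 that are not PASS
  num=split($2,arr,";")          # split column 2 on ";" into arr[1..num]
  for(i=1;i<=num;i++){
    if(arr[i]!="PASS"){ count++ }
  }
  # Rows without any PASS are covered too: every entry counts as non-PASS,
  # so no separate !/PASS/ rule is needed.
  if(count>=2){ print }
}
'  Input_file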
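
The counting approach in the answer above can also be wrapped in a small shell function so that the threshold is adjustable. This is only a sketch: the function name filter_non_pass, the min variable, and the file name sample.tsv are made up for illustration, and it assumes the columns are separated by whitespace as in the sample shown in the question.

```shell
# Hypothetical helper: print rows whose second column holds at least
# "min" entries other than PASS (min defaults to 2).
filter_non_pass() {
  awk -v min="${2:-2}" '
  {
    n = split($2, parts, ";")      # break column 2 into its ";"-joined entries
    count = 0
    for (i = 1; i <= n; i++)
      if (parts[i] != "PASS") count++
  }
  count >= min                     # a pattern with no action prints the row
  ' "$1"
}

# Sample data copied from the question
printf '%s\n' 'ID1 PASS;mq;bq' 'ID2 bq' 'ID3 PASS' 'ID4 mq;hj;cigar' \
  'ID5 mq;PASS;PASS;PASS' 'ID6 bq;hj;PASS;PASS' > sample.tsv

filter_non_pass sample.tsv
```

Comparing each entry with != "PASS" is an exact string comparison, so an entry that merely contains PASS as a substring still counts as a non-PASS entry. Redirect the output to get the new file, for example filter_non_pass myfile.tsv > filtered.tsv.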