Print rows which do not match a pattern more than once, in a column of concatenated strings

I would like to create a new file containing only the rows whose second column has more than one entry that does not match a specific pattern. That column consists of several strings joined by ;, so the idea is to keep the rows where more than one of those strings differs from PASS.

ID1 PASS;mq;bq
ID2 bq
ID3 PASS
ID4 mq;hj;cigar
ID5 mq;PASS;PASS;PASS
ID6 bq;hj;PASS;PASS

I was trying something like this:

awk '! /PASS/ {print $1,$2}' myfile.tsv

However, I also want to print rows that contain PASS if they contain at least two other elements that differ from this pattern (in my real file, some rows have more than 15 strings in column 2). In addition, I am not sure how to express the "more than once" condition. My expected output is this:

ID1 PASS;mq;bq
ID4 mq;hj;cigar
ID6 bq;hj;PASS;PASS

Do you know how I can achieve this?

Solution:

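With the samples you have shown, please try the following awk code. It splits the second column on ;, counts the entries that are not PASS, and prints the row when that count is at least 2.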

awk '
{
  count = 0
  num = split($2, arr, ";")             # split column 2 into its ";"-separated entries
  for (i = 1; i <= num; i++) {
    if (arr[i] != "PASS") { count++ }   # count the entries that are not PASS
  }
  if (count >= 2) { print }             # keep rows with at least two non-PASS entries
}
' Input_file
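To check the counting approach against the sample rows from the question, here is a minimal self-contained sketch. The `sample` and `result` variable names and the temporary file are only for illustration and are not part of the original question. It uses a bare `count >= 2` pattern, which prints the row by default.

```shell
# Illustration only: "sample" and "result" are hypothetical names.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID1 PASS;mq;bq
ID2 bq
ID3 PASS
ID4 mq;hj;cigar
ID5 mq;PASS;PASS;PASS
ID6 bq;hj;PASS;PASS
EOF

# Keep rows whose second column holds at least two entries other than PASS.
result=$(awk '
{
  count = 0
  num = split($2, arr, ";")          # entries of column 2
  for (i = 1; i <= num; i++)
    if (arr[i] != "PASS") count++    # count the non-PASS entries
}
count >= 2                           # default action: print the row
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On the sample data this should print the ID1, ID4 and ID6 rows, matching the expected output in the question. Because awk's default field splitting accepts both spaces and tabs, the same command should also work on a tab-separated file, and the matching rows are printed unchanged.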