I have multiple files containing this information:
sP12345.txt
COMMENT Method: conceptual translation.
FEATURES Location/Qualifiers
source 1..3024
/organism="H"
/isolate="sP12345"
/isolation_source="blood"
/host="Homo sapiens"
/db_xref="taxon:11103"
/collection_date="31-Mar-2014"
/note="genotype: 3"
sP4567.txt
COMMENT Method: conceptual translation.
FEATURES Location/Qualifiers
source 1..3024
/organism="H"
/isolate="sP4567"
/isolation_source="blood"
/host="Homo sapiens"
/db_xref="taxon:11103"
/collection_date="31-Mar-2014"
/note="genotype: 2"
From each file I would like to take the /note="genotype: 3" line, extract only the number that follows genotype:, and write it to a new text file, with the name of the file it was taken from (without the extension) as column 2.
Expected Output:
3 sP12345
2 sP4567
I tried this code, but it prints only the first column and not the filename:
awk -F'note="genotype: ' -v OFS='\t' 'FNR==1{++c} NF>1{print $2, c}' *.txt > output_file.txt
>Solution :
You may use:
awk '/\/note="genotype: /{gsub(/^.* |"$/, ""); f=FILENAME; sub(/\.[^.]+$/, "", f); print $0 "\t" f}' sP*.txt
3 sP12345
2 sP4567
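For readers who want to see each step, here is a commented sketch of the same idea that also writes the result to output_file.txt, as the question asks. The two sample files are hypothetical, cut-down copies of the ones in the question, created in a temporary directory only so the sketch can be run as is. The variable names g and f are arbitrary.

```shell
# Hypothetical sample data: shortened copies of the files from the question,
# created in a scratch directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 'source 1..3024' '/isolate="sP12345"' '/note="genotype: 3"' > sP12345.txt
printf '%s\n' 'source 1..3024' '/isolate="sP4567"' '/note="genotype: 2"' > sP4567.txt

awk -v OFS='\t' '
  /\/note="genotype: / {             # only lines holding the genotype note
    g = $0
    sub(/^.*genotype: */, "", g)     # drop everything up to and including "genotype: "
    sub(/".*$/, "", g)               # drop the closing quote and anything after it
    f = FILENAME                     # awk sets FILENAME to the file being read
    sub(/\.[^.]+$/, "", f)           # strip the last extension (.txt)
    print g, f                       # genotype, then file name, tab-separated
  }
' sP*.txt > output_file.txt

cat output_file.txt
```

The original attempt printed the counter c rather than the built-in FILENAME variable, which is why no filename appeared. With the two sample files above, output_file.txt should contain the lines "3 sP12345" and "2 sP4567", separated by a tab.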