I am trying to get the number of reads in my fastq files, and I want the output to also include the file names. I've found a solution online that almost works, but I am still not getting the right output. Example:
My file names:
12S_C-T1-045_F_filt.fastq.gz
12S_C-T1-PL_F_filt.fastq.gz
...
The code I have found:
for file in ./*.fastq.gz
do
file_name=$(basename -s .fastq $file)
printf "$file_name\t$(cat ${file} | wc -l)/4|bc\n" >> no_reads_12S.txt
done
The output:
12S_C-T1-045_F_filt.fastq.gz 114/4|bc
12S_C-T1-PL_F_filt.fastq.gz 26455/4|bc
...
So, clearly it is not doing the calculation, and the numbers are not even correct. How should I fix this? I've also tried this:
for file in ./*.fastq.gz
do
file_name=$(basename -s .fastq.gz $file)
echo "$file_name"
echo $(zcat $file | wc -l)/4|bc
done
Which works, but then it gives me the filenames and read numbers in separate rows.
Thanks!
>Solution :
Based on the 2nd script, would you please try:
#!/bin/bash
for file in ./*.fastq.gz; do
file_name=$(basename -s .fastq.gz "$file")
printf "%s\t%d\n" "$file_name" "$(echo $(zcat "$file" | wc -l) / 4 | bc)"
done
Or as a one-liner:
for file in ./*.fastq.gz; do file_name=$(basename -s .fastq.gz "$file"); printf "%s\t%d\n" "$file_name" "$(echo $(zcat "$file" | wc -l) / 4 | bc)"; done
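BTW, the 1st script fails for three reasons: basename -s .fastq does not strip the .fastq.gz suffix, so the .gz extension stays in the name; cat reads the compressed bytes instead of the decompressed text, so wc -l returns a meaningless line count; and /4|bc sits inside the printf string, so it is printed literally instead of being executed.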
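If you prefer to avoid bc, here is a minimal sketch of the same idea using shell arithmetic. The function name count_reads is hypothetical, gzip -dc is swapped in for zcat (it should behave the same on gzipped input), and the count assumes every record spans exactly 4 lines, i.e. no wrapped sequence lines:

```shell
#!/bin/bash
# count_reads FILE: print "<name><TAB><reads>" for one gzipped FASTQ file.
# Hypothetical helper; assumes 4 lines per FASTQ record.
count_reads() {
    local file=$1
    local name lines
    # Strip the directory and the .fastq.gz suffix from the path.
    name=$(basename "$file" .fastq.gz)
    # Count lines of the decompressed text, not of the compressed bytes.
    lines=$(gzip -dc "$file" | wc -l)
    # Integer division by 4 via shell arithmetic instead of bc.
    printf '%s\t%d\n' "$name" "$(( lines / 4 ))"
}

for file in ./*.fastq.gz; do
    [ -e "$file" ] || continue   # skip the literal pattern if nothing matched
    count_reads "$file"
done
```

To collect the result in a file as in your 1st script, put the redirection after done, e.g. done > no_reads_12S.txt, so the file is opened once instead of on every iteration.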