Division and logarithmic calculation in AWK

I have a three-column file like the one below. I want to divide column 3 by column 2 (skipping the header) and print the result in column 4. I also want to calculate the log2 value of column 4 and print it in column 5, as shown below.

head my_file.txt
this    is header   
chrX:73829232:-::chrX:73831065:-    76.5382 76.34220209
chrX:73827985:-::chrX:73829067:-    60.0702 62.1887549
chr11:18266979:+::chr11:18269194:+  15.4004 1558.282058

I tried this with awk, but it produces fewer lines than expected and repeats each line:

awk -v OFS='\t' 'FNR > 1 {$4 = $3 / $2}1' my_file.txt |awk -F"\t" 'FNR > 1{a = log($4)/log(2); print $0"\t" a} OFS="\t"'
awk: cmd. line:1: (FILENAME=my_file.txt FNR=15) fatal: division by zero attempted
this is header
chrX:73829232:-::chrX:73831065:-    76.5382 76.3422020852288    0.997439    -0.00369948
chrX:73829232:-::chrX:73831065:-    76.5382 76.3422020852288    0.997439
chrX:73827985:-::chrX:73829067:-    60.0702 62.1887548960591    1.03527 0.0500071
chrX:73827985:-::chrX:73829067:-    60.0702 62.1887548960591    1.03527

This is my desired output.

this is my desired header
chrX:73829232:-::chrX:73831065:-    76.5382 76.34220209 0.9974392145    -0.003699170995
chrX:73827985:-::chrX:73829067:-    60.0702 62.1887549  1.035267985 0.05000426549
chr11:18266979:+::chr11:18269194:+  15.4004 1558.282058 101.1845185 6.66084476

Solution:

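Your pipeline stops early because the first awk aborts with a fatal error on a row where column 2 is zero. The lines repeat because the trailing OFS="\t" in the second awk is an assignment used as a pattern; it always evaluates as true, so every line is printed a second time. You can do everything in a single command instead: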

awk -v OFS='\t' 'FNR==1 {print $0, "col4", "col5"; next} {if ($2 != 0) {$4 = $3 / $2; $5 = log($4) / log(2)} else {$4 = "NaN"; $5 = "NaN"}} 1' my_file.txt

In short, the command:

  • Detects the first row (the header), appends "col4" and "col5" as column names, and skips to the next line.
  • For every other row, computes column 3 / column 2 and its log2 when column 2 is nonzero; otherwise it writes "NaN" in columns 4 and 5. This avoids the division-by-zero error and clearly marks rows where the result is not a valid number.
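
A slightly more defensive variant, offered as a sketch rather than a drop-in replacement. It assumes whitespace-separated input whose first column contains no spaces (as in the sample). The function name add_ratio_log2 and the header labels "ratio" and "log2_ratio" are made up for illustration. Compared with the one-liner above, it formats numbers with sprintf("%.10g", ...), because awk's default number-to-string conversion (%.6g) yields values like 0.997439. It also writes "NaN" when the ratio is zero or negative, where log2 is undefined.

```shell
# Hypothetical helper: appends ratio (col3/col2) and log2(ratio) to each data row.
# Usage: add_ratio_log2 my_file.txt > result.txt
add_ratio_log2() {
  awk -v OFS='\t' '
    # Header row: keep it as-is and append two (assumed) column names.
    FNR == 1 { print $0, "ratio", "log2_ratio"; next }
    {
      if ($2 + 0 == 0) {
        # Zero or non-numeric denominator: avoid the fatal division by zero.
        ratio = l2 = "NaN"
      } else {
        r = $3 / $2
        ratio = sprintf("%.10g", r)
        # log2 is only defined for positive values.
        l2 = (r > 0) ? sprintf("%.10g", log(r) / log(2)) : "NaN"
      }
      print $1, $2, $3, ratio, l2
    }
  ' "$1"
}
```

Printing $1, $2, $3 explicitly, instead of assigning to $4 and $5, keeps the original text of the input columns untouched and makes the output separator a tab regardless of how the input was spaced.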
