Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I direct awk to leave floating point numbers unchanged, but append a .0 to integer numbers?

I have a csv file that, apart from the first column, consists of float and integer numbers (no scientific notation). I would like to output another csv file with the same content, but where all numbers are guaranteed to have a decimal point and at least one digit after it. E.g.:

  • 0 becomes 0.0
  • 123 becomes 123.0
  • 3.3 remains the same
  • 333.3333 remains the same

I am using awk like

LC_NUMERIC=C
awk 'NR>=1 { FS=OFS=","; for (i=2; i<=NF; ++i) $i=sprintf("%f", $i) } 1' "input.csv" > output.csv

But the result, with the "%f" format specifier, is that 123 remains 123 (doesn’t become 123.0).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

With "%.1f" I get exactly one digit after the decimal point, and 333.3333 becomes 333.3, instead I want it unchanged.

Is there a format specifier I could use? Or some other way to achieve the same thing?

[EDIT with sample]

test.sh

#!/bin/bash

LC_NUMERIC=C
awk 'NR>=1 { FS=OFS=","; for (i=2; i<=NF; ++i) $i=sprintf("%f", $i) } 1' "input.csv" >output.csv

input.csv

gene_name,VarXCRep.1,VarXCRep.2,VarXCRep.3,VarX1Rep.1,VarX1Rep.2,VarX1Rep.3
Soltu.DM.01G000010,360.7000522,361.5910696,333.0843759,395.2279977,386.8290979,377.756613
Soltu.DM.01G000070,2,540,0,0,1.83331326,0

output.csv

gene_name,VarXCRep.1,VarXCRep.2,VarXCRep.3,VarX1Rep.1,VarX1Rep.2,VarX1Rep.3
Soltu.DM.01G000010,360.700052,361.591070,333.084376,395.227998,386.829098,377.756613
Soltu.DM.01G000070,2.000000,540.000000,0.000000,0.000000,1.833313,0.000000

desired output

gene_name,VarXCRep.1,VarXCRep.2,VarXCRep.3,VarX1Rep.1,VarX1Rep.2,VarX1Rep.3
Soltu.DM.01G000010,360.7000522,361.5910696,333.0843759,395.2279977,386.8290979,377.756613
Soltu.DM.01G000070,2.0,540.0,0.0,0.0,1.83331326,0.0

>Solution :

Using any awk:

$ awk 'BEGIN{ FS=OFS="," } NR>1{ for (i=2; i<=NF; ++i) if ($i ~ /^-?[0-9]+$/) $i=$i".0" } 1' input.csv
gene_name,VarXCRep.1,VarXCRep.2,VarXCRep.3,VarX1Rep.1,VarX1Rep.2,VarX1Rep.3
Soltu.DM.01G000010,360.7000522,361.5910696,333.0843759,395.2279977,386.8290979,377.756613
Soltu.DM.01G000070,2.0,540.0,0.0,0.0,1.83331326,0.0

I moved the setting of FS back to the BEGIN section as it’s too late in the { ... } part as the first line has already been split on spaces by then. I also changed the NR>=1 test to NR>1 as it’s impossible to reach that part of the code with NR 0 or less and what you actually want to do is skip the first line, the header.

I’m using the comparison /^-?[0-9]+$/ so it ONLY changes integers, it won’t change non-numeric strings, IP addresses, floating point or exponential notation numbers or anything else.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading