I have a csv file that, apart from the first column, consists of float and integer numbers (no scientific notation). I would like to output another csv file with the same content, but where all numbers are guaranteed to have a decimal point and at least one digit after it. E.g.:
- 0 becomes 0.0
- 123 becomes 123.0
- 3.3 remains the same
- 333.3333 remains the same
I am using awk like
LC_NUMERIC=C
awk 'NR>=1 { FS=OFS=","; for (i=2; i<=NF; ++i) $i=sprintf("%f", $i) } 1' "input.csv" > output.csv
But the result, with the "%f" format specifier, is that 123 remains 123 (doesn’t become 123.0).
With "%.1f" I get exactly one digit after the decimal point, and 333.3333 becomes 333.3, instead I want it unchanged.
Is there a format specifier I could use? Or some other way to achieve the same thing?
[EDIT with sample]
test.sh
#!/bin/bash
LC_NUMERIC=C
awk 'NR>=1 { FS=OFS=","; for (i=2; i<=NF; ++i) $i=sprintf("%f", $i) } 1' "input.csv" >output.csv
input.csv
gene_name,VarXCRep.1,VarXCRep.2,VarXCRep.3,VarX1Rep.1,VarX1Rep.2,VarX1Rep.3
Soltu.DM.01G000010,360.7000522,361.5910696,333.0843759,395.2279977,386.8290979,377.756613
Soltu.DM.01G000070,2,540,0,0,1.83331326,0
output.csv
gene_name,VarXCRep.1,VarXCRep.2,VarXCRep.3,VarX1Rep.1,VarX1Rep.2,VarX1Rep.3
Soltu.DM.01G000010,360.700052,361.591070,333.084376,395.227998,386.829098,377.756613
Soltu.DM.01G000070,2.000000,540.000000,0.000000,0.000000,1.833313,0.000000
desired output
gene_name,VarXCRep.1,VarXCRep.2,VarXCRep.3,VarX1Rep.1,VarX1Rep.2,VarX1Rep.3
Soltu.DM.01G000010,360.7000522,361.5910696,333.0843759,395.2279977,386.8290979,377.756613
Soltu.DM.01G000070,2.0,540.0,0.0,0.0,1.83331326,0.0
>Solution :
Using any awk:
$ awk 'BEGIN{ FS=OFS="," } NR>1{ for (i=2; i<=NF; ++i) if ($i ~ /^-?[0-9]+$/) $i=$i".0" } 1' input.csv
gene_name,VarXCRep.1,VarXCRep.2,VarXCRep.3,VarX1Rep.1,VarX1Rep.2,VarX1Rep.3
Soltu.DM.01G000010,360.7000522,361.5910696,333.0843759,395.2279977,386.8290979,377.756613
Soltu.DM.01G000070,2.0,540.0,0.0,0.0,1.83331326,0.0
I moved the setting of FS back to the BEGIN section as it’s too late in the { ... } part as the first line has already been split on spaces by then. I also changed the NR>=1 test to NR>1 as it’s impossible to reach that part of the code with NR 0 or less and what you actually want to do is skip the first line, the header.
I’m using the comparison /^-?[0-9]+$/ so it ONLY changes integers, it won’t change non-numeric strings, IP addresses, floating point or exponential notation numbers or anything else.