Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

awk sum values of a column, if other columns remain constant

I have a file (example.csv) with around 5M rows, each row containing 4 columns (user, day, type, value), like this:

user1,2022-01-01,type1,0.1
user1,2022-01-01,type1,0.9
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0
user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user3,2022-01-01,type1,0.3
user3,2022-01-01,type1,0.2
user3,2022-01-01,type1,0.5

I would like to sum the values (4th column in this example) that correspond to the same user, day and type, so the expected output should look like this:

user1,2022-01-01,type1,1.0
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0
user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user3,2022-01-01,type1,1.0

I tried something like this to try if it works

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

awk -F"," '!seen[$1]++;&&!seen[$2]++;&&!seen[$3]++;sum+=$4{print sum}' example.csv

but I am still far from the correct solution.
Any suggestions?

>Solution :

$ awk '
BEGIN {
    FS=OFS=","
}
{
    a[$1 OFS $2 OFS $3]+=$4
}
END {
    for(i in a) 
        print i,sprintf("%.1f",a[i])
}' file

Output:

user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user1,2022-01-01,type1,1.0
user3,2022-01-01,type1,1.0
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading