awk sum values of a column, if other columns remain constant

February 14, 2022

I have a file (example.csv) with around 5M rows, each row containing 4 columns (user, day, type, value), like this:

user1,2022-01-01,type1,0.1
user1,2022-01-01,type1,0.9
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0
user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user3,2022-01-01,type1,0.3
user3,2022-01-01,type1,0.2
user3,2022-01-01,type1,0.5

I would like to sum the values (4th column in this example) that correspond to the same user, day and type, so the expected output should look like this:

user1,2022-01-01,type1,1.0
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0
user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user3,2022-01-01,type1,1.0

I tried something like this to try if it works

awk -F"," '!seen[$1]++;&&!seen[$2]++;&&!seen[$3]++;sum+=$4{print sum}' example.csv

but I am still far from the correct solution.
Any suggestions?

>Solution :

$ awk '
BEGIN {
    FS=OFS=","
}
{
    a[$1 OFS $2 OFS $3]+=$4
}
END {
    for(i in a) 
        print i,sprintf("%.1f",a[i])
}' file

Output:

user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user1,2022-01-01,type1,1.0
user3,2022-01-01,type1,1.0
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0