Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R Dataframe: Sum undefined num columns based on 2 other column matches

I have a dataframe that looks like this:

Name  Fruit    CostA  CostB
Adam  Orange   2       5
Adam  Apple    3       6
Bob   Orange   3       5
Cathy Orange   4       3
Cathy Orange   5       7

dataframe creation:

df=data.frame(Name=c("Adam","Adam","Bob","Cathy","Cathy"),Fruit=c("Orange","Apple","Orange","Orange","Orange"),CostA=c(2,3,3,4,5),CostB=c(7,8,9,3,4))

This dataframe will grow by unspecified number of columns with CostC, CostD, etc. I need to sum/aggregate all columns (CostA, CostB, etc.) when Name and Fruit values match.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have accomplished similar when the cost columns to be summed are known, by using this:

aggregate(cbind(CostA,CostB) ~ Name + Fruit, df, sum)

However, I now need to accomplish without identifying all columns to be summed.

>Solution :

If these are the only columns, use . to specify the rest of the columns in the formula

aggregate(.~ Name + Fruit, df, sum)

If there are other columns as well and wants to include only the cost columns in addition to ‘Name’, ‘Fruit’, subset with select option

aggregate(.~ Name + Fruit, 
    subset(df, selct = c(Name, Fruit, startsWith(names(df), "Cost"))), sum)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading