Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculating row sums in data frame based on column names

I have a data frame with media spending for different media channels:

TV <- c(200,500,700,1000)
Display <- c(30,33,47,55)
Social <- c(20,21,22,23)
Facebook <- c(30,31,32,33)
Print <- c(50,51,52,53)
Newspaper <- c(60,61,62,63)

df_media <- data.frame(TV,Display,Social,Facebook, Print, Newspaper)

My goal is to calculate the row sums of specific columns based on their name.
For example: Per definition Facebook falls into the category of Social, so I want to add the Facebook column to the Social column and just have the Social column left. The same goes for Newspaper which should be added to Print and so on.

The challenge is that the names and the number of columns that belong to one category change from data set to data set, e.g. the next data set could contain Social, Facebook and Instagram which should be all summed up to Social.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

There is a list of rules, which define which media types (column names) belong to each other, but I have to admit that I’m a bit clueless and can only think about a long set of if commands right now, but I hope there is a better solution.

I’m thinking about putting all the names that belong to each other in vectors and use them to find and summarize the relevant columns, but I have no idea, how to execute this.
Any help is appreciated.

>Solution :

You could something along those lines, which allows columns to not be part of every data set (with intersect and setdiff).

  1. Define a set of rules, i.e. those columns that are going to be united/grouped together.

  2. Create a vector d of the remaining columns

  3. Compute the rowSums of every subset of the data set defined in the rules

  4. append the remaining columns

  5. cbind the columns of the list using do.call.

#Rules
rules = list(social = c("Social", "Facebook", "Instagram"),
             printed = c("Print", "Newspaper"))
d <- setdiff(colnames(df_media), unlist(rules)) #columns that are not going to be united

#data frame 
lapply(rules, function(x) rowSums(df_media[, intersect(colnames(df_media), x)])) |>
  append(df_media[, d]) |>
  do.call(cbind.data.frame, args = _)
  social printed   TV Display
1     50     110  200      30
2     52     112  500      33
3     54     114  700      47
4     56     116 1000      55
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading