Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R mutate() with rowSums()

I want to take a dataframe of participant IDs and the languages they speak, then create a new column which sums all of the languages spoken by each participant. The columns are the ID, each language with 0 = "does not speak" and 1 = "does speak", including a column for "Other", then a separate column which specifies what this other language is, "Other.Lang". I want to subset just the columns which have binary values and create this new column with the sums for each participant.

First here is my dataframe.


      Participant.Private.ID French Spanish Dutch Czech Russian Hebrew Chinese German Italian Japanese Korean Portuguese Other Other.Lang
    1                5133249      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    2                5136082      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    3                5140442      0       1     0     0       0      0       0      0       0        0      0          0     0          0
    4                5141991      0       1     0     0       0      0       0      0       1        0      0          0     0          0
    5                5143476      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    6                5145250      0       0     0     0       0      0       0      0       0        0      0          0     1      Malay
    7                5146081      0       0     0     0       0      0       0      0       0        0      0          0     0          0

Here is the structure:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel


    str(part_langs)
    
    grouped_df [7 x 15] (S3: grouped_df/tbl_df/tbl/data.frame)
     $ Participant.Private.ID: num [1:7] 5133249 5136082 5140442 5141991 5143476 ...
     $ French                : num [1:7] 0 0 0 0 0 0 0
     $ Spanish               : num [1:7] 0 0 1 1 0 0 0
     $ Dutch                 : num [1:7] 0 0 0 0 0 0 0
     $ Czech                 : num [1:7] 0 0 0 0 0 0 0
     $ Russian               : num [1:7] 0 0 0 0 0 0 0
     $ Hebrew                : num [1:7] 0 0 0 0 0 0 0
     $ Chinese               : num [1:7] 0 0 0 0 0 0 0
     $ German                : num [1:7] 0 0 0 0 0 0 0
     $ Italian               : num [1:7] 0 0 0 1 0 0 0
     $ Japanese              : num [1:7] 0 0 0 0 0 0 0
     $ Korean                : num [1:7] 0 0 0 0 0 0 0
     $ Portuguese            : num [1:7] 0 0 0 0 0 0 0
     $ Other                 : num [1:7] 0 0 0 0 0 1 0
     $ Other.Lang            : chr [1:7] "0" "0" "0" "0" ...
     - attr(*, "groups")= tibble [7 x 2] (S3: tbl_df/tbl/data.frame)
      ..$ Participant.Private.ID: num [1:7] 5133249 5136082 5140442 5141991 5143476 ...

I thought that this should work:


    num <- part_langs %>%
      mutate(num.langs = rowSums(part_langs[2:14]))
    num

However, I keep getting this error message:


    Error: Problem with `mutate()` input `num.langs`.
    x Input `num.langs` can't be recycled to size 1.
    i Input `num.langs` is `rowSums(part_langs[2:14])`.
    i Input `num.langs` must be size 1, not 7.
    i The error occurred in group 1: Participant.Private.ID = 5133249.

What is really strange is that when I try to create a simplified version of this problem to create a reproducible example, it works fine.

First I create a dataset.


    test <- matrix(c(1, 1, 1, 0, 0, "",
                   2, 1, 0, 1, 0, "",
                   3, 0, 0, 0, 1, "Chinese"), ncol = 6, byrow=TRUE)
    
    test<-as.data.frame(test)
    
    colnames(test) <- c("ID", "English", "French", "Italian", "Other", "Other.Lang")
    
    str(test)

Converting binary columns to numeric:


    test$ID <- as.numeric(test$ID)
    test$English <- as.numeric(test$English)
    test$French <- as.numeric(test$French)
    test$Italian <- as.numeric(test$Italian)
    test$Other <- as.numeric(test$Other)

Here’s the same code as above, but with this simplified data set.


    num <- test %>%
      mutate(num.langs = rowSums(test[2:5]))
    num

Here is the output. It works exactly as I want:


    "ID","English","French","Italian","Other","Other.Lang","num.langs"
     1,     1,        1,       0,        0,        "",         2
     2,     1,        0,       1,        0,        "",         2
     3,     0,        0,       0,        1,     "Chinese",     1

So I know I have screwed up somewhere in my real data, but I can’t understand where. Could anyone advise?

>Solution :

The difference in result might be due to the fact that part_langs is a grouped dataframe, as can be seen from the output of strshown in your post:

grouped_df [7 x 15] (S3: grouped_df/tbl_df/tbl/data.frame). 

If this is the reason, then ungroup first and rerun your code:

library(dplyr)
part_langs <- part_langs %>% ungroup
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading