R mutate() with rowSums()

November 25, 2021

I want to take a dataframe of participant IDs and the languages they speak, then create a new column which sums all of the languages spoken by each participant. The columns are the ID, each language with 0 = "does not speak" and 1 = "does speak", including a column for "Other", then a separate column which specifies what this other language is, "Other.Lang". I want to subset just the columns which have binary values and create this new column with the sums for each participant.

First here is my dataframe.


      Participant.Private.ID French Spanish Dutch Czech Russian Hebrew Chinese German Italian Japanese Korean Portuguese Other Other.Lang
    1                5133249      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    2                5136082      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    3                5140442      0       1     0     0       0      0       0      0       0        0      0          0     0          0
    4                5141991      0       1     0     0       0      0       0      0       1        0      0          0     0          0
    5                5143476      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    6                5145250      0       0     0     0       0      0       0      0       0        0      0          0     1      Malay
    7                5146081      0       0     0     0       0      0       0      0       0        0      0          0     0          0

Here is the structure:


    str(part_langs)
    
    grouped_df [7 x 15] (S3: grouped_df/tbl_df/tbl/data.frame)
     $ Participant.Private.ID: num [1:7] 5133249 5136082 5140442 5141991 5143476 ...
     $ French                : num [1:7] 0 0 0 0 0 0 0
     $ Spanish               : num [1:7] 0 0 1 1 0 0 0
     $ Dutch                 : num [1:7] 0 0 0 0 0 0 0
     $ Czech                 : num [1:7] 0 0 0 0 0 0 0
     $ Russian               : num [1:7] 0 0 0 0 0 0 0
     $ Hebrew                : num [1:7] 0 0 0 0 0 0 0
     $ Chinese               : num [1:7] 0 0 0 0 0 0 0
     $ German                : num [1:7] 0 0 0 0 0 0 0
     $ Italian               : num [1:7] 0 0 0 1 0 0 0
     $ Japanese              : num [1:7] 0 0 0 0 0 0 0
     $ Korean                : num [1:7] 0 0 0 0 0 0 0
     $ Portuguese            : num [1:7] 0 0 0 0 0 0 0
     $ Other                 : num [1:7] 0 0 0 0 0 1 0
     $ Other.Lang            : chr [1:7] "0" "0" "0" "0" ...
     - attr(*, "groups")= tibble [7 x 2] (S3: tbl_df/tbl/data.frame)
      ..$ Participant.Private.ID: num [1:7] 5133249 5136082 5140442 5141991 5143476 ...

I thought that this should work:


    num <- part_langs %>%
      mutate(num.langs = rowSums(part_langs[2:14]))
    num

However, I keep getting this error message:


    Error: Problem with `mutate()` input `num.langs`.
    x Input `num.langs` can't be recycled to size 1.
    i Input `num.langs` is `rowSums(part_langs[2:14])`.
    i Input `num.langs` must be size 1, not 7.
    i The error occurred in group 1: Participant.Private.ID = 5133249.

What is really strange is that when I try to create a simplified version of this problem to create a reproducible example, it works fine.

First I create a dataset.


    test <- matrix(c(1, 1, 1, 0, 0, "",
                   2, 1, 0, 1, 0, "",
                   3, 0, 0, 0, 1, "Chinese"), ncol = 6, byrow=TRUE)
    
    test<-as.data.frame(test)
    
    colnames(test) <- c("ID", "English", "French", "Italian", "Other", "Other.Lang")
    
    str(test)

Converting binary columns to numeric:


    test$ID <- as.numeric(test$ID)
    test$English <- as.numeric(test$English)
    test$French <- as.numeric(test$French)
    test$Italian <- as.numeric(test$Italian)
    test$Other <- as.numeric(test$Other)

Here’s the same code as above, but with this simplified data set.


    num <- test %>%
      mutate(num.langs = rowSums(test[2:5]))
    num

Here is the output. It works exactly as I want:


    "ID","English","French","Italian","Other","Other.Lang","num.langs"
     1,     1,        1,       0,        0,        "",         2
     2,     1,        0,       1,        0,        "",         2
     3,     0,        0,       0,        1,     "Chinese",     1

So I know I have screwed up somewhere in my real data, but I can’t understand where. Could anyone advise?

>Solution :

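The difference is most likely because `part_langs` is a grouped data frame, as the first line of the `str()` output in your post shows:

    grouped_df [7 x 15] (S3: grouped_df/tbl_df/tbl/data.frame)

On a grouped data frame, `mutate()` evaluates each expression once per group. Here every group is a single participant (one row), but `rowSums(part_langs[2:14])` returns all 7 row sums at once, which is why the error says the input must be size 1, not 7. Your `test` data frame is not grouped, so the same code works there.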
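To confirm the diagnosis, one option is to group the simplified data from the question by `ID` and check that the same kind of error appears. This is only a sketch: the `test` data frame below is rebuilt by hand from the question's toy example, and the names `grouped` and `outcome` are made up for illustration.

```r
library(dplyr)

# Toy data rebuilt from the simplified example in the question
test <- data.frame(
  ID = c(1, 2, 3),
  English = c(1, 1, 0),
  French = c(1, 0, 0),
  Italian = c(0, 1, 0),
  Other = c(0, 0, 1),
  Other.Lang = c("", "", "Chinese")
)

# Group by ID so that every group holds exactly one row,
# mirroring the structure of part_langs
grouped <- test %>% group_by(ID)

is_grouped_df(grouped)  # TRUE
group_vars(grouped)     # "ID"

# rowSums() returns three values, but each group has room for only one,
# so mutate() fails; tryCatch() turns the failure into the string "error"
outcome <- tryCatch(
  grouped %>% mutate(num.langs = rowSums(grouped[2:5])),
  error = function(e) "error"
)
outcome
```

If `is_grouped_df(part_langs)` returns `TRUE` on your real data, the grouping is the likely culprit.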

If this is the reason, then ungroup first and rerun your code:

    library(dplyr)
    part_langs <- part_langs %>% ungroup()
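Put together, the fix might look like the sketch below. The data frame `part_langs_demo` is a hypothetical three-row, five-column stand-in for `part_langs` (values taken from rows 3, 4 and 6 of the question), so the column positions differ from the real data. The second variant, which selects columns by name with `across()`, is an alternative not mentioned above and assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical cut-down stand-in for part_langs, grouped like the original
part_langs_demo <- tibble(
  Participant.Private.ID = c(5140442, 5141991, 5145250),
  Spanish = c(1, 1, 0),
  Italian = c(0, 1, 0),
  Other = c(0, 0, 1),
  Other.Lang = c("0", "0", "Malay")
) %>%
  group_by(Participant.Private.ID)

# Variant 1: drop the grouping, then index the binary columns by position
ungrouped <- part_langs_demo %>% ungroup()
fixed <- ungrouped %>%
  mutate(num.langs = rowSums(ungrouped[2:4]))

# Variant 2: select the binary columns by name inside mutate()
fixed_across <- part_langs_demo %>%
  ungroup() %>%
  mutate(num.langs = rowSums(across(Spanish:Other)))

fixed$num.langs  # values 1, 2, 1
```

Selecting by name avoids referring to the data frame a second time inside `mutate()`, so the result does not depend on column positions.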

by MR
