Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Rowwise averaging in Dlyr in R with multiple grouping categories

I have random df with likert scale answers. All columns are questions named as q1,q2,q3,…,q6.

I now have a another data frame that gives me the groups of questions. For example, q1,q2,q3 are in group A, q4 in group B, q5 and q6 in group C.

I want to first calculate the mean (rowwise) of each group. For example, in the new resulted data frame I must have column A with the mean (rowwise) of q1,q2,q3.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

So I did:


likert_levels <- c(1,2,3,4,5)
set.seed(42)
library(dplyr)
df <-
  tibble(
    "q1" = sample(likert_levels, 150, replace = TRUE),
    "q2" = sample(likert_levels, 150, replace = TRUE, prob = 5:1),
    "q3" = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
    "q4" = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
    "q5" = sample(c(likert_levels, NA), 150, replace = TRUE),
    "q6" = sample(likert_levels, 150, replace = TRUE, prob = c(1, 0, 1, 1, 0))
  ) %>%
  mutate(across(everything(), ~ factor(.x, levels = likert_levels)))

df
df2 = tibble(categories = c("A","A","A","B","C","C"),
             questions = c("q1","q2","q3","q4","q5","q6"))

df2
df%>%
  mutate(id = row_number())%>%
  tidyr::pivot_longer(!id,names_to = "questions",values_to = "responses")%>%
  left_join(.,df2,by="questions")



df_cor=df%>%
  mutate_if(is.factor,as.double)%>%
  rowwise() %>%
  mutate(QA = mean(c(q1, q2, q3),na.rm=TRUE),
         QB = mean(c(q4),na.rm=TRUE),
         QC = mean(c(q5, q6),na.rm=TRUE))%>%
  select(QA,QB,QC)
df_cor

My question is: because my real life data set contains 100 questions and more than 20 groups, how can I avoid typing in the rowwise mean mutation in dplyr and instead have it be done automatically with a different style of grouping?

>Solution :

You already have the first part of the solution i.e to bring the data in long format. Continue using the long format and calculate the mean for each id and category. If needed you can transform the data in wide format again.

library(dplyr)
library(tidyr)

df %>%
  mutate(id = row_number())%>%
  pivot_longer(!id,names_to = "questions",values_to = "responses")%>%
  left_join(df2,by="questions") %>%
  summarise(value = mean(as.integer(as.character(responses)), na.rm = TRUE), 
           .by = c(id, categories)) %>%
  pivot_wider(names_from = categories, names_prefix = "Q", values_from = value)%>%
  select(-id)

# A tibble: 150 × 3
#      QA    QB    QC
#   <dbl> <dbl> <dbl>
# 1  2.33     2   3  
# 2  3.33     5   3  
# 3  2        3   2.5
# 4  2.33     3   4.5
# 5  3.67     4   3  
# 6  3.33     5   3  
# 7  2.67     5   3  
# 8  2.33     4   2.5
# 9  2.67     4   4  
#10  3.67     5   2.5
# ℹ 140 more rows
# ℹ Use `print(n = ...)` to see more rows
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading