Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to loop though dplyr group_by and summarize statements for list of variables

I’m trying to create multiple two may summary tables from a list of variables. Ideally I’d then like to append those tables.

My dataframe looks like this with multiple demographic variables, one ID variable, and one binary variable identifying whether the year was a targeted intervention year:

ID Gender Race AgeCategory TargetYear
1234 Male White 18-25 1
5675 Female Asian 30-35 0

I’m trying to create crosstab summaries for every combination of TargetYear and the demographic variables (TargetYear + Gender, TargetYear + Race, etc.) I know I can likely do this in a loop or function but haven’t had success so far.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I initially tried to approach this with a combination of a for loop and dplyr summary after defining the variables:

library(dplyr)
library(tidyr)

variables <- c("Gender", "Race", "AgeCategory")

agg <- NULL

for (i in variables){
  u <- df %>% group_by(TargetYear, (i)) %>% summarize(n=n_distinct(ID)) %>% ungroup() %>% mutate(total=sum(n)) %>% group_by(TargetYear, (i)) %>% pivot_longer((i))
  return(u)
  bind_rows(agg, u)
}

Where I was expecting:

TargetYear n total name value
1 5 30 Gender Male
0 10 30 Gender Male
1 3 30 Gender Female
0 12 30 Gender Female
1 10 50 Race White
0 20 50 Race White
1 23 50 Race Asian
0 27 50 Race Asian

And instead just got a NULL agg and a u with this:

TargetYear (i) n total
0 Gender 15 30
1 Gender 15 30

I am very new to loops and functions so I’m not sure where I’m going wrong here. Any help would be appreciated! Especially solutions utilizing loops and functions.

>Solution :

No need to loop:

library(tidyr)
library(dplyr)

df |>
  pivot_longer(Gender:AgeCategory) |>
  mutate(total = n_distinct(ID))  |>
  summarize(n = n(), total = unique(total), .by = c(name, value, TargetYear))
#   name        value  TargetYear     n total
#   <chr>       <chr>       <int> <int> <int>
# 1 Gender      Male            1     1     2
# 2 Race        White           1     1     2
# 3 AgeCategory 18-25           1     1     2
# 4 Gender      Female          0     1     2
# 5 Race        Asian           0     1     2
# 6 AgeCategory 30-35           0     1     2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading