Using ddply in combo with weighted.mean in a for loop with dynamic variables

My dataset looks like this:

structure(list(GEOLEV2 = structure(c("768001001", "768001001", 
"768001002", "768001002", "768001006", "768001006", "768001002", 
"768001002", "768001002", "768001002", "768002016", "768002016"
), format.stata = "%9s"), DHSYEAR = structure(c(1988, 1988, 1988, 
1988, 1998, 1998, 1998, 1998, 2013, 2013, 2013, 2013), format.stata = "%9.0g"), 
    v005 = structure(c(1e+06, 1e+06, 1e+06, 1e+06, 1815025, 1815025, 
    1517492, 1517492, 1350366, 1350366, 617033, 617033), format.stata = "%9.0g"), 
    age = structure(c(37, 22, 18, 46, 15, 29, 18, 42, 19, 15, 
    35, 16), format.stata = "%9.0g"), highest_year_edu = structure(c(2, 
    6, NA, NA, 5, NA, 2, 3, 2, NA, 5, 3), format.stata = "%9.0g")), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"), label = "Written by R")

I want to collapse it by GEOLEV2 and DHSYEAR, using weighted.mean (with v005 as the weights) as the collapsing function. Each variable should keep its original name.

I chose the function ddply and when I try it on a single variable, it works:

ddply(df1, ~ df1$GEOLEV2+ df1$DHSYEAR, summarise, age = weighted.mean(age, v005, na.rm = TRUE))

However, when I put the call inside a loop, it returns an error. My attempt was:

df1_collapsed <- ddply(df1, ~ df1$GEOLEV2+ df1$DHSYEAR, summarise, age = weighted.mean(age, v005, na.rm = TRUE))

for (i in names(df1[4:5])) {
  variable <- ddply(df1, ~ df1$GEOLEV2+ df1$DHSYEAR, summarise, i = weighted.mean(i, v005, na.rm = TRUE))
  df1_collapsed <- left_join(df1_collapsed, variable, by = c("df1$GEOLEV2", "df1$DHSYEAR"))
}

and the error is:

Error in weighted.mean.default(i, v005, na.rm = TRUE) : 
  'x' and 'w' must have the same length

How can I build the for loop, embedding the variable name in the loop?

Solution:

In general in R you don’t need loops for grouping and summarising (which you would call collapsing in Stata). You can use dplyr for this type of operation:

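library(dplyr)

# Weighted mean of every column from age to highest_year_edu,
# computed within each GEOLEV2/DHSYEAR group with v005 as the weights.
df1 %>%
    group_by(GEOLEV2, DHSYEAR) %>%
    summarise(
        across(age:highest_year_edu, ~ weighted.mean(.x, v005, na.rm = TRUE))
    )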


# A tibble: 6 x 4
# Groups:   GEOLEV2 [4]
#   GEOLEV2   DHSYEAR   age highest_year_edu
#   <chr>       <dbl> <dbl>            <dbl>
# 1 768001001    1988  29.5              4
# 2 768001002    1988  32              NaN
# 3 768001002    1998  30                2.5
# 4 768001002    2013  17                2
# 5 768001006    1998  22                5
# 6 768002016    2013  25.5              4
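
If a loop with the variable name embedded is still wanted, it helps to see why the original attempt fails: inside `summarise`, `i` is only the string holding the column name (length 1), so `weighted.mean` compares it against the full-length `v005` and complains about mismatched lengths. The following is a minimal sketch of one way around that, not the only one. It assumes dplyr 1.0 or later, uses just the first four rows of the question's data as a stand-in for `df1`, and introduces `vars` as an illustrative name for the columns to collapse.

```r
library(dplyr)

# Stand-in for df1: the first four rows of the question's data (assumption:
# the real df1 has the same column names).
df1 <- tibble(
  GEOLEV2 = c("768001001", "768001001", "768001002", "768001002"),
  DHSYEAR = c(1988, 1988, 1988, 1988),
  v005 = c(1e6, 1e6, 1e6, 1e6),
  age = c(37, 22, 18, 46),
  highest_year_edu = c(2, 6, NA, NA)
)

# Hypothetical helper vector: the columns to collapse, given as strings.
vars <- c("age", "highest_year_edu")

# Start from the distinct group keys, then add one collapsed column per pass.
df1_collapsed <- distinct(df1, GEOLEV2, DHSYEAR)

for (i in vars) {
  variable <- df1 %>%
    group_by(GEOLEV2, DHSYEAR) %>%
    summarise(
      # .data[[i]] looks up the column whose name is stored in i;
      # "{i}" := names the result after that same string.
      "{i}" := weighted.mean(.data[[i]], v005, na.rm = TRUE),
      .groups = "drop"
    )
  df1_collapsed <- left_join(df1_collapsed, variable,
                             by = c("GEOLEV2", "DHSYEAR"))
}

print(df1_collapsed)
```

On these four rows the result should match the first two rows of the output above. The `across()` version is still the shorter choice; the loop is mainly useful when each variable needs different handling.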