Perform a custom summarise function by group in R

This is my first time posting a question here, so go easy on me, and let me know if you have tips for making my questions clearer.

I’m trying to write a function that summarises a given column by group ("c" or "e"). I’ve set up the grouping factor as shown below, but when I pass the arguments (df, x) to the function, the output seems to ignore the grouping. How can I make sure the custom summary function respects the grouping?

#initialize and relevel factor
dexadf$group <- factor(dexadf$group, levels=c("c", "e"),
                       labels = c("c", "e"))
dexadf$group <- relevel(dexadf$group, ref="c")
attributes(dexadf$group)

My data looks like this. For the sake of simplicity, I’ve only included one of the columns of interest (fm_bdc3):

> dput(dexadf)
structure(list(participant = c("pt04", "pt75", "pt21", "pt73", 
"pt27", "pt39", "pt43", "pt52", "pt69", "pt49", "pt50", "pt56", 
"pt62", "pt68", "pt22", "pt64", "pt54", "pt79", "pt36", "pt26", 
"pt65", "pt38"), group = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("c", "e"), class = "factor"),  
    fm_bdc3 = c(18.535199635968, 23.52996574649, 17.276246451976, 
    11.526088555461, 23.805048656112, 23.08597823716, 28.691020942436, 
    28.968097858499, 23.378093165331, 22.491725344661, 14.609015054932, 
    19.734914019306, 31.947412973684, 25.152298171274, 12.007356801787, 
    20.836128108938, 22.322230884349, 14.777652101515, 21.389572717608, 
    16.992853675086, 14.138189878472, 17.777235203826)

→ function:

summbygrp <- function(df, x) {
        group_by(df, group) %>%
            summarise(
              count = n(),
              mean = mean(x, na.rm = TRUE),
              sd = sd(x, na.rm = TRUE)
            ) %>%
            mutate(se = sd / sqrt(11),
                   lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
                   upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
                  )
      }

→ function output:

> summbygrp(dexadf, fm_bdc3) 
# A tibble: 2 × 7
  group count  mean    sd    se lower.ci upper.ci
  <fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 c        11  20.6  5.48  1.65     16.9     24.3
2 e        11  20.6  5.48  1.65     16.9     24.3

As you can see, the summaries of both groups are identical, and I know this not to be true. Can someone identify the error in my code?

Here is the output if I don’t use a function. I have many columns, though, so writing this out for each one would be pretty tedious:

group_by(dexadf, group) %>%
    summarise(
      count = n(),
      mean = mean(fm_bdc3, na.rm = TRUE),
      sd = sd(fm_bdc3, na.rm = TRUE)
    ) %>%
    mutate(se = sd / sqrt(11),
           lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
           upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
    )

→ correct output:

# A tibble: 2 × 7
  group count  mean    sd    se lower.ci upper.ci
  <fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 c        11  19.3  5.49  1.66     15.6     23.0
2 e        11  21.9  5.40  1.63     18.2     25.5

Solution:

library(dplyr)
library(rlang)


dexadf <- data.frame(
  stringsAsFactors = FALSE,
  participant = c("pt04","pt75","pt21","pt73",
                  "pt27","pt39","pt43","pt52","pt69","pt49","pt50",
                  "pt56","pt62","pt68","pt22","pt64","pt54","pt79",
                  "pt36","pt26","pt65","pt38"),
  fm_bdc3 = c(18.535199635968,23.52996574649,
              17.276246451976,11.526088555461,23.805048656112,
              23.08597823716,28.691020942436,28.968097858499,
              23.378093165331,22.491725344661,14.609015054932,19.734914019306,
              31.947412973684,25.152298171274,12.007356801787,
              20.836128108938,22.322230884349,14.777652101515,
              21.389572717608,16.992853675086,14.138189878472,17.777235203826),
  group = as.factor(c("c","e",
                      "e","c","c","e","c","e","c","e","e","c",
                      "e","c","c","e","e","c","e","c","e",
                      "c")),
  sex = as.factor(c("f","m",
                    "m","m","m","m","m","f","m","f","f","f",
                    "f","f","f","f","m","f","m","m","f",
                    "m"))
)


summbygrp <- function(df, x) {
  group_by(df, group) %>%
    summarise(
      count = n(),
      mean = mean({{x}}, na.rm = TRUE),
      sd = sd({{x}}, na.rm = TRUE)
    ) %>%
    mutate(se = sd / sqrt(11),
           lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
           upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
    )
}

summbygrp(dexadf, fm_bdc3)

#> # A tibble: 2 × 7
#>   group count  mean    sd    se lower.ci upper.ci
#>   <fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 c        11  19.3  5.49  1.66     15.6     23.0
#> 2 e        11  21.9  5.40  1.63     18.2     25.5

Created on 2022-07-09 by the reprex package (v2.0.1)
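
The function above hard-codes the group size (11) in both the standard error and the t quantile, which is only right when every group has exactly 11 observations. As a minimal sketch, the variant below derives both from the per-group count instead. The name summbygrp_n, the conf argument, and the toy data are hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical variant of summbygrp(): the group size comes from the data
# rather than from the hard-coded 11.
summbygrp_n <- function(df, x, conf = 0.95) {
  df %>%
    group_by(group) %>%
    summarise(
      count = sum(!is.na({{ x }})),        # observations actually used
      mean  = mean({{ x }}, na.rm = TRUE),
      sd    = sd({{ x }}, na.rm = TRUE)
    ) %>%
    mutate(
      se       = sd / sqrt(count),
      lower.ci = mean - qt(1 - (1 - conf) / 2, count - 1) * se,
      upper.ci = mean + qt(1 - (1 - conf) / 2, count - 1) * se
    )
}

# Made-up data with unequal group sizes, for illustration only.
toy <- data.frame(
  group = factor(c("c", "c", "c", "e", "e")),
  value = c(1, 2, 3, 10, 20)
)

summbygrp_n(toy, value)
```

With the question's data (11 participants per group and no missing values) this should give the same numbers as the original function, since count is 11 in both groups.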

Why this works

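You need to wrap the argument in {{ }} (pronounced "curly-curly", also called the embrace operator). It comes from the rlang package and is supported directly by dplyr's verbs, so loading dplyr is enough. dplyr and other tidyverse verbs (mutate, summarise, group_by, etc.) use non-standard evaluation (NSE): they look up bare column names inside the data rather than in the calling environment. When you pass a variable (i.e. a column of a dataset) as an argument to your own function and then use it inside one of these verbs, you have to embrace that argument, as done here with x. Otherwise x is evaluated as an ordinary R object outside the grouped data. If no object of that name exists there, you get an "object not found" error. The identical rows in your output suggest that a full-length fm_bdc3 was found elsewhere in your session (for example a global object of that name or an attached data frame), so every group was summarised from the same ungrouped vector. To learn more, see the "Programming with dplyr" vignette; I would also encourage you to read chapters 17-20 of the book Advanced R.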
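
The difference can be seen in a small sketch with a made-up toy data frame and two hypothetical functions, plain_mean and embraced_mean. It assumes that an object with the same name as the column exists in the global environment, which is one way to reproduce identical rows like those in the question.

```r
library(dplyr)

# Toy data: column v differs between groups "a" and "b".
toy <- data.frame(g = c("a", "a", "b", "b"), v = c(1, 2, 3, 4))

# A global vector that happens to share the column's name (assumption).
v <- c(10, 20, 30, 40)

# Without embracing, x is evaluated as a normal argument in the caller's
# environment, so it resolves to the global v, not the column.
plain_mean <- function(df, x) {
  df %>% group_by(g) %>% summarise(m = mean(x))
}

# With embracing, x is evaluated inside the grouped data, so it resolves
# to the column v within each group.
embraced_mean <- function(df, x) {
  df %>% group_by(g) %>% summarise(m = mean({{ x }}))
}

plain_mean(toy, v)$m     # 25 25: same global vector for every group
embraced_mean(toy, v)$m  # 1.5 3.5: per-group means of the column
```

If the global v is removed, plain_mean(toy, v) fails with an "object not found" error instead of returning a wrong answer.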
