Using a custom function with dplyr summarise

I have a data frame to which I apply some custom summarization logic (if the maximum value per group is over 100, use the maximum value; otherwise use the average of the values in that group):

library(tidyverse)

dat <- data.frame(grp = c("a", "a", "a", "b", "b"),
           vals = c(115, 100, 101, 90, 100))

dat %>% 
  group_by(grp) %>% 
  summarise(new_val = case_when(max(vals) > 100 ~ max(vals),
                                TRUE ~ mean(vals)))

However, I need to reuse the custom summarization logic with different maximum cutoff values, and I don’t want to hardcode the logic over and over. How can I create a function to do so?

The following does not work:

sumFunc <- function(max_val) {
  case_when(max(vals) > max_val ~ max(vals),
            TRUE ~ mean(vals))
}

dat %>% 
  group_by(grp) %>% 
  summarise(new_val = sumFunc(100))

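Solution:

The attempt above fails because vals is not defined inside sumFunc: R looks for it in the function's enclosing environment (here the global environment) rather than in the grouped data that summarise() works on. Pass the column in as an argument instead: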

sumFunc <- function(col, max_val) {
  case_when(max(col) > max_val ~ max(col),
            TRUE ~ mean(col))
}

dat %>% 
  group_by(grp) %>% 
  summarise(new_val = sumFunc(vals, 100))

  grp   new_val
  <fct>   <dbl>
1 a         115
2 b          95
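
If the whole summarise step should be reusable, not just the cutoff rule, one option is to wrap it in a helper that takes the data, the column, and the cutoff. The following is only a sketch and not part of the original answer: summarise_cutoff is a hypothetical name, the column is forwarded with dplyr's embrace operator {{ }}, and a plain if/else stands in for case_when on the assumption that the condition is a single value per group.

```r
library(dplyr)

dat <- data.frame(grp = c("a", "a", "a", "b", "b"),
                  vals = c(115, 100, 101, 90, 100))

# Hypothetical helper (name and signature are assumptions for illustration).
# {{ col }} forwards the bare column name into summarise().
summarise_cutoff <- function(data, col, max_val) {
  data %>%
    summarise(new_val = if (max({{ col }}) > max_val) {
      max({{ col }})   # group maximum exceeds the cutoff
    } else {
      mean({{ col }})  # otherwise fall back to the group mean
    })
}

grouped <- dat %>% group_by(grp)

# Same rule, different cutoffs, no repeated logic
res_100 <- grouped %>% summarise_cutoff(vals, 100)
res_120 <- grouped %>% summarise_cutoff(vals, 120)  # no group exceeds 120, so both use the mean
```

With a cutoff of 100 this should reproduce the result above (115 for group a, 95 for group b). The helper respects whatever grouping the incoming data already has, so the group_by step stays outside it.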