Custom function with dplyr::summarise with conditions

I want to create a function named ratio_function that does the same as the following code:

data = data %>% 
  group_by(ID) %>% 
  summarise(sum_ratio = sum(surface[category == "A"], na.rm = T)/sum(total_area[category == "A"], na.rm = T)*mean(`MEAN`[category == "A"], na.rm = T))

but that can be called inside summarise, like this:

data = data %>% 
  group_by(ID) %>% 
  summarise(sum_ratio = ratio_function("A"))

The problem is that surface, total_area and category aren’t recognized as column names once they are referenced inside the function rather than directly in summarise.

>Solution :

A function only sees the objects that are passed to it as arguments or that exist in the environment where it was defined. Inside summarise, column names such as surface are available only to the expressions written directly in the summarise call; a function defined elsewhere does not see them, so surface, total_area and category are not found when ratio_function looks them up. The simplest fix is to pass the columns to the function as arguments, like this:

ratio_function <- function(surface, total_area, MEAN, category, selected_category = "A") {
  # Rows of the current group that belong to the requested category
  keep <- category == selected_category
  sum(surface[keep], na.rm = TRUE) / sum(total_area[keep], na.rm = TRUE) * mean(MEAN[keep], na.rm = TRUE)
}

data %>% 
  group_by(ID) %>% 
  summarise(sum_ratio = ratio_function(surface, total_area, MEAN, category, "A"))
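As a quick sanity check, the sketch below runs the function on a small toy data frame and compares the result with the inline expression from the question. The data values are invented purely for illustration; only the column names come from the question.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
data <- tibble(
  ID         = c(1, 1, 1, 2, 2),
  category   = c("A", "A", "B", "A", "B"),
  surface    = c(10, 20, 5, 8, 4),
  total_area = c(100, 100, 50, 40, 40),
  MEAN       = c(2, 4, 9, 5, 7)
)

ratio_function <- function(surface, total_area, MEAN, category, selected_category = "A") {
  # Rows of the current group that belong to the requested category
  keep <- category == selected_category
  sum(surface[keep], na.rm = TRUE) / sum(total_area[keep], na.rm = TRUE) * mean(MEAN[keep], na.rm = TRUE)
}

# Version using the helper function
with_function <- data %>%
  group_by(ID) %>%
  summarise(sum_ratio = ratio_function(surface, total_area, MEAN, category, "A"))

# Original inline version from the question
inline <- data %>%
  group_by(ID) %>%
  summarise(sum_ratio = sum(surface[category == "A"], na.rm = TRUE) /
              sum(total_area[category == "A"], na.rm = TRUE) *
              mean(MEAN[category == "A"], na.rm = TRUE))

# TRUE if both approaches give the same result
print(all.equal(with_function, inline))
```

If the two results differ on your real data, the first thing to check is that every category filter inside the function uses selected_category rather than a hard-coded value.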

Here the columns are passed in as arguments, so the same function can be reused with different columns for each part of the calculation, for example by passing another column in place of surface. Because the argument names match the original column names, this can become confusing later; consider renaming the arguments to describe their role in the calculation rather than the columns in your data.
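If the call really has to look like ratio_function("A"), one possible alternative is to let the function fetch the columns of the current group itself. The sketch below assumes dplyr 1.1.0 or later, where pick() is available, and it hard-codes the column names from the question, so it only works on data that has exactly those columns. The toy data is again invented for illustration.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
data <- tibble(
  ID         = c(1, 1, 1, 2, 2),
  category   = c("A", "A", "B", "A", "B"),
  surface    = c(10, 20, 5, 8, 4),
  total_area = c(100, 100, 50, 40, 40),
  MEAN       = c(2, 4, 9, 5, 7)
)

ratio_function <- function(selected_category = "A") {
  # pick() returns the named columns for the group currently being
  # summarised; it only works when called from within a dplyr verb
  d <- pick(surface, total_area, MEAN, category)
  keep <- d$category == selected_category
  sum(d$surface[keep], na.rm = TRUE) / sum(d$total_area[keep], na.rm = TRUE) * mean(d$MEAN[keep], na.rm = TRUE)
}

result <- data %>%
  group_by(ID) %>%
  summarise(sum_ratio = ratio_function("A"))

print(result)
```

The trade-off is that this version is tied to one set of column names and fails outside a dplyr verb, whereas the argument-based version is easier to test and reuse.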
