dplyr summarize with a dynamic number of stats/conditions

I want to summarize my data in different ways. Specifically, I want to count how many values are greater than a certain threshold.

I could easily do that with e.g.

library(tidyverse)
mtcars |>
  summarize(test1 = sum(mpg > 15, na.rm = TRUE))

However, how can I use summarize with several thresholds that are supplied dynamically?

E.g. with an input like my_thresholds <- c(15, 20), I’d like to get the following output:

  test1 test2
1    26    14

I think one way could be to pass the thresholds as an argument to purrr::map and then bind_cols the two summaries afterwards. However, the summarize itself is already wrapped in another purrr::map, i.e. my input is actually a list of data frames and I want to get the summaries for each list element:

input data:

input_data <- mtcars |>
  group_split(cyl)

And then my desired output would be one row per group.

One more note: the number of thresholds should also be dynamic, e.g. in one call I might have two thresholds, in another I might have five.

Solution:

What about something like this?

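library(purrr)
my_thresholds <- c(15, 20)  # thresholds from the question
# for each group, count the mpg values above each threshold
input_data |>
  map(\(gp) map_int(my_thresholds, \(x) sum(gp$mpg > x, na.rm = TRUE)))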

output

[[1]]
[1] 11 11

[[2]]
[1] 7 3

[[3]]
[1] 8 0
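
The result above is a list of integer vectors, while the question asks for one row per group with columns test1, test2, and so on. A minimal sketch of one way to get there, assuming the thresholds are named before mapping so that the names carry over as column names. The helper count_above is a hypothetical name introduced here for illustration, not part of dplyr or purrr:

```r
library(dplyr)
library(purrr)

my_thresholds <- c(15, 20)

# Hypothetical helper (not from any package): returns a one-row tibble
# with one count column per threshold, named test1, test2, ...
count_above <- function(df, thresholds) {
  names(thresholds) <- paste0("test", seq_along(thresholds))
  # map_int() keeps the names of its input, so counts is a named integer vector
  counts <- map_int(thresholds, \(x) sum(df$mpg > x, na.rm = TRUE))
  tibble::as_tibble(as.list(counts))
}

# Single data frame: one row, one column per threshold
single <- count_above(mtcars, my_thresholds)

# List of data frames: one row per list element
input_data <- mtcars |>
  group_split(cyl)

per_group <- input_data |>
  map(\(gp) count_above(gp, my_thresholds)) |>
  bind_rows()
```

Because the column names are generated from seq_along(thresholds), the same helper should work unchanged for two thresholds or five. The column mpg is hard-coded here to match the question; passing the column as an argument would be a natural extension.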