Summarise + case_when with n()

I wonder what I am doing wrong here.

I am trying to use case_when() inside summarise() to get a single summary row for each id, with values that depend on the number of rows for that id.

library(dplyr, warn.conflicts = FALSE)
mock <- tibble::tribble(~id, ~name, ~year,
                1, "xy", 2022,
                1, "xyz", 2021,
                2, "aaa", NA,
                3, "xaa", 2021)

mock %>% 
  group_by(id) %>% 
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      .default = NA_character_
    ),
    name2 = case_when(
      n() == 1 ~ name,
      .default = NA_character_
    )
  )
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups:   id [3]
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     1 problem   <NA> 
#> 3     2 <NA>      aaa  
#> 4     3 <NA>      xaa

Created on 2023-09-09 with reprex v2.0.2

But I would like to get just one row per id:

#> # A tibble: 3 × 3
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 2     1 problem   <NA> 
#> 3     2 <NA>      aaa  
#> 4     3 <NA>      xaa

Solution:

case_when() is vectorised: it works down a column and builds a new vector from the values in other columns. That is not what you need here. You want to choose a single output per group based on the group size, and n() is always a length-1 integer. In name2, the right-hand side name has one element per row of the group, so case_when() recycles the length-1 condition n() == 1 to the group size and returns a vector of that length. summarise() then recycles condition to match, which is why id 1 appears twice. If you want the output of summarise() to be length one per group, use if and else, not case_when() or if_else().
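To see the length difference in isolation, here is a small sketch outside summarise(). The variables name and n are stand-ins (assumptions for illustration) for the name column and the value of n() in the group with id 1. The behaviour shown assumes dplyr 1.1.x, the version implied by the warning in the question.

```r
library(dplyr, warn.conflicts = FALSE)

# Stand-ins for what summarise() sees in the group with id == 1
name <- c("xy", "xyz")  # the name column for that group
n <- length(name)       # plays the role of n(), here 2

# case_when() evaluates every right-hand side and recycles the length-1
# condition to the size of `name`, so the result has one element per row.
vectorised <- case_when(n == 1 ~ name, .default = NA_character_)
length(vectorised)  # 2

# if / else evaluates only the branch that is taken and returns it as is,
# so the result here is the single NA_character_ from the else branch.
scalar <- if (n == 1) name else NA_character_
length(scalar)      # 1
```

Inside summarise(), the same if / else form therefore yields one value per group: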

mock %>% 
  group_by(id) %>% 
  summarize(
    condition = if(n() > 1) 'problem' else NA_character_, 
    name2     = if(n() == 1) name else NA_character_
  )
#> # A tibble: 3 x 3
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     2 <NA>      aaa  
#> 3     3 <NA>      xaa

Created on 2023-09-09 with reprex v2.0.2
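If you would rather keep a vectorised helper, one possible alternative (a sketch, not part of the original answer) is to reduce each group to one row first and apply the condition afterwards across the summarised rows. The intermediate columns n and first_name are names chosen here for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

mock <- tibble::tribble(~id, ~name, ~year,
                1, "xy", 2022,
                1, "xyz", 2021,
                2, "aaa", NA,
                3, "xaa", 2021)

result <- mock %>%
  group_by(id) %>%
  # Step 1: one row per id, holding the group size and a candidate name
  summarise(n = n(), first_name = first(name)) %>%
  # Step 2: the conditions now run down the summarised rows, which is the
  # situation vectorised helpers such as if_else() are designed for
  mutate(
    condition = if_else(n > 1, "problem", NA_character_),
    name2     = if_else(n == 1, first_name, NA_character_)
  ) %>%
  select(id, condition, name2)

result
```

This gives the same three rows as the if / else version. The trade-off is an extra step and two temporary columns, in exchange for conditions that are ordinary column-wise expressions.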
