Summarise + case_when with n()

I wonder what I am doing wrong here.

I am trying to use case_when() inside summarise() to get one summary row for each id, depending on the number of rows for that id.

library(dplyr, warn.conflicts = FALSE)
mock <- tibble::tribble(~id, ~name, ~year,
                1, "xy", 2022,
                1, "xyz", 2021,
                2, "aaa", NA,
                3, "xaa", 2021)

mock %>% 
  group_by(id) %>% 
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      .default = NA_character_
    ),
    name2 = case_when(
      n() == 1 ~ name,
      .default = NA_character_
    )
  )
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups:   id [3]
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     1 problem   <NA> 
#> 3     2 <NA>      aaa  
#> 4     3 <NA>      xaa

Created on 2023-09-09 with reprex v2.0.2

But I would just like to have:

#> # A tibble: 3 × 3
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     2 <NA>      aaa  
#> 3     3 <NA>      xaa

Solution:

case_when() is vectorised: it works down a column and builds a new vector from the values in other columns. That’s not what you are trying to do here. You are trying to choose a single output per group based on the group size, and n() is always a length-1 integer. In name2, the length-1 condition n() == 1 gets recycled against name, which is as long as the group, so the result for id 1 has two rows instead of one. If you want the output of summarise() to be length one per group, use if and else, not case_when() or if_else().

mock %>% 
  group_by(id) %>% 
  summarize(
    condition = if(n() > 1) 'problem' else NA_character_, 
    name2     = if(n() == 1) name else NA_character_
  )
#> # A tibble: 3 x 3
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     2 <NA>      aaa  
#> 3     3 <NA>      xaa

Created on 2023-09-09 with reprex v2.0.2
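
To see the length difference outside of summarise(), here is a minimal sketch for the id == 1 group. The variables `name`, `n`, `res_case_when` and `res_if` are made up for illustration; `n` simply stands in for what n() would return inside the group, and `.default` assumes dplyr 1.1.0 or later, as in the question.

```r
library(dplyr, warn.conflicts = FALSE)

# Illustrative stand-ins for the id == 1 group in `mock`
name <- c("xy", "xyz")
n    <- length(name)   # plays the role of n() inside summarise()

# case_when() recycles the length-1 condition against `name`,
# so the result is as long as the group
res_case_when <- case_when(
  n == 1 ~ name,
  .default = NA_character_
)
length(res_case_when)
#> [1] 2

# if/else evaluates one branch as a whole; here that is a single NA
res_if <- if (n == 1) name else NA_character_
length(res_if)
#> [1] 1
```

A result of length 2 is what makes summarise() return two rows for id 1, while the if/else version gives exactly one value per group.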
