R Dplyr summarize rows except when only NAs

November 3, 2022

I have a dataframe of several columns and several rows per Participant_ID.
I want to sum data for all lines of Participant_ID, to obtain one value per Participant_ID.
The problem is that some columns are empty (all NAs), and for these columns I want to keep NA as a result.
But when I sum with na.rm = T, it transforms the sum of NAs to 0.

I am using :

 df = df %>%
  group_by(Participant_ID) %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = T)))

How can I exclude columns (after group_by) with only NAs ? Or filter columns (after group_by) that contain at least one numeric value ?

Thanks a lot for your help !!

>Solution :

You just need the function inside across to handle the NA column differently.

Suppose we have this data frame:

df <- data.frame(Participant_ID = c('A', 'A', 'A'),
  a = 1:3, b = c(4, NA, 6), c = as.numeric(c(NA, NA, NA)))

Then we can do

library(dplyr)

df %>%
  group_by(Participant_ID) %>%
  summarise(across(where(is.numeric), 
                   ~ if(all(is.na(.x))) NA_real_ else sum(.x, na.rm = T)))
#> # A tibble: 1 x 4
#>   Participant_ID     a     b     c
#>   <chr>          <int> <dbl> <dbl>
#> 1 A                  6    10    NA

^{Created on 2022-11-03 with reprex v2.0.2}