Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R Dplyr summarize rows except when only NAs

This is what my dataframe looks like

I have a dataframe of several columns and several rows per Participant_ID.
I want to sum data for all lines of Participant_ID, to obtain one value per Participant_ID.
The problem is that some columns are empty (all NAs), and for these columns I want to keep NA as a result.
But when I sum with na.rm = T, it transforms the sum of NAs to 0.

I am using :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

 df = df %>%
  group_by(Participant_ID) %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = T)))

How can I exclude columns (after group_by) with only NAs ? Or filter columns (after group_by) that contain at least one numeric value ?

Thanks a lot for your help !!

>Solution :

You just need the function inside across to handle the NA column differently.

Suppose we have this data frame:

df <- data.frame(Participant_ID = c('A', 'A', 'A'),
  a = 1:3, b = c(4, NA, 6), c = as.numeric(c(NA, NA, NA)))

Then we can do

library(dplyr)

df %>%
  group_by(Participant_ID) %>%
  summarise(across(where(is.numeric), 
                   ~ if(all(is.na(.x))) NA_real_ else sum(.x, na.rm = T)))
#> # A tibble: 1 x 4
#>   Participant_ID     a     b     c
#>   <chr>          <int> <dbl> <dbl>
#> 1 A                  6    10    NA

Created on 2022-11-03 with reprex v2.0.2

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading