Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to use dplyr to return the grouped sum of all numeric columns when there are NA values?

I’m was attempting to sum all numeric columns using dplyr’s group_by and summarise functions as below. I didn’t understand the error returned from the summarise function and cannot seem to find a similar example on stack overflow … however after two members pointed out my error in making the example data I found that the code I had to prepared to provide a grouped summary sum report was correct!

    # Dummy data
    a <- c(1, NA, 1, NA, 1, 1)
    b <- c( NA, 1, NA, 1, NA, NA)
    c <- c( 1, 1, 1, NA, 1, 1)
    d <- c( 1, 1, 1, NA, 1, NA)
    e <- c( NA, 1, 1, NA, 1, 1)
    f <- c( 1, NA, 1, NA, 1, 1)
    
# Make a tibble
tmp <- bind_cols(a, b, c, d, e) 
names(tmp) <- c("A", "B", "C", "D", "E")

ID <- c("X", "X", "Y", "Y", "Z", "Z")

tmp <-bind_cols(ID, tmp)
names(tmp)[1] <- "ID"

    # Return a sum report
    tmp %>% 
      group_by(ID) %>% 
      summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

    # A tibble: 3 × 6
      ID        A     B     C     D     E
      <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 X         1     1     2     2     1
    2 Y         1     1     1     1     1
    3 Z         2     0     2     1     2

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

It’s best to avoid defining a vector with different data types because R will convert the vector to a single data type.

I think you might want to create your data like this:

tmp = tibble(
         ID = c('X', 'X', 'Y', 'Y', 'Z', 'Z'),
         A = c(1, NA, 1, 1, NA, 1),
         B = c(NA, 1, 1, 1, 1, NA),
         C = c(1, NA, 1, 1, 1, 1),
         D = c(NA, 1, NA, NA, NA, NA),
         E = c(1, NA, 1, 1, 1, 1))

And then do:

tmp %>%
  group_by(ID) %>% 
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

To get:

# A tibble: 3 x 6
  ID        A     B     C     D     E
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 X         1     1     1     1     1
2 Y         2     2     2     0     2
3 Z         1     1     2     0     2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading