Sum/return NA when all values are NA

I’m trying to run a function on columns that contain NA observations. When all observations in a column are NA, I would like it to return NA; when only some of the rows are NA, it should just apply na.rm=T. I’ve seen a few posts showing how to do this (link_1, link_2, link_3), but none of them seem to work for my function and I’m not sure where I’m going wrong.

# data frame
species_1<- c(NA, 10, 40)
species_2<- c(NA, NA, 30)
species_3<- c(NA, NA, NA)
group<- c(1, 1, 1)

df<- data.frame(species_1, species_2, species_3, group)

# function argument
y_true_test<- c(30, 20, 20) 

# function
estimate = function(df, y_true, na.rm=T) {
  
  if (all(is.na(df))) df[NA_integer_] else
  
  sqrt(colSums((t(t(df) - y_true_test))^2, na.rm=T) / 3) / y_true_test * 100
  
}

# run
final<- df %>%
  group_by(group) %>%
  group_modify( ~ as.data.frame.list(estimate(., y_true_test))) #species 3 returns '0' when it should be NA

Any help would be greatly appreciated.

Solution:

The function checks for NA across the whole data frame at once, when the check should be made separately for each column. Here is an option using across:

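library(dplyr)
# name the true values after the species columns so they can be looked up by column
names(y_true_test) <- grep("species", names(df), value = TRUE)
df %>%
   group_by(group) %>% 
   # return NA for an all-NA column; otherwise compare it with its own true value
   summarise(across(everything(), ~ if(all(is.na(.x))) NA_real_ else
     sqrt(sum((.x - y_true_test[[cur_column()]])^2, na.rm = TRUE)/n())/
                y_true_test[[cur_column()]] * 100), .groups = 'drop')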

Output:

# A tibble: 1 × 4
  group species_1 species_2 species_3
  <dbl>     <dbl>     <dbl>     <dbl>
1     1      43.0      28.9        NA
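
To see why the original check never triggered, it may help to compare the two tests directly. This is a minimal base R sketch using the question's data; `whole_check` and `column_check` are illustrative names, not part of the answer:

```r
# Data from the question (species columns only)
df <- data.frame(
  species_1 = c(NA, 10, 40),
  species_2 = c(NA, NA, 30),
  species_3 = c(NA, NA, NA)
)

# Whole-data-frame check: TRUE only if every single cell is NA
whole_check <- all(is.na(df))

# Per-column check: one logical per column, TRUE where the column is all NA
column_check <- colSums(is.na(df)) == nrow(df)

print(whole_check)   # FALSE, so the NA branch is never taken
print(column_check)  # FALSE for species_1 and species_2, TRUE for species_3
```

Because the first test is FALSE whenever any column holds a value, the original function always falls through to colSums with na.rm=T, which adds up nothing for species_3 and returns 0.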

If we want to modify the OP’s function:

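estimate <- function(df, y_true, narm=TRUE) {
  
  # TRUE for columns in which every observation is NA
  i1 <- colSums(is.na(df)) == nrow(df)
  
  # use the y_true argument and the actual row count rather than a global and a fixed 3
   out <- sqrt(colSums((t(t(df) - y_true))^2,
        na.rm= narm) / nrow(df)) / y_true * 100
   # overwrite the 0 produced by na.rm for all-NA columns
   out[i1] <- NA
   out
  
}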

Testing:

> df %>%
+   group_by(group) %>%
+   group_modify( ~ as.data.frame.list(estimate(., y_true_test)))
# A tibble: 1 × 4
# Groups:   group [1]
  group species_1 species_2 species_3
  <dbl>     <dbl>     <dbl>     <dbl>
1     1      43.0      28.9        NA
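
The same per-vector pattern covers the plain sum in the title. A minimal base R sketch, where `sum_or_na` is a hypothetical helper name that does not appear in the answer above:

```r
# Data from the question (species columns only)
df <- data.frame(
  species_1 = c(NA, 10, 40),
  species_2 = c(NA, NA, 30),
  species_3 = c(NA, NA, NA)
)

# Hypothetical helper: sum with na.rm = TRUE, but return NA when every value is missing
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Apply the helper to each column; the result is a named numeric vector
res <- sapply(df, sum_or_na)
print(res)  # species_1 = 50, species_2 = 30, species_3 = NA
```

Note that all(is.na(x)) is also TRUE for a zero-length vector, so this sketch returns NA rather than 0 for empty input.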