dplyr: correlations with NA

xx <- data.frame(group = rep(1:4, each=100), a = rnorm(100) , b = rnorm(100))
xx[c(1,14,33), 'b'] = NA

I’m trying to calculate correlations by group but I’m getting an error when there are NAs.

library(dplyr)
xx %>% group_by(group) %>% summarize(COR=cor(a,b,na.rm=TRUE))
    
Error: Problem with `summarise()` column `COR`.
    i `COR = cor(a, b, na.rm = TRUE)`.
    x unused argument (na.rm = TRUE)
    i The error occurred in group 1: group = 1.
    Run `rlang::last_error()` to see where the error occurred.

Solution:

cor() has no na.rm argument; missing values are handled through the use argument instead. According to ?cor, the usage is

cor(x, y = NULL, use = "everything",
method = c("pearson", "kendall", "spearman"))

use – an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
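
To see how some of these options differ, here is a minimal base R sketch. The vectors x and y are made-up toy values, not the data from the question.

```r
# Toy vectors (hypothetical values) with one missing entry in x
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 5, 9, 10)

# Default use = "everything": the NA propagates to the result
r_default <- cor(x, y)

# "complete.obs": the pair containing the NA is dropped before computing
r_complete <- cor(x, y, use = "complete.obs")

# "pairwise.complete.obs": identical to "complete.obs" when only two vectors are involved
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")

# "all.obs": any missing value is an error, captured here as a message string
r_all_obs <- tryCatch(
  cor(x, y, use = "all.obs"),
  error = function(e) conditionMessage(e)
)
```

With only two vectors, "complete.obs" and "pairwise.complete.obs" agree; they can differ only when cor() is given a matrix or data frame with several columns.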

library(dplyr)
xx %>%
   group_by(group) %>%
   summarize(COR=cor(a,b, use = "complete.obs"))

-output

# A tibble: 4 × 2
  group   COR
  <int> <dbl>
1     1 0.166
2     2 0.190
3     3 0.190
4     4 0.190

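If a group contains no complete pairs (for example, every value of b is NA), "complete.obs" raises an error, whereas "na.or.complete" returns NA for that group. The output below uses the updated data from the comments, which adds a group containing only NA.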

xx %>%
    group_by(group) %>%
    summarize(COR=cor(a,b, use = "na.or.complete"))
# A tibble: 5 × 2
  group     COR
  <int>   <dbl>
1     1  0.0345
2     2 -0.397 
3     3  0.150 
4     4  0.376 
5     5 NA     

The same result can be obtained with "complete.obs" by guarding the call with an if/else condition:

xx %>%
    group_by(group) %>%
    summarize(COR= if(any(complete.cases(a, b)))
     cor(a,b, use = "complete.obs") else NA_real_)
# A tibble: 5 × 2
  group     COR
  <int>   <dbl>
1     1  0.0345
2     2 -0.397 
3     3  0.150 
4     4  0.376 
5     5 NA   
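
The updated data from the comments is not reproduced here, so the following is only a sketch with a hypothetical data frame yy in which group 5 has no complete pairs. The seed, group sizes, and NA positions are assumptions chosen for illustration. It checks that the two approaches above agree.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical data: five groups of 20 rows each
yy <- data.frame(
  group = rep(1:5, each = 20),
  a = rnorm(100),
  b = rnorm(100)
)
yy[c(1, 14), "b"] <- NA        # a few scattered NAs in group 1
yy[yy$group == 5, "b"] <- NA   # group 5 has no complete pairs at all

# Approach 1: let cor() return NA when a group has no complete pairs
res_na_or_complete <- yy %>%
  group_by(group) %>%
  summarize(COR = cor(a, b, use = "na.or.complete"))

# Approach 2: guard "complete.obs", which would otherwise fail on group 5
res_guarded <- yy %>%
  group_by(group) %>%
  summarize(COR = if (any(complete.cases(a, b)))
    cor(a, b, use = "complete.obs") else NA_real_)

# TRUE when both approaches give the same per-group correlations
same_result <- isTRUE(all.equal(res_na_or_complete$COR, res_guarded$COR))
```

The guarded version is more verbose, but it makes the handling of empty groups explicit and lets you substitute a fallback other than NA if needed.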