How can I calculate the correlation between two data frames in R using dplyr?


I have two data frames in R, say data1 and data2:


library(dplyr)  # dplyr re-exports tibble()

a = c(1,2,NA,4,5)
b = c(3,4,5,6,7)
data1 = tibble(a,b);data1



a = c(4,2,4,4,9)
b = c(3,4,4,6,7)
d = c(5,9,3,4,2)
data2 = tibble(a,b,d);data2

I want to calculate the correlation between the matching columns of these two data frames. Keep in mind that some column vectors may contain NA, and some columns of data2 might not exist in data1; in that case I would like NA to be reported. How can I do that in R using dplyr?

Solution:

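Since column a in data1 contains one NA, cor() returns NA for a, because its default use = "everything" lets missing values propagate. Columns that are missing from data1, such as d, are compared against an all-NA vector and also give NA. You can do this with purrr's map_dbl() (loaded by tidyverse):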

library(tidyverse)

a = c(1,2,NA,4,5)
b = c(3,4,5,6,7)
data1 = tibble(a,b)
data1
#> # A tibble: 5 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2     4
#> 3    NA     5
#> 4     4     6
#> 5     5     7

a = c(4,2,4,4,9)
b = c(3,4,4,6,7)
d = c(5,9,3,4,2)
data2 = tibble(a,b,d);data2
#> # A tibble: 5 × 3
#>       a     b     d
#>   <dbl> <dbl> <dbl>
#> 1     4     3     5
#> 2     2     4     9
#> 3     4     4     3
#> 4     4     6     4
#> 5     9     7     2

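names(data2) %>% 
  map_dbl(~ {
    # matching column of data1, or an all-NA vector if data1 lacks it
    col <- if (is.null(data1[[.x]])) {
      rep(NA_real_, nrow(data1))
    } else {
      data1[[.x]]
    }
    cor(col, data2[[.x]])
  }) %>%
  # map_dbl() drops names here, so restore them from data2
  set_names(names(data2))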
#>         a         b         d 
#>        NA 0.9622504        NA

Created on 2022-07-11 by the reprex package (v2.0.1)

Or, piping the result into stack() gives you a data frame:

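names(data2) %>% 
  map_dbl(~ {
    # matching column of data1, or an all-NA vector if data1 lacks it
    col <- if (is.null(data1[[.x]])) {
      rep(NA_real_, nrow(data1))
    } else {
      data1[[.x]]
    }
    cor(col, data2[[.x]])
  }) %>%
  set_names(names(data2)) %>% 
  # stack() turns the named vector into a two-column data frame
  stack()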
#>      values ind
#> 1        NA   a
#> 2 0.9622504   b
#> 3        NA   d

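Since the question asks for dplyr, here is a minimal sketch that keeps the result in a tibble built with dplyr::mutate(). The helper name cor_matched() and its use argument are illustrative assumptions, not part of the answer above; use is simply passed through to cor(), so use = "complete.obs" drops rows with an NA in that pair of columns instead of returning NA.

```r
library(dplyr)
library(purrr)

# Hypothetical helper (illustrative only): correlate each column of `y` with
# the same-named column of `x`, returning NA when `x` has no such column.
cor_matched <- function(x, y, use = "everything") {
  tibble(column = names(y)) %>%
    mutate(correlation = map_dbl(column, function(col) {
      if (!col %in% names(x)) {
        return(NA_real_)  # column missing from x: report NA
      }
      # `use` is forwarded to stats::cor() unchanged
      cor(x[[col]], y[[col]], use = use)
    }))
}

data1 <- tibble(a = c(1, 2, NA, 4, 5), b = c(3, 4, 5, 6, 7))
data2 <- tibble(a = c(4, 2, 4, 4, 9), b = c(3, 4, 4, 6, 7), d = c(5, 9, 3, 4, 2))

# Default: any NA in a pair of columns makes that correlation NA
cor_matched(data1, data2)

# Drop incomplete rows, so `a` is computed from the 4 complete rows
cor_matched(data1, data2, use = "complete.obs")
```

With the example data, the first call should agree with the answer above (NA, 0.962, NA), while the second gives about 0.734 for a and still NA for d, because d has no counterpart in data1.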
