How can I calculate the correlation between each pair of variables within each group in R using the dplyr package?

Let’s say I have a data frame in R that looks like this:

library(tibble)

var2 = c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3),rep("F",3),
         rep("H",3),rep("I",3))

y2 = c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
       1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
       -0.648, 0.376, -0.887, 0.534, -0.679, -0.923,
       0.987, 0.324, -0.783, -0.679, 0.326, 0.998)
group2 = c(rep(1,6),rep(2,6),rep(3,6),rep(1,6))
data2 = tibble(var2,group2,y2)
data2

which prints as:

# A tibble: 24 × 3
   var2  group2     y2
   <chr>  <dbl>  <dbl>
 1 A          1 -1.23 
 2 A          1 -0.983
 3 A          1  1.28 
 4 B          1 -0.268
 5 B          1 -0.46 
 6 B          1 -1.23 
 7 C          2  1.87 
 8 C          2  0.416
 9 C          2 -1.99 
10 D          2  0.289
11 D          2  1.7  
12 D          2 -0.455
13 E          3 -0.648
14 E          3  0.376
15 E          3 -0.887
16 F          3  0.534
17 F          3 -0.679
18 F          3 -0.923
19 H          1  0.987
20 H          1  0.324
21 H          1 -0.783
22 I          1 -0.679
23 I          1  0.326
24 I          1  0.998

I want to calculate the correlation of each distinct pair of variables within each group using dplyr.
Ideally, the resulting tibble would have a fourth column containing the correlation for each pair.

It should look like this:

group var1 var2 value
1 A B cor(A,B)
1 A H cor(A,H)
1 A I cor(A,I)
1 B H cor(B,H)
1 B I cor(B,I)
1 H I cor(H,I)
2 C D cor(C,D)
3 E F cor(E,F)

How can I do that in R?

>Solution :

A possible solution:

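library(tidyverse)

data2 %>%
  group_by(group2) %>% 
  group_split() %>%                       # one tibble per group
  map(\(x) x %>%
        group_by(var2) %>% 
        # one single-column data frame per variable, named after the variable
        group_map(~ data.frame(.x[-1]) %>% set_names(.y)) %>% 
        bind_cols() %>%                   # variables side by side
        cor %>%                           # correlation matrix for the group
        # keep the upper triangle: one row per distinct pair
        {data.frame(row = rownames(.)[row(.)[upper.tri(.)]], 
                    col = colnames(.)[col(.)[upper.tri(.)]], 
                    corr = .[upper.tri(.)])}) %>% 
  # .y is the list position, which equals group2 here (groups are 1, 2, 3)
  imap_dfr(~ data.frame(group = .y, .x))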

#>   group row col       corr
#> 1     1   A   B -0.9949738
#> 2     1   A   H -0.9581357
#> 3     1   B   H  0.9819901
#> 4     1   A   I  0.8533855
#> 5     1   B   I -0.9012948
#> 6     1   H   I -0.9669093
#> 7     2   C   D  0.4690460
#> 8     3   E   F -0.1864518
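
An alternative that stays closer to plain dplyr verbs is sketched below. It is not part of the answer above, only one possible variant under stated assumptions: values are paired by their position within each variable (the same assumption bind_cols() makes), the helper column obs and the names indexed and result are made up for illustration, and the relationship argument of inner_join() assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(tibble)

# Rebuild the example data from the question
var2   <- rep(c("A", "B", "C", "D", "E", "F", "H", "I"), each = 3)
y2     <- c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
            1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
            -0.648, 0.376, -0.887, 0.534, -0.679, -0.923,
            0.987, 0.324, -0.783, -0.679, 0.326, 0.998)
group2 <- rep(c(1, 2, 3, 1), each = 6)
data2  <- tibble(var2, group2, y2)

# Number the observations within each variable (hypothetical helper column)
indexed <- data2 %>%
  group_by(group2, var2) %>%
  mutate(obs = row_number()) %>%
  ungroup()

# Self-join within group and observation, keep each unordered pair once,
# then compute one correlation per (group, pair)
result <- indexed %>%
  inner_join(indexed, by = c("group2", "obs"),
             suffix = c("_a", "_b"),
             relationship = "many-to-many") %>%  # assumes dplyr >= 1.1.0
  filter(var2_a < var2_b) %>%
  group_by(group = group2, var1 = var2_a, var2 = var2_b) %>%
  summarise(value = cor(y2_a, y2_b), .groups = "drop")

result
```

Because summarise() sorts by the grouping columns, the rows should come out in the order asked for in the question (A-B, A-H, A-I, B-H, B-I, H-I, then C-D and E-F), with group holding the actual group2 value rather than a list position.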