Let's say I have a data frame in R that looks like this:
library(tibble)

var2 = c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3),rep("F",3),
         rep("H",3),rep("I",3))
y2 = c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
       1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
       -0.648, 0.376, -0.887, 0.534, -0.679, -0.923,
       0.987, 0.324, -0.783, -0.679, 0.326, 0.998)
group2 = c(rep(1,6),rep(2,6),rep(3,6),rep(1,6))
data2 = tibble(var2,group2,y2)
which prints as:
# A tibble: 24 × 3
var2 group2 y2
<chr> <dbl> <dbl>
1 A 1 -1.23
2 A 1 -0.983
3 A 1 1.28
4 B 1 -0.268
5 B 1 -0.46
6 B 1 -1.23
7 C 2 1.87
8 C 2 0.416
9 C 2 -1.99
10 D 2 0.289
11 D 2 1.7
12 D 2 -0.455
13 E 3 -0.648
14 E 3 0.376
15 E 3 -0.887
16 F 3 0.534
17 F 3 -0.679
18 F 3 -0.923
19 H 1 0.987
20 H 1 0.324
21 H 1 -0.783
22 I 1 -0.679
23 I 1 0.326
24 I 1 0.998
I want to calculate the correlation between each distinct pair of variables within each group, using dplyr.
Ideally the resulting tibble would look like this, with the fourth column holding the correlation for each pair:
| group | var1 | var2 | value |
|---|---|---|---|
| 1 | A | B | cor(A,B) |
| 1 | A | H | cor(A,H) |
| 1 | A | I | cor(A,I) |
| 1 | B | H | cor(B,H) |
| 1 | B | I | cor(B,I) |
| 1 | H | I | cor(H,I) |
| 2 | C | D | cor(C,D) |
| 3 | E | F | cor(E,F) |
How can I do that in R? Any help is appreciated.
>Solution :
A possible solution:
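library(tidyverse)

data2 %>%
  group_by(group2) %>%
  group_split() %>%   # one tibble per group
  map(\(x) x %>%
        group_by(var2) %>%
        # one single-column data frame per variable, named after the variable
        group_map(~ data.frame(.x[-1]) %>% set_names(.y)) %>%
        bind_cols() %>%
        cor %>%
        # keep only the upper triangle, so each distinct pair appears once
        {data.frame(row = rownames(.)[row(.)[upper.tri(.)]],
                    col = colnames(.)[col(.)[upper.tri(.)]],
                    corr = .[upper.tri(.)])}) %>%
  # .y is the list position, which equals the group id here
  imap_dfr(~ data.frame(group = .y, .x))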
#> group row col corr
#> 1 1 A B -0.9949738
#> 2 1 A H -0.9581357
#> 3 1 B H 0.9819901
#> 4 1 A I 0.8533855
#> 5 1 B I -0.9012948
#> 6 1 H I -0.9669093
#> 7 2 C D 0.4690460
#> 8 3 E F -0.1864518
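
For comparison, here is a minimal alternative sketch that produces the `group`, `var1`, `var2`, `value` layout asked for in the question. It is not the approach above: it uses `group_modify()` together with base R's `combn()` to enumerate the pairs. It assumes, as in the example data, that every variable within a group has the same number of observations and that observations are matched by their order of appearance. The names `pair_cors`, `vals` and `pairs` are made up for illustration.

```r
library(dplyr)
library(purrr)

# Same data as in the question.
data2 <- tibble(
  var2   = rep(c("A", "B", "C", "D", "E", "F", "H", "I"), each = 3),
  group2 = rep(c(1, 2, 3, 1), each = 6),
  y2     = c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
             1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
             -0.648, 0.376, -0.887, 0.534, -0.679, -0.923,
             0.987, 0.324, -0.783, -0.679, 0.326, 0.998)
)

pair_cors <- data2 %>%
  group_by(group2) %>%
  group_modify(function(d, key) {
    # One numeric vector per variable in this group (assumes equal lengths).
    vals <- split(d$y2, d$var2)
    # All unordered pairs of variable names: a 2 x n_pairs character matrix.
    pairs <- combn(names(vals), 2)
    tibble(
      var1  = pairs[1, ],
      var2  = pairs[2, ],
      value = map2_dbl(pairs[1, ], pairs[2, ],
                       function(a, b) cor(vals[[a]], vals[[b]]))
    )
  }) %>%
  ungroup() %>%
  rename(group = group2)

pair_cors
```

The correlations should agree with the output above; only the row order within group 1 differs, because `combn()` lists the pairs as A-B, A-H, A-I, B-H, B-I, H-I.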