I have a tibble in R arranged like this (just the first 20 rows of thousands):
A tibble: 20 Ă— 4
| TF | confidence | target | mor | |
|---|---|---|---|---|
| 1 | Ahr | B | Cyp1a1 | 1 |
| 2 | Ahr | B | Cyp1a2 | 1 |
| 3 | Ahr | B | Cyp1b1 | 1 |
| 4 | Ahr | B | Fos | 1 |
| 5 | Ahr | B | Myc | 1 |
| 6 | Ahr | B | Ugt1a7c | 1 |
| 7 | Ahr | B | Ugt1a6a | 1 |
| 8 | Ahr | B | Ugt1a6b | 1 |
| 9 | Ahr | B | Asap1 | 1 |
| 10 | Ahr | B | Erg | 1 |
| 11 | Ahr | B | Vgll4 | 1 |
| 12 | Ar | A | Abce1 | 1 |
| 13 | Ar | A | Acad12 | 1 |
| 14 | Ar | A | Acad10 | 1 |
| 15 | Ar | A | Ahsg | 1 |
| 16 | Ar | A | Akr1b3 | 1 |
| 17 | Ar | A | Akt1 | 1 |
| 18 | Ar | A | Anapc10 | 1 |
| 19 | Ar | A | Ap2m1 | 1 |
| 20 | Ar | A | Appbp2 | 1 |
and I want to arrange it to look like this:
| TF | Description | Targets | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ahr | None | Cyp1a1 | Cyp1a2 | Cyp1b1 | Fos | Myc | Ugt1a7c | Ugt1a6a | Ugt1a6b | Asap1 | Erg | Vgll4 |
| Ar | None | Abce1 | Acad12 | Acad10 | Ahsg | Akr1b3 | Akr1b3 | Akt1 | Anapc10 | Ap2m1 | Appbp2 |
…The table goes on for thousands of lines. There can be a varying number of targets for a given TF value. How do I create this second table as a dataframe in R given the first table?
Thank you!
>Solution :
library(dplyr)
df %>%
group_by(TF) %>%
summarise(Targets = list(target))
#> TF Targets
#> 1 Ahr Cyp1a1, Cyp1a2, Cyp1b1, Fos, Myc, Ugt1a7c, Ugt1a6a, Ugt1a6b, Asap1, Erg, Vgll4
#> 2 Ar Abce1, Acad12, Acad10, Ahsg, Akr1b3, Akt1, Anapc10, Ap2m1, Appbp2
This results in a so-called list-column (Targets is a list of character vectors), that will work for varying numbers of targets per TF.
With data:
df <- read.table(text =
'TF confidence target mor
Ahr B Cyp1a1 1
Ahr B Cyp1a2 1
Ahr B Cyp1b1 1
Ahr B Fos 1
Ahr B Myc 1
Ahr B Ugt1a7c 1
Ahr B Ugt1a6a 1
Ahr B Ugt1a6b 1
Ahr B Asap1 1
Ahr B Erg 1
Ahr B Vgll4 1
Ar A Abce1 1
Ar A Acad12 1
Ar A Acad10 1
Ar A Ahsg 1
Ar A Akr1b3 1
Ar A Akt1 1
Ar A Anapc10 1
Ar A Ap2m1 1
Ar A Appbp2 1',
header = TRUE)