Attaching rows together if they share same id and creating new columns in R

May 23, 2022

Let’s suppose we have the dataframe below:

df <- read.table(header=T, text=
'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors=F)

How can I rearrange this dataframe to make it look like the following :

ID  ATM MEN1 BRCA1  RAD51C  CHEK2   MUTYH   BRCA2
1    3    1                 
2             3       2                       2
3                             1     
4                                      1      3

Please note that each row is now a unique case and the column Type was used to provide the values for the new created columns.

>Solution :

Your data is long/tidy. You want it to be wide. There are many functions to do this in R. A commonly used one is tidyr::pivot_wider(), which I demonstrate below:

library(tidyverse)


df <- read.table(header=T, text=
                   'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors=F)

# Blank cells will be NA
df |> 
  rename(ID = Patient_ID) |> 
  pivot_wider(names_from = Gene,
              values_from = Type) 
#> # A tibble: 4 × 8
#>      ID   ATM  MEN1 BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#>   <int> <int> <int> <int>  <int> <int> <int> <int>
#> 1     1     3     1    NA     NA    NA    NA    NA
#> 2     2    NA    NA     3      2     2    NA    NA
#> 3     3    NA    NA    NA     NA    NA     1    NA
#> 4     4    NA    NA    NA     NA     3    NA     1

# Blank cells as empty strings ("")
df |> 
  rename(ID = Patient_ID) |> 
  pivot_wider(names_from = Gene, 
              values_from = Type, 
              values_fn = as.character, 
              values_fill = "")
#> # A tibble: 4 × 8
#>      ID ATM   MEN1  BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#>   <int> <chr> <chr> <chr> <chr>  <chr> <chr> <chr>
#> 1     1 "3"   "1"   ""    ""     ""    ""    ""   
#> 2     2 ""    ""    "3"   "2"    "2"   ""    ""   
#> 3     3 ""    ""    ""    ""     ""    "1"   ""   
#> 4     4 ""    ""    ""    ""     "3"   ""    "1"

^{Created on 2022-05-23 by the reprex package (v2.0.1)}

EDIT: Second solution simplified in line with @DarrenTsai’s comment