Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Attaching rows together if they share same id and creating new columns in R

Let’s suppose we have the dataframe below:

df <- read.table(header=T, text=
'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors=F)

How can I rearrange this dataframe to make it look like the following :

ID  ATM MEN1 BRCA1  RAD51C  CHEK2   MUTYH   BRCA2
1    3    1                 
2             3       2                       2
3                             1     
4                                      1      3

Please note that each row is now a unique case and the column Type was used to provide the values for the new created columns.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Your data is long/tidy. You want it to be wide. There are many functions to do this in R. A commonly used one is tidyr::pivot_wider(), which I demonstrate below:

library(tidyverse)


df <- read.table(header=T, text=
                   'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors=F)

# Blank cells will be NA
df |> 
  rename(ID = Patient_ID) |> 
  pivot_wider(names_from = Gene,
              values_from = Type) 
#> # A tibble: 4 × 8
#>      ID   ATM  MEN1 BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#>   <int> <int> <int> <int>  <int> <int> <int> <int>
#> 1     1     3     1    NA     NA    NA    NA    NA
#> 2     2    NA    NA     3      2     2    NA    NA
#> 3     3    NA    NA    NA     NA    NA     1    NA
#> 4     4    NA    NA    NA     NA     3    NA     1

# Blank cells as empty strings ("")
df |> 
  rename(ID = Patient_ID) |> 
  pivot_wider(names_from = Gene, 
              values_from = Type, 
              values_fn = as.character, 
              values_fill = "")
#> # A tibble: 4 × 8
#>      ID ATM   MEN1  BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#>   <int> <chr> <chr> <chr> <chr>  <chr> <chr> <chr>
#> 1     1 "3"   "1"   ""    ""     ""    ""    ""   
#> 2     2 ""    ""    "3"   "2"    "2"   ""    ""   
#> 3     3 ""    ""    ""    ""     ""    "1"   ""   
#> 4     4 ""    ""    ""    ""     "3"   ""    "1"

Created on 2022-05-23 by the reprex package (v2.0.1)

EDIT: Second solution simplified in line with @DarrenTsai’s comment

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading