Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

dplyr: How to rearrange this dataframe and create new columns by extracting parts of other columns

Let’s say I have this dataframe

> a
  T..Gene.names Intensity.Mut_125 Intensity.Mut_250 Intensity.Mut.1000 Intensity.Mut.500
1          NCAN               NaN           25.6628            23.8427               NaN
2          AMBP           22.8276           27.0801            25.4740           23.5596
3          CHGB           25.4463           30.0065            27.8181           27.3170
4           APP           25.0346           29.7784            27.0848           24.7314

I need to re-arrange my dataframe so each a$T..Gene.names correspond to a new column. Then, I need a new column called a$sample that extracts the word between Intensity and the number (either 125, 250, 500, 1000 or 2000). An issue is that this word and following number is separated by either . or _ Finally, I need a column named a$volume that correspond to the number. NA should be converted to 0.

I tried several attempts with pivot_longer and pivot_wider but this is above my current skill level.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Expected output

sample     volume       NCAN         AMBP        CHGB        APP
   Mut        125          0      22.8276     25.4463    25.0346
   Mut        250    25.6638      27.0801     30.0065    29.7784
   Mut        500          0      23.5596     27.3170    24.7314
   Mut       1000    23.8427      25.4740     27.8181    27.0848 

I prefer a dplyr-solution

a <- structure(list(T..Gene.names = c("NCAN", "AMBP", "CHGB", "APP"
), Intensity.Mut_125 = c(NaN, 22.8276, 25.4463, 25.0346), Intensity.Mut_250 = c(25.6628, 
27.0801, 30.0065, 29.7784), Intensity.Mut.1000 = c(23.8427, 25.474, 
27.8181, 27.0848), Intensity.Mut.500 = c(NaN, 23.5596, 27.317, 
24.7314)), row.names = c(NA, 4L), class = "data.frame")

>Solution :

reshape2::recast(a, variable~T..Gene.names,fill = 0) %>%
  separate(variable, c('type','sample', 'volume'))

       type sample volume    AMBP     APP    CHGB    NCAN
1 Intensity    Mut    125 22.8276 25.0346 25.4463  0.0000
2 Intensity    Mut    250 27.0801 29.7784 30.0065 25.6628
3 Intensity    Mut   1000 25.4740 27.0848 27.8181 23.8427
4 Intensity    Mut    500 23.5596 24.7314 27.3170  0.0000
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading