How do I make tidy data from rows of observations?

I want to make a ggplot boxplot and I think I need to format my data from this:

markerID        V1        V2        V3
       1 0.8636364 0.8409091 0.7954545
       2 0.8863636 0.8409091 0.8409091

to this:

markerID  replicate      rate
       1          1 0.8636364
       1          2 0.8409091
       1          3 0.7954545
       2          1 0.8863636
       2          2 0.8409091
       2          3 0.8409091

For readability I’m only showing part of the data.

I think once I have this format I can group by markerID and then make my boxplot. The number of columns and rows can vary, so I’m not sure how to apply functions like melt() or pivot_longer().

Example data:

structure(list(markerID = c("1", "2", "3", "4", "5", "6", "7", 
"8", "9", "10", "11", "12", "13", "14", "15"), V1 = c(0.863636363636364, 
0.886363636363636, 0.886363636363636, 0.795454545454545, 0.795454545454545, 
0.863636363636364, 0.931818181818182, 0.909090909090909, 0.840909090909091, 
0.863636363636364, 0.886363636363636, 0.795454545454545, 0.818181818181818, 
0.863636363636364, 0.886363636363636), V2 = c(0.840909090909091, 
0.840909090909091, 0.909090909090909, 0.772727272727273, 0.772727272727273, 
0.909090909090909, 0.886363636363636, 0.886363636363636, 0.954545454545455, 
0.75, 0.818181818181818, 0.772727272727273, 0.681818181818182, 
0.863636363636364, 0.840909090909091), V3 = c(0.795454545454545, 
0.840909090909091, 0.886363636363636, 0.818181818181818, 0.818181818181818, 
0.795454545454545, 0.818181818181818, 0.863636363636364, 0.818181818181818, 
0.818181818181818, 0.931818181818182, 0.772727272727273, 0.772727272727273, 
0.886363636363636, 0.886363636363636)), class = "data.frame", row.names = c(NA, 
-15L))

Solution:

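One approach uses pivot_longer() from tidyr. The selection -markerID pivots every column except markerID, so the call works however many V columns the data has, and names_prefix = "V" strips that prefix so the replicate values become 1, 2, 3: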

library(tidyr)

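# df is the data frame from the example data above.
# Pivot all columns except markerID into replicate/rate pairs.
pivot_longer(df, -markerID, names_prefix = "V", names_to = "replicate", values_to = "rate")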
# A tibble: 45 × 3
   markerID replicate  rate
   <chr>    <chr>     <dbl>
 1 1        1         0.864
 2 1        2         0.841
 3 1        3         0.795
 4 2        1         0.886
 5 2        2         0.841
 6 2        3         0.841
 7 3        1         0.886
 8 3        2         0.909
 9 3        3         0.886
10 4        1         0.795
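
To connect this back to the boxplot the question asks for, here is a minimal sketch, not a definitive implementation. It assumes ggplot2 is installed and uses a two-row stand-in for df built from the first two markers shown in the question. The names long and p are illustrative. The names_transform argument (available in recent tidyr versions) is an optional extra that converts replicate from character to integer, since the plain call above returns it as <chr>.

```r
library(tidyr)
library(ggplot2)

# Stand-in for the question's data: first two markers only
df <- data.frame(
  markerID = c("1", "2"),
  V1 = c(0.8636364, 0.8863636),
  V2 = c(0.8409091, 0.8409091),
  V3 = c(0.7954545, 0.8409091)
)

# Pivot every column except markerID, so the number of V columns can vary
long <- pivot_longer(
  df,
  cols = -markerID,
  names_prefix = "V",
  names_to = "replicate",
  names_transform = list(replicate = as.integer),  # "1" -> 1L
  values_to = "rate"
)

# Keep markers in their original order on the x axis
# (character IDs would otherwise sort as "1", "10", "11", ...)
long$markerID <- factor(long$markerID, levels = unique(df$markerID))

# One box per marker, drawn from that marker's replicate rates
p <- ggplot(long, aes(x = markerID, y = rate)) +
  geom_boxplot()
p
```

No explicit group_by() is needed for the plot: mapping markerID to the x aesthetic already makes geom_boxplot() draw one box per marker.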