I want to make a ggplot boxplot and I think I need to format my data from this:
markerID V1 V2 V3
1 0.8636364 0.8409091 0.7954545
2 0.8863636 0.8409091 0.8409091
to this:
markerID replicate rate
1 1 0.8636364
1 2 0.8409091
1 3 0.7954545
2 1 0.8863636
2 2 0.8409091
2 3 0.8409091
For readability I’m only showing part of the data.
I think once I have this format I can group by markerID and then make my boxplot. The number of columns and rows can vary, so I’m not sure how to apply functions like melt() or pivot_longer().
Example data:
structure(list(markerID = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15"), V1 = c(0.863636363636364,
0.886363636363636, 0.886363636363636, 0.795454545454545, 0.795454545454545,
0.863636363636364, 0.931818181818182, 0.909090909090909, 0.840909090909091,
0.863636363636364, 0.886363636363636, 0.795454545454545, 0.818181818181818,
0.863636363636364, 0.886363636363636), V2 = c(0.840909090909091,
0.840909090909091, 0.909090909090909, 0.772727272727273, 0.772727272727273,
0.909090909090909, 0.886363636363636, 0.886363636363636, 0.954545454545455,
0.75, 0.818181818181818, 0.772727272727273, 0.681818181818182,
0.863636363636364, 0.840909090909091), V3 = c(0.795454545454545,
0.840909090909091, 0.886363636363636, 0.818181818181818, 0.818181818181818,
0.795454545454545, 0.818181818181818, 0.863636363636364, 0.818181818181818,
0.818181818181818, 0.931818181818182, 0.772727272727273, 0.772727272727273,
0.886363636363636, 0.886363636363636)), class = "data.frame", row.names = c(NA,
-15L))
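Solution:
One approach is pivot_longer() from tidyr, with the example data stored in df. Selecting columns with -markerID pivots every column except markerID, so it works no matter how many V columns or rows there are. names_prefix="V" strips the leading "V" from the column names, leaving the replicate number (as a character column):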
library(tidyr)
pivot_longer(df, -markerID, names_prefix="V", names_to="replicate", values_to="rate")
# A tibble: 45 × 3
markerID replicate rate
<chr> <chr> <dbl>
1 1 1 0.864
2 1 2 0.841
3 1 3 0.795
4 2 1 0.886
5 2 2 0.841
6 2 3 0.841
7 3 1 0.886
8 3 2 0.909
9 3 3 0.886
10 4 1 0.795
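Once the data is in long format, the boxplot itself could look something like the sketch below. It is a minimal example, not the only way to do it: the small df here is a stand-in holding just the first two markers from the question, the names long and p are placeholders, and names_transform (which assumes tidyr 1.1.0 or later) is an optional extra that turns replicate into an integer instead of a character column.

```r
library(tidyr)
library(ggplot2)

# Stand-in for the question's data: only the first two markers
df <- data.frame(
  markerID = c("1", "2"),
  V1 = c(0.8636364, 0.8863636),
  V2 = c(0.8409091, 0.8409091),
  V3 = c(0.7954545, 0.8409091)
)

# Pivot every column except markerID, whatever their number
long <- pivot_longer(
  df, -markerID,
  names_prefix = "V",
  names_to = "replicate",
  names_transform = list(replicate = as.integer),  # optional: "1" -> 1L
  values_to = "rate"
)

# markerID is character, so fix the level order to the order in df;
# otherwise the x axis sorts as "1", "10", "11", ..., "2"
long$markerID <- factor(long$markerID, levels = unique(df$markerID))

# One box per markerID, built from its replicate rates
p <- ggplot(long, aes(x = markerID, y = rate)) +
  geom_boxplot()
p
```

Because markerID is mapped to x as a discrete variable, ggplot draws one box per marker without an explicit group_by() step. With only three replicates per marker the boxes summarise very few points, so adding geom_point() or geom_jitter() on top may be worth considering.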