I have large dataset in wide format (1200 obs. of 287 variables) where most variables are time-varying over a 10 years range and a few are constant over time. Time-varying variable labels do not follow a particular pattern except for ending with a year ranging from 2011 to 2020.
I would like to reshape this in a long format but I’m having a very hard time figuring out how to do it in a reasonably quick way.
Here is a simple data frame with comparable characteristics:
#Packages
library(dplyr)
library(tidyr)
#Reproducibility
set.seed(123)
#Generate data frame
wide <- as.data.frame(matrix(runif(n=60, min=0, max=10), nrow=5))
#Naming rows and columns
names(wide) = c("day2011",
"night2011",
"in_between2011",
"day2012",
"night2012",
"in_between2012",
"day2013",
"night2013",
"in_between2013",
"house",
"tree",
"dog"
)
wide$id <- c("red", "blue", "yellow", "black", "green")
I’m slowly working my way to it doing this, but I’m not sure it is sustainable on a large dataset:
long_day <- pivot_longer(data = wide,
cols = starts_with("day"),
#cols_vary = "slowest",
names_to = "year",
values_to = "Day") %>%
select(-(night2011:in_between2013)) %>%
mutate(year = gsub("day", "", year))
long_night <- pivot_longer(data = wide,
cols = starts_with("night"),
#cols_vary = "slowest",
names_to = "year",
values_to = "Night") %>%
select(-(day2011:in_between2013))%>%
mutate(year = gsub("night", "", year))
long_24 <- full_join(long_day, long_night, by = c("id", "year", "house", "dog", "tree"))
>Solution :
You need to write a regex that captures the time variables. The pattern is that they run from 2011-2020. Capture all variables that have those years at the end as the varying variables:
wide %>%
pivot_longer(matches('20(1[1-9]|20)$'),
names_to = c('.value', 'year'),
names_pattern = "(.*?)(\\d+)")
# A tibble: 15 × 8
house tree dog id year day night in_between
<dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 1.39 0.458 2.07 red 2011 2.88 0.456 9.57
2 1.39 0.458 2.07 red 2012 9.00 8.90 7.09
3 1.39 0.458 2.07 red 2013 9.63 4.78 1.43
4 2.33 4.42 1.28 blue 2011 7.88 5.28 4.53
5 2.33 4.42 1.28 blue 2012 2.46 6.93 5.44
6 2.33 4.42 1.28 blue 2013 9.02 7.58 4.15
7 4.66 7.99 7.53 yellow 2011 4.09 8.92 6.78
8 4.66 7.99 7.53 yellow 2012 0.421 6.41 5.94
9 4.66 7.99 7.53 yellow 2013 6.91 2.16 4.14
10 2.66 1.22 8.95 black 2011 8.83 5.51 5.73
11 2.66 1.22 8.95 black 2012 3.28 9.94 2.89
12 2.66 1.22 8.95 black 2013 7.95 3.18 3.69
13 8.58 5.61 3.74 green 2011 9.40 4.57 1.03
14 8.58 5.61 3.74 green 2012 9.55 6.56 1.47
15 8.58 5.61 3.74 green 2013 0.246 2.32 1.52