I have a list of columns where three of them relate to year, which identifies the occurrence of that data being collected for that year (1 or -1 means to be collected). I would like to create / manipulate the dataframe to have ‘Year’ as a new variable, which would replicate the other columns if it is to be collected for multiple years. What is the most efficient way to do this using dplyr / tidyverse?
I am aware of perhaps using pivot_longer but I assume I made need also to use case_when or similar. Any help appreciated.
Programme <- c(1, 2, 2, 3, 4)
Function <- c("Inv", "Inv", "Mac", "Inv", "Inv")
Year2020 <- c(1, 1, 1, -1, 1)
Year2021 <- c(1, 1, *NA*, *NA*, *NA*)
Year2022 <- c(*NA*, *NA*, *NA*, -1, -1)
df <- data.frame(Programme, Function, Year2020, Year2021, Year2022)
So what I am trying to produce is this:
Year <- c(2020, 2021, 2020, 2021, 2020, 2020, 2022, 2020, 2022)
Programme <- c(1, 1, 2, 2, 2, 3, 3, 4, 4)
Function <- c("Inv", "Inv", "Inv", "Inv", "Mac", "Inv", "Inv", "Inv", "Inv")
df <- data.frame(Year, Programme, Function)
>Solution :
Using dplyr, tidyr and stringr packages:
library(tidyverse)
df |>
pivot_longer(3:5,
names_to = "Year") |>
filter(value == 1 | value == -1) |>
mutate(Year = str_remove(Year,"^Year")) |>
select(Year, Programme, Function)
Output:
# A tibble: 9 x 3
Year Programme Function
<chr> <dbl> <chr>
1 2020 1 Inv
2 2021 1 Inv
3 2020 2 Inv
4 2021 2 Inv
5 2020 2 Mac
6 2020 3 Inv
7 2022 3 Inv
8 2020 4 Inv
9 2022 4 Inv

