Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to reshape wide format data to long format?

I have a wide format dataset, which contains year variables (yr1, yr2, yr3), and duration variables (yr1_time, yr2_time, yr3_time). yr1 ranges from 2023 to 2025. yr2 and yr3 equals to yr1+1 or +2, respectively.

id<-rep(c(1:20),times=1)
df1<-data.frame(id)
df1$yr1 <- sample(2022:2025, length(df1$id), replace=TRUE)
df1$yr1_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr2 <- df1$yr1+1
df1$yr2_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr3 <- df1$yr1+2
df1$yr3_time <- rnorm(n = 20, mean = 0, sd = 0.6)

print(df1)

#   id  yr1    yr1_time  yr2    yr2_time  yr3    yr3_time
# 1   1 2023 -0.18649844 2024  1.41458053 2025 -1.12031610
# 2   2 2025 -0.01977439 2026  0.68985414 2027 -0.69038076
# 3   3 2023 -0.08855173 2024  0.76039453 2025 -0.36913641
# 4   4 2023  0.28576478 2024 -0.35622031 2025  0.89810598
# 5   5 2024 -0.42831014 2025 -1.28914071 2026  0.44912268
# 6   6 2023 -1.02487195 2024 -0.27391726 2025 -0.62189347
# 7   7 2024  0.16888122 2025 -0.10572896 2026 -0.43966363
# 8   8 2025  0.80350550 2026  0.41403554 2027 -1.41913317
# 9   9 2023  0.59990953 2024 -0.42688373 2025 -0.73899889

How to shape the wide format to the long format? Here is my expected output:

   id  yr    yr_time  
    1 2023 -0.18649844 
    1 2024 1.41458053
    1 2025 -1.12031610
    2 2025 -0.01977439 
    2 2026  0.68985414
    2 2027 -0.69038076



Thanks!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can do this by using pivot_longer() from the tidyr package to convert the data to long format, then several functions from the dplyr package to get to the final format you want.

library(dplyr)
library(tidyr)

df1 |> 
  # Convert data to long format
  pivot_longer(cols = starts_with("yr")) |> 
  mutate(
    # If a row represents a year, assign its `value` to the `year` column
    year = if_else(name %in% c("yr1", "yr2", "yr3"), value, NA_real_),
    # Move all the values up one row (this moves the `yr_time` for each year 
    # into the same row as the corresponding year)
    yr_time = lead(value)
  ) |>
  # Remove the rows that don't contain a year value, since those rows aren't 
  # needed (and now contain the wrong `yr_time` values)
  filter(!is.na(year)) |> 
  # Arrange columns in the order you asked for
  select(id, year, yr_time)

If I haven’t explained each step well, try running each step separately and you should be able to see how it works.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading