I have a longitudinal survey dataset. I want to transform from long to wide so I can determine discrepancies from the first wave to the second wave. I am struggling to transform in R, as this was relatively straightforward coming from a Stata user.
I have a ID and time column with two different response columns. I want the transformed response columns to take the value of the time column in a wide format, but struggling with the code. I get a resulting dataframe with NAs.
df <- tibble(PIN = c(1001, 1001, 1002, 1002), time = c(1, 2, 1, 2), age = c(84, 86, 22, 24), height = c(58, 58, 60, 62))
df_wide <- reshape(df,
timevar=c("time"),
idvar=c("PIN"),
dir="wide")
Input:
| id | time | age | height |
|---|---|---|---|
| 1001 | 1 | 84 | 58 |
| 1001 | 2 | 86 | 58 |
| 1002 | 1 | 22 | 60 |
| 1002 | 2 | 24 | 62 |
Desired Output:
| id | age_t1 | age_t2 | height_t1 | height_t2 |
|---|---|---|---|---|
| 1001 | 84 | 86 | 58 | 58 |
| 1002 | 22 | 24 | 60 | 62 |
Actual Output:
| id | age.c(1,2) | height.c(1,2) |
|---|---|---|
| 1001 | NA | NA |
| 1002 | NA | NA |
>Solution :
pivot_wider may be easier
library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(time = str_c('t', time)) %>%
pivot_wider(names_from = time, values_from = c(age, height))
-output
# A tibble: 2 × 5
PIN age_t1 age_t2 height_t1 height_t2
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1001 84 86 58 58
2 1002 22 24 60 62
With reshape from base R, it may need a sequence column
out <- reshape(transform(df, rn = ave(seq_along(PIN), PIN,
FUN = seq_along)), idvar = "PIN",
direction = "wide", timevar = "time", sep = "_")
out[!startsWith(names(out), 'rn_')]
PIN age_1 height_1 age_2 height_2
1 1001 84 58 86 58
3 1002 22 60 24 62