I have a long dataset with responses at time 1 and time 2.
I am trying to join two other dataframes (df1 and df2) to my primary dataframe (df) to create df_join. I’ve tried the following code, but it results in four separate columns: var1.x, var2.x, var1.y, var2.y where I would like just var1 and var2
What is the best approach here?
library(dplyr)
df <- tibble(ID = c(1001, 1001, 1002, 1002), time = c(1, 2, 1, 2), var3 = c("324908", "1231", "23532", "09094")
df1 <- tibble(ID = c(1001, 1002), time = c(1, 1), var1 = c(300, 400), var2 = c(300, 400))
df2 <- tibble(ID = c(1001, 1002), time = c(2, 2), var1 = c(310, 410), var2 = c(320, 420))
df_join <- df %>%
left_join(select(df1, ID, var1, var2), by = c("ID")) %>%
left_join(select(df2, ID, var1, var2), by = c("ID"))
My desired output would be:
df_join_desire <- tibble(ID = c(1001, 1001, 1002, 1002), time = c(1, 2, 1, 2), var1 = c(300, 310, 400, 410), var2 = c(300, 320, 400, 420), var3 = c(324908, 1231, 23532, 09094))
>Solution :
You’re encountering this error because df1 and df2 have the same column names, so you don’t need to left_join them together, you can simply bind them (i.e., putting the new rows of df2 at the bottom of df1).
df_join <- left_join(df, bind_rows(df1, df2))