I have two datasets using r:
df_100= data.frame(siteid=c(seq(1,5,1),conflu=c(3,2,4,5,6),diflu=c(9,2,30,2,5))
df_full= data.frame(siteid=c(seq(1,10,2),conflu=c(6,3,5,2,3),diflu=c(5,9,2,30,7))
If the siteid is the same between df_100 and df_full, I want to take the difference between the conflu columns of each data frame and the same with the diflu columns. I also want that output to be put into a new dataframe, where the siteid is retained and the difference between the columns creates a new column. For example:
df_difference=data.frame(siteid=c(1,3,5), diff_con=c(3,1,-3), diff_dif=c(-4,-18,2))
>Solution :
I don’t follow the calculations to get what you have as the sample output, but based on your description:
library(dplyr)
df_100 <- data.frame(siteid= seq(1,5,1),conflu=c(3,2,4,5,6),diflu=c(9,2,30,2,5))
df_full <- data.frame(siteid = seq(1,10,2),conflu=c(6,3,5,2,3),diflu=c(5,9,2,30,7))
df_difference <- df_100 |>
inner_join(df_full, by = "siteid", suffix = c("_100", "_full")) |>
mutate(
diff_con = conflu_full - conflu_100,
diff_dif = diflu_full - diflu_100
) |>
select(siteid, diff_con, diff_dif)
inner_join will match and keep only the rows with same "siteid". Then use mutate to do the calculations and select the columns you want.