Home if match between column ID in two different datasets, then create a new dataset with the difference of other columns r

Questions

if match between column ID in two different datasets, then create a new dataset with the difference of other columns r

byMR

June 2, 2023

I have two datasets using r:

df_100= data.frame(siteid=c(seq(1,5,1),conflu=c(3,2,4,5,6),diflu=c(9,2,30,2,5))
df_full= data.frame(siteid=c(seq(1,10,2),conflu=c(6,3,5,2,3),diflu=c(5,9,2,30,7))

If the siteid is the same between df_100 and df_full, I want to take the difference between the conflu columns of each data frame and the same with the diflu columns. I also want that output to be put into a new dataframe, where the siteid is retained and the difference between the columns creates a new column. For example:

df_difference=data.frame(siteid=c(1,3,5), diff_con=c(3,1,-3), diff_dif=c(-4,-18,2))

>Solution :

I don’t follow the calculations to get what you have as the sample output, but based on your description:

library(dplyr)

df_100 <- data.frame(siteid= seq(1,5,1),conflu=c(3,2,4,5,6),diflu=c(9,2,30,2,5))

df_full <- data.frame(siteid = seq(1,10,2),conflu=c(6,3,5,2,3),diflu=c(5,9,2,30,7))

df_difference <- df_100 |> 
  inner_join(df_full, by = "siteid", suffix = c("_100", "_full")) |> 
  mutate(
    diff_con = conflu_full - conflu_100,
    diff_dif = diflu_full - diflu_100
  ) |> 
  select(siteid, diff_con, diff_dif)

inner_join will match and keep only the rows with same "siteid". Then use mutate to do the calculations and select the columns you want.