R pattern for updating a df column based on another df, when present

March 1, 2023

I often find myself wanting to update a data frame based on a separate data frame that has new values for a subset of columns and rows. For example:

library(dplyr)
df_original <- data.frame(
   id=c(1,2,3), 
   name=c("John", "Rose", "Kanaya"), 
   address=c("100 Street st.", "413 Old St.", "200 Drive Dr.")
)
df_newinfo <- data.frame(id=c(2), address=c("612 New St."))

I want to update row ID 2, changing 413 Old St to 612 New St.. Currently the only ways I know to do this involve something like the following, where we do a left join, check for non-NA update values, and then drop the new column.

df_updated <- left_join(df_original, df_newinfo, by = "id", suffix = c("", ".y"))
df_updated <- df_updated %>% 
  mutate(address = coalesce(address.y, address)) %>%
  select(-address.y)

This gets tedious and messy when the number of columns I want to update is large, and only works when I know all the columns explicitly ahead of time. Is there a better way?

>Solution :

You can use dplyr::rows_update()

library(dplyr)

df_original %>%
  rows_update(df_newinfo, by = "id")

  id   name        address
1  1   John 100 Street st.
2  2   Rose    612 New St.
3  3 Kanaya  200 Drive Dr.