I often find myself wanting to update a data frame based on a separate data frame that has new values for a subset of columns and rows. For example:
library(dplyr) df_original <- data.frame( id=c(1,2,3), name=c("John", "Rose", "Kanaya"), address=c("100 Street st.", "413 Old St.", "200 Drive Dr.") ) df_newinfo <- data.frame(id=c(2), address=c("612 New St."))
I want to update row ID 2, changing
413 Old St to
612 New St.. Currently the only ways I know to do this involve something like the following, where we do a left join, check for non-NA update values, and then drop the new column.
df_updated <- left_join(df_original, df_newinfo, by = "id", suffix = c("", ".y")) df_updated <- df_updated %>% mutate(address = coalesce(address.y, address)) %>% select(-address.y)
This gets tedious and messy when the number of columns I want to update is large, and only works when I know all the columns explicitly ahead of time. Is there a better way?
You can use
library(dplyr) df_original %>% rows_update(df_newinfo, by = "id") id name address 1 1 John 100 Street st. 2 2 Rose 612 New St. 3 3 Kanaya 200 Drive Dr.