I often find myself wanting to update a data frame based on a separate data frame that has new values for a subset of columns and rows. For example:
library(dplyr)
df_original <- data.frame(
id=c(1,2,3),
name=c("John", "Rose", "Kanaya"),
address=c("100 Street st.", "413 Old St.", "200 Drive Dr.")
)
df_newinfo <- data.frame(id=c(2), address=c("612 New St."))
I want to update row ID 2, changing 413 Old St
to 612 New St.
. Currently the only ways I know to do this involve something like the following, where we do a left join, check for non-NA update values, and then drop the new column.
df_updated <- left_join(df_original, df_newinfo, by = "id", suffix = c("", ".y"))
df_updated <- df_updated %>%
mutate(address = coalesce(address.y, address)) %>%
select(-address.y)
This gets tedious and messy when the number of columns I want to update is large, and only works when I know all the columns explicitly ahead of time. Is there a better way?
>Solution :
You can use dplyr::rows_update()
library(dplyr)
df_original %>%
rows_update(df_newinfo, by = "id")
id name address
1 1 John 100 Street st.
2 2 Rose 612 New St.
3 3 Kanaya 200 Drive Dr.