Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R pattern for updating a df column based on another df, when present

I often find myself wanting to update a data frame based on a separate data frame that has new values for a subset of columns and rows. For example:

library(dplyr)
df_original <- data.frame(
   id=c(1,2,3), 
   name=c("John", "Rose", "Kanaya"), 
   address=c("100 Street st.", "413 Old St.", "200 Drive Dr.")
)
df_newinfo <- data.frame(id=c(2), address=c("612 New St."))

I want to update row ID 2, changing 413 Old St to 612 New St.. Currently the only ways I know to do this involve something like the following, where we do a left join, check for non-NA update values, and then drop the new column.

df_updated <- left_join(df_original, df_newinfo, by = "id", suffix = c("", ".y"))
df_updated <- df_updated %>% 
  mutate(address = coalesce(address.y, address)) %>%
  select(-address.y)

This gets tedious and messy when the number of columns I want to update is large, and only works when I know all the columns explicitly ahead of time. Is there a better way?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use dplyr::rows_update()

library(dplyr)

df_original %>%
  rows_update(df_newinfo, by = "id")

  id   name        address
1  1   John 100 Street st.
2  2   Rose    612 New St.
3  3 Kanaya  200 Drive Dr.
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading