R replace string in df with partial match in a list

I have a data frame (df) in R and I want to create a new column (City1_n) that contains the matching entry from the list key whenever there is a partial match between City1 and an element of key.
Below is a small example that should help to visualize my problem.

> dput(df)
structure(list(Country = c("USA", "France", "Italy", "Spain", 
"Mexico"), City1 = c("Los angeles", "Paris", "Rome", "Madrid", 
"Cancun"), City2 = c("New York", "Lyon", "Pisa", "Barcelona", 
"San Cristobal de las Casas")), class = "data.frame", row.names = c(NA, 
-5L))

> dput(key)
list("Los angeles California", "Paris Île-de-France", "Rome Lazio", 
    "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")



Any help in R or unix will be appreciated.
Thanks

Solution:

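Use fuzzyjoin::fuzzy_left_join, with stringr::str_detect as the matching function (each City1 value is used as a pattern and searched for in each key entry):

fuzzyjoin::fuzzy_left_join(df, data.frame(key = unlist(key)), by = c("City1" = "key"), match_fun = \(x, y) stringr::str_detect(y, x))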

  Country       City1                      City2                         key
1     USA Los angeles                   New York      Los angeles California
2  France       Paris                       Lyon         Paris Île-de-France
3   Italy        Rome                       Pisa                  Rome Lazio
4   Spain      Madrid                  Barcelona Madrid Comunidad de Madrid 
5  Mexico      Cancun San Cristobal de las Casas         Cancun Quintana Roo

data

df <- structure(list(
  Country = c("USA", "France", "Italy", "Spain", "Mexico"),
  City1 = c("Los angeles", "Paris", "Rome", "Madrid", "Cancun"),
  City2 = c("New York", "Lyon", "Pisa", "Barcelona", "San Cristobal de las Casas")
), class = "data.frame", row.names = c(NA, -5L))

key <- c("Los angeles California", "Paris Île-de-France", "Rome Lazio", 
     "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")
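
If installing extra packages is not an option, the same partial match can probably be done in base R. The following is only a minimal sketch, not part of the original answer: the helper name first_match and the column name City1_n are assumptions for illustration, and it keeps only the first matching key entry per city, treating the city name as a literal substring rather than a regular expression.

```r
# Example data, same as above
df <- data.frame(
  Country = c("USA", "France", "Italy", "Spain", "Mexico"),
  City1 = c("Los angeles", "Paris", "Rome", "Madrid", "Cancun"),
  City2 = c("New York", "Lyon", "Pisa", "Barcelona", "San Cristobal de las Casas")
)
key <- c("Los angeles California", "Paris Île-de-France", "Rome Lazio",
         "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")

# Hypothetical helper: return the first key entry that contains `city`
# as a literal substring, or NA if no entry matches.
first_match <- function(city, keys) {
  hit <- keys[grepl(city, keys, fixed = TRUE)]
  if (length(hit) > 0) hit[1] else NA_character_
}

# unlist() makes this work whether key is a list or a character vector
df$City1_n <- vapply(df$City1, first_match, character(1),
                     keys = unlist(key), USE.NAMES = FALSE)
```

Unlike the fuzzy join, this never duplicates rows when a city matches several key entries; whether taking only the first match is acceptable depends on the data.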