normal_address() function in R not working as expected

The normal_address() function from the campfin package is not working as I’d expect it to.

I’m trying to use a piece of code like this:

df <- df %>% mutate(clean_add = normal_address(RESERVATION_ADDRESS, abbs=usps_street))

I expect every word found in usps_street$full to be replaced with its abbreviation. That happens most of the time, but not every time.

Is this just a bug in normal_address(), or am I missing something?
It causes addresses to fail to match when I attempt fuzzy matching in a later step, even though they are clearly the same when I look at them.

Below are some addresses I haven’t been able to get normalized correctly:

structure(list(RESERVATION_ADDRESS = c("4620 ASH GROVE DRIVE #3B", 
"4001 DE MORADA DRIVE UNIT 118", "734 THOMPSON DRIVE, UNIT A", 
"5917 YORK BRIDGE CIRCLE, AUSTIN, TX", "4140 SUNLAND CIRCLE NW", 
"3951 BELLAIRE DRIVE SOUTH"), RESERVATION_CITY = c("SPRINGFIELD", 
"ODESSA", "LAKE DALLAS", "AUSTIN", "ALBUQUERQUE", "FORT WORTH"
), RESERVATION_STATE = c("IL", "TX", "TX", "TX", "NM", "TX"), 
    RESERVATION_ZIPCODE = c(62711, 79765, 75065, 78749, 87107, 
    76109)), row.names = c(NA, 6L), class = "data.frame")

I’m trying to avoid something like `gsub("CIRCLE", "CIR", clean_add)`, because there could be other words I’m missing besides "CIRCLE" and "DRIVE".

Is there a better function out there to do this? Or am I just missing something?

>Solution :

Current:

> tt$RESERVATION_ADDRESS
[1] "4620 ASH GROVE DRIVE #3B"            "4001 DE MORADA DRIVE UNIT 118"      
[3] "734 THOMPSON DRIVE, UNIT A"          "5917 YORK BRIDGE CIRCLE, AUSTIN, TX"
[5] "4140 SUNLAND CIRCLE NW"              "3951 BELLAIRE DRIVE SOUTH"   

Probably the desired output:

> library(campfin)
> normal_address(tt$RESERVATION_ADDRESS, abbs = usps_street, abb_end = FALSE)
[1] "4620 ASH GRV DR #3B"         "4001 DE MORADA DR UNIT 118"  "734 THOMPSON DR UNIT A"     
[4] "5917 YORK BRG CIR AUSTIN TX" "4140 SUNLAND CIR NW"         "3951 BELLAIRE DR S" 

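In other words, you need to specify abb_end = FALSE, and then normal_address() works as expected. With the default (abb_end = TRUE), only the last word of each address is abbreviated, so any address where the street suffix is followed by a unit, direction, or city is left unchanged. If this is the output you want, change your code to: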

library(dplyr)
library(campfin)
df = 
  df |> 
  mutate(clean_add = normal_address(RESERVATION_ADDRESS, abbs = usps_street, abb_end = FALSE))
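
To see why the flag matters, here is a minimal base R sketch of the difference. It is not campfin's implementation; it assumes, as the package documentation describes, that abb_end = TRUE limits abbreviation to the last word of the string. The abbs_demo table and the abbreviate_words() helper are hypothetical names used only for illustration.

```r
# Hypothetical three-row lookup table; campfin's usps_street is much larger.
abbs_demo <- data.frame(
  full = c("DRIVE", "CIRCLE", "GROVE"),
  abb  = c("DR", "CIR", "GRV"),
  stringsAsFactors = FALSE
)

# Replace whole words only. With end_only = TRUE, a word is replaced
# only when it is the very last word of the string.
abbreviate_words <- function(x, abbs, end_only = FALSE) {
  for (i in seq_len(nrow(abbs))) {
    pattern <- paste0("\\b", abbs$full[i], "\\b", if (end_only) "$" else "")
    x <- gsub(pattern, abbs$abb[i], x, perl = TRUE)
  }
  x
}

addr <- c("4620 ASH GROVE DRIVE #3B", "734 THOMPSON DRIVE")

abbreviate_words(addr, abbs_demo, end_only = TRUE)
# [1] "4620 ASH GROVE DRIVE #3B" "734 THOMPSON DR"

abbreviate_words(addr, abbs_demo, end_only = FALSE)
# [1] "4620 ASH GRV DR #3B" "734 THOMPSON DR"
```

The first address is untouched in the end-only case because "#3B" follows the street suffix, which matches the behaviour described in the question.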

Data:

tt = structure(list(
  RESERVATION_ADDRESS = c(
    "4620 ASH GROVE DRIVE #3B", "4001 DE MORADA DRIVE UNIT 118",
    "734 THOMPSON DRIVE, UNIT A", "5917 YORK BRIDGE CIRCLE, AUSTIN, TX",
    "4140 SUNLAND CIRCLE NW", "3951 BELLAIRE DRIVE SOUTH"
  ),
  RESERVATION_CITY = c(
    "SPRINGFIELD", "ODESSA", "LAKE DALLAS", "AUSTIN", "ALBUQUERQUE", "FORT WORTH"
  ),
  RESERVATION_STATE = c("IL", "TX", "TX", "TX", "NM", "TX"),
  RESERVATION_ZIPCODE = c(62711, 79765, 75065, 78749, 87107, 76109)
), row.names = c(NA, 6L), class = "data.frame")