How do I extract a substring that follows a specific keyword in R?

I would like to extract people’s names that come after the words "Administering Provider". A name may consist of first, middle, and last name (sometimes just first and last). If a title follows the name (e.g. Dr.), I’m not interested in it.

df <- data.frame("id"= c(12, 19, 20), 'comments' = c('APK COMMENTS FOR APK LOG ID (145991): APK ADMINISTERING PROVIDER: LAURA ABE LE\rAPK ORDERING PROVIDER: EMMA COURTIER (CMS:19928)',
                                               'APK LOG ID (45664705): APK Administering Provider: CHASITY MCDANIELS (1972609856:0000034)\rAPK ORDERING PROVIDER: PAUL LAMAR (19785663:19928476)',
                                               'APK ADMINISTERING PROVIDER: JOHN DOE, R.N. (EPIC:107080)\rAPK ORDERING PROVIDER: OHM LOHAN (EPIC:1987)'))

Below is my attempt at a solution, but clearly it isn’t working:

updated.df <- df %>% 
  mutate(name = sub(".ADMINISTERING PROVIDER:", "", comments, ignore.case = T),
         name = trimws(gsub("[(].*$","", comments, ignore.case = T), which = c('both', 'left', 'right')))

>Solution :

You can cut away the irrelevant text before and after the name with two sub() calls. The first removes everything up to and including the keyword. The second removes everything from the first delimiter after the name onward: "," (before the academic degree), " (" (before what seems to be an ID), or the special character "\r". If you encounter other cases, add them to the pattern argument of the second sub() call.


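library(magrittr)  # provides %>% (also available once dplyr is loaded)

df$comments %>%
  # drop everything up to and including the keyword, ignoring case
  sub(pattern = ".*ADMINISTERING PROVIDER: ",
      replacement = "", 
      x = ., ignore.case = TRUE) %>%
  # drop everything from the first ",", " (" or "\r" onward
  sub(pattern = ",.*| \\(.*|\r.*", 
      replacement = "",
      x = .)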
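
To store the result as a column, the same two steps can be wrapped in a small function. This is only a sketch in base R: the helper name extract_provider is made up for illustration and is not part of the answer above, and it assumes the keyword is always followed by ": " as in the sample data.

```r
# Hypothetical helper: wraps the two sub() calls from the answer.
extract_provider <- function(x) {
  # Step 1: remove everything up to and including the keyword (case-insensitive).
  after_keyword <- sub(".*ADMINISTERING PROVIDER: ", "", x, ignore.case = TRUE)
  # Step 2: remove everything from the first ",", " (" or carriage return onward.
  name <- sub(",.*| \\(.*|\r.*", "", after_keyword)
  trimws(name)
}

# Sample data from the question.
df <- data.frame(
  id = c(12, 19, 20),
  comments = c(
    "APK COMMENTS FOR APK LOG ID (145991): APK ADMINISTERING PROVIDER: LAURA ABE LE\rAPK ORDERING PROVIDER: EMMA COURTIER (CMS:19928)",
    "APK LOG ID (45664705): APK Administering Provider: CHASITY MCDANIELS (1972609856:0000034)\rAPK ORDERING PROVIDER: PAUL LAMAR (19785663:19928476)",
    "APK ADMINISTERING PROVIDER: JOHN DOE, R.N. (EPIC:107080)\rAPK ORDERING PROVIDER: OHM LOHAN (EPIC:1987)"
  ),
  stringsAsFactors = FALSE
)

df$name <- extract_provider(df$comments)
df[, c("id", "name")]
# name column: "LAURA ABE LE", "CHASITY MCDANIELS", "JOHN DOE"
```

Note that sub() returns its input unchanged when the pattern does not match, so a comment without the keyword would only be truncated by the second step rather than turned into NA. If that can happen in your data, check for the keyword with grepl() first.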
