Extract from a data frame column the string that matches any pattern in a vector of strings

I have a dataset in which one column holds a quote that contains the name of a US state. Here is an example:

library(tidyverse)  # provides tibble(), mutate(), %>% and the stringr functions
df <- tibble(num = c(11,12,13), quote = c("In Ohio, there are plenty of hobos","Georgia, where the peaches are peachy","Oregon, no, we did not die of dysentery"))

I want to create a new column containing the state named in each quote.

Here’s what I tried:

states <- state.name
df <- df %>% mutate(state = na.omit(as.vector(str_match(quote,states)))[[1]])

This throws the following error:

Error in `mutate()`:
ℹ In argument: `state = na.omit(as.vector(str_match(quote, states)))[[1]]`.
Caused by error in `str_match()`:
! Can't recycle `string` (size 3) to match `pattern` (size 50).
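The error comes from how stringr vectorises: str_match() and friends pair the i-th string with the i-th pattern, recycling only a length-1 input. The sketch below, which assumes stringr 1.5.0 or later (the version that raises this recycling error), reproduces the three cases on the example quotes.

```r
library(stringr)

quotes <- c("In Ohio, there are plenty of hobos",
            "Georgia, where the peaches are peachy",
            "Oregon, no, we did not die of dysentery")

# A length-1 pattern is recycled across all three strings.
one_pattern <- str_detect(quotes, "Ohio")

# Three patterns are paired element-wise with the three strings.
paired <- str_detect(quotes, c("Ohio", "Georgia", "Oregon"))

# 50 patterns cannot be paired with 3 strings, so stringr signals the
# same "Can't recycle" error; tryCatch() captures its message here.
recycle_error <- tryCatch(
  str_match(quotes, state.name),
  error = function(e) conditionMessage(e)
)
```

So the attempt fails not because of na.omit() or [[1]], but because the 50 state names are treated as 50 separate, position-matched patterns.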

Solution:

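Because str_match() pairs strings and patterns element-wise, 3 quotes cannot be matched against 50 separate state names. Instead, collapse the state names into a single regular expression whose alternatives are separated by "|", then use str_extract() to pull the first matching name out of each quote.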


df %>% 
  mutate(state = str_extract(quote,str_c(state.name, collapse = "|")))

#    num quote                                   state  
#  <dbl> <chr>                                   <chr>  
#1    11 In Ohio, there are plenty of hobos      Ohio   
#2    12 Georgia, where the peaches are peachy   Georgia
#3    13 Oregon, no, we did not die of dysentery Oregon 
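One caveat with the plain alternation is that it also matches inside longer words, so a quote mentioning "Indianapolis" would yield "Indiana". A minimal sketch of one possible refinement wraps the alternation in \\b word boundaries; the fourth row below is a hypothetical example added only to show the difference.

```r
library(dplyr)
library(stringr)
library(tibble)

df <- tibble(
  num = c(11, 12, 13, 14),
  quote = c("In Ohio, there are plenty of hobos",
            "Georgia, where the peaches are peachy",
            "Oregon, no, we did not die of dysentery",
            "Indianapolis is a city, not a state")  # hypothetical extra row
)

# The group keeps the alternation together; \\b on each side requires the
# state name to start and end at a word boundary.
state_pattern <- str_c("\\b(", str_c(state.name, collapse = "|"), ")\\b")

df <- df %>%
  mutate(state = str_extract(quote, state_pattern))
```

The first three rows still give Ohio, Georgia and Oregon, while the Indianapolis row now gets NA instead of "Indiana".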

Here, str_c() with collapse = "|" generates this pattern:

str_c(state.name, collapse = "|")
[1] "Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|Missouri|Montana|Nebraska|Nevada|New Hampshire|New Jersey|New Mexico|New York|North Carolina|North Dakota|Ohio|Oklahoma|Oregon|Pennsylvania|Rhode Island|South Carolina|South Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|West Virginia|Wisconsin|Wyoming"
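Because the regex engine reports the leftmost match, overlapping names such as "Virginia" and "West Virginia" resolve to the name that starts earliest in the text, regardless of their order in the pattern. str_extract() also returns only that first match; if a quote can name several states, str_extract_all() returns every non-overlapping match. A small sketch, using a made-up quote:

```r
library(stringr)

state_pattern <- str_c(state.name, collapse = "|")

# Hypothetical quote that mentions two states.
quote <- "West Virginia borders Virginia"

# str_extract() returns only the leftmost match ...
first_only <- str_extract(quote, state_pattern)

# ... while str_extract_all() returns a list with every match per string.
all_states <- str_extract_all(quote, state_pattern)[[1]]
```

Here first_only is "West Virginia" and all_states is c("West Virginia", "Virginia").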
