Home Remove all words after a specific word including the specific word from a column r

Questions

Remove all words after a specific word including the specific word from a column r

May 3, 2023

I’m looking to remove all words and characters that appear after specific words that appear in a column in my data.

This is what my data looks like.
A specific example what I like to accomplish is that I like to remove any word after "aka" and "dba" and the words "aka" and "dba" from my data.

	Number.of.Workers	company_name
5195	82	valley ho hotels aka kings inn
5196	82	aluminum precision products
5197	79	levity of brea dba brea improv
5198	79	crunch
5199	71	comedy club of los angeles dba hollywood improv
5200	65	andre-boudin bakeries inc dba boudin

structure(list(Number.of.Workers = c("82", "82", "79", "79", 
"71", "65", "62", "58", "56", "53", "49"), company_name = c("valley ho hotels aka kings inn", 
"aluminum precision products", "levity of brea  dba brea improv", 
"crunch", "comedy club of los angeles  dba hollywood improv", 
"andre-boudin bakeries inc   dba boudin", "comedy club of san jose  dba san jose improv", 
"comedy club of brea  dba ontario improv", "sprout bost ", "culver west lp - playa provisions", 
"faa concord h dba concord honda")), row.names = 5195:5205, class = "data.frame")

>Solution :

You can use sub() as follows:

df$company_name = sub("\\W+?(aka|dba)\\W+.*$", "", df$company_name)

output:

     Number.of.Workers                      company_name
5195                82                  valley ho hotels
5196                82       aluminum precision products
5197                79                    levity of brea
5198                79                            crunch
5199                71        comedy club of los angeles
5200                65         andre-boudin bakeries inc
5201                62           comedy club of san jose
5202                58               comedy club of brea
5203                56                      sprout bost 
5204                53 culver west lp - playa provisions
5205                49                     faa concord h

Note: thanks to @Chris Ruehlemann for pointing out sub vs gsub. The difference is that the former replaces the first occurrence, while the latter replaces all.