I have a string from which I want to extract the city; in this example, the parts of interest are ‘Elland Rd’ and ‘Leeds’.
mystring = "0000\" club_info=\"Elland Rd, Leeds\" Pitch=\"100x50\""
city = gsub(".* club_info=\"(.*),(.+)\.*", "\\2", mystring) # can't get this part to work
My theory for getting the city is to match everything after the comma up to the backslash, but I can't seem to get the pattern to recognize the backslash.
>Solution :
I prefer strcapture for extracting multiple patterns rather than repeated gsub calls. How about this?
strcapture('.*club_info="([^"]+),([^"]+)".(.*)', mystring, list(x1="", x2="", x3=""))
#          x1     x2             x3
# 1 Elland Rd  Leeds Pitch="100x50"
(It was not required to include the Pitch= in there, but I thought you might use it since it appears you’re doing reductive gsubing.)
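As for why the original attempt could not find a backslash: as far as I can tell, the string does not contain one. The \" sequences in the assignment are just how R source code writes a double quote inside a double-quoted string, so the thing to stop at is the closing quote. A minimal sketch to check this, reusing mystring from the question (the sub() pattern is my own illustration, not the only way to write it):

```r
mystring <- "0000\" club_info=\"Elland Rd, Leeds\" Pitch=\"100x50\""

# writeLines() prints the actual characters, without R's escape notation
writeLines(mystring)

# Look for a literal backslash in the string; fixed = TRUE disables regex parsing
grepl("\\", mystring, fixed = TRUE)

# Capture what sits between the comma and the closing double quote,
# then trim the space that follows the comma
city <- trimws(sub('.*club_info="[^"]*,([^"]*)".*', "\\1", mystring))
city
# [1] "Leeds"
```

Using single quotes around the pattern avoids having to escape the double quotes inside it.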
FYI, x2 here has a leading space; it could be handled in the regex, but if you are not 100% positive it’s in all cases, then it might be simpler to add trimws(.), as in
strcapture('.*club_info="([^"]+),([^"]+)".(.*)', mystring, list(x1="", x2="", x3="")) |>
lapply(trimws)
# $x1
# [1] "Elland Rd"
# $x2
# [1] "Leeds"
# $x3
# [1] "Pitch=\"100x50\""
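If you would rather handle the space in the regex, one possible sketch is to let \\s* absorb any whitespace after the comma, outside the second capture group. This assumes the only unwanted whitespace is the run directly after the comma; the name res is just for illustration.

```r
mystring <- "0000\" club_info=\"Elland Rd, Leeds\" Pitch=\"100x50\""

# \\s* sits outside the capture group, so whitespace after the comma
# is matched but not returned in x2
res <- strcapture('.*club_info="([^"]+),\\s*([^"]+)".(.*)', mystring,
                  list(x1 = "", x2 = "", x3 = ""))
res$x2
# [1] "Leeds"
```

This keeps the result as a data.frame, since no lapply() step is needed.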
In this case it does drop from a data.frame to a list, but I’m not certain you need a frame, a named list should suffice. If you really want it as a frame — and many of my use-cases really prefer that — just add |> as.data.frame() to the pipe.
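For completeness, the full pipe with the conversion back to a frame might look like the sketch below. It assumes R 4.1 or later for the native |> pipe; the name out is only for illustration.

```r
mystring <- "0000\" club_info=\"Elland Rd, Leeds\" Pitch=\"100x50\""

out <- strcapture('.*club_info="([^"]+),([^"]+)".(.*)', mystring,
                  list(x1 = "", x2 = "", x3 = "")) |>
  lapply(trimws) |>    # trims every column; returns a named list
  as.data.frame()      # rebuilds a one-row data.frame from that list

out$x2
# [1] "Leeds"
```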