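How can I fix this regex so that the input strings below produce the outputs shown? This is my current attempt:
import re
out = colloquial_hour  # one of the example strings below
out = re.sub(r"(hs|h.s|h.s.)a m(\W|\b)", r"\1 am\2", out)
print(repr(out))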
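A minimal check, using the string from example 1.1, suggests why this attempt leaves the text unchanged. In `(hs|h.s|h.s.)a m`, the group is followed directly by `a m`, so the pattern only matches when there is no space between "hs" and "a". The unescaped `.` also matches any character, not only a literal dot. The names `text` and `attempt` are illustrative.

```python
import re

# Sample input copied from example 1.1 in the question.
text = "Cerca de las 2: hs a m, hay que salir antes de esas hs a m"

# The original attempt: there is no space between the "hs" group and "a m".
attempt = r"(hs|h.s|h.s.)a m(\W|\b)"
print(re.search(attempt, text))  # None: "hs a m" has a space before "a"
print(re.sub(attempt, r"\1 am\2", text) == text)  # True: nothing is replaced

# Only text with no space after "hs" would match, e.g. "hsa m".
print(re.search(attempt, "2 hsa m").group(0))  # hsa m
```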
Example input strings:
#example 1.1
colloquial_hour = "Cerca de las 2: hs a m, hay que salir antes de esas hs a m"
#example 1.2
colloquial_hour = "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m."
#example 1.3
colloquial_hour = "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m"
#example 1.4
colloquial_hour = "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m."
Expected outputs:
#correct output for example 1.1
"Cerca de las 2: hs am, hay que salir antes de esas hs a m"
#correct output for example 1.2
"A medida que avance cerca de la media noche 12: 04 hs am. Deben ir a las 15 hs am."
#correct output for example 1.3
"A mmm... cerca de las 12: h.s am, hay que salir antes de esas h.s. a m"
#correct output for example 1.4
"A medida que avance cerca de las 12:04 hs. am. Deben ir a las 15 h.s am."
The logic should be: whenever a numeric value is followed by "a m" (optionally with a colon and/or an "hs" variant in between), replace that "a m" substring with "am" in the original string.
These are all the possible cases in which the substring "a m" must be replaced with "am":
X a m
X: a m
X: hs a m
X: h.s. a m
X: h.s a m
X: hs. a m
X : a m
X : hs a m
X : h.s. a m
X : h.s a m
X : hs. a m
X hs a m
X h.s. a m
X h.s a m
X hs. a m
#where "X" is a numerical value ("1", "2", "3", "4", "5", "6", ... )
#in all of these cases, "a m" must be replaced by "am"
Solution:
You can search using this regex:
(\d\W+)(h\.?s\.?\s+)?a\s+m\b
and replace using:
\1\2am
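A sketch of how the pattern and replacement above can be applied with `re.sub` (via a compiled pattern) to the four examples from the question, checking each result against the expected output. The names `PATTERN`, `REPLACEMENT` and `CASES` are illustrative.

```python
import re

# Pattern and replacement from the answer above.
PATTERN = re.compile(r"(\d\W+)(h\.?s\.?\s+)?a\s+m\b")
REPLACEMENT = r"\1\2am"

# Input/expected-output pairs copied from examples 1.1 to 1.4 in the question.
CASES = [
    ("Cerca de las 2: hs a m, hay que salir antes de esas hs a m",
     "Cerca de las 2: hs am, hay que salir antes de esas hs a m"),
    ("A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m.",
     "A medida que avance cerca de la media noche 12: 04 hs am. Deben ir a las 15 hs am."),
    ("A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m",
     "A mmm... cerca de las 12: h.s am, hay que salir antes de esas h.s. a m"),
    ("A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m.",
     "A medida que avance cerca de las 12:04 hs. am. Deben ir a las 15 h.s am."),
]

for text, expected in CASES:
    out = PATTERN.sub(REPLACEMENT, text)
    print(repr(out))
    # Each rewritten string should match the question's expected output.
    assert out == expected
```

Note that `\W+` accepts any run of non-word characters after the digit, so strings such as "5, a m" or "5 - a m" would also be rewritten. If only spaces and colons should be allowed, a narrower class such as `[\s:]+` could be used instead; that is an assumption about the desired behavior, not something the question specifies.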
RegEx Details:
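(\d\W+): Match a digit followed by 1+ non-word characters in capture group #1
(h\.?s\.?\s+)?: Match "h" followed by "s", each with an optional dot after it, then 1+ whitespace characters. This optional group is capture group #2
a\s+m\b: Match "a" followed by 1+ whitespace characters, then "m" followed by a word boundary
The replacement \1\2am writes both groups back unchanged and turns the matched "a m" into "am". When group #2 does not participate in the match, Python (3.5 and later) substitutes an empty string for \2.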
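To make the breakdown above easier to check, here is a sketch of the same pattern written with Python's `re.VERBOSE` flag, so each part sits next to its explanation, followed by a loop over the spellings from the question's list of cases, with "5" substituted for X. The names `PATTERN` and `CASES` are illustrative.

```python
import re

# The answer's pattern rewritten with re.VERBOSE; whitespace and comments
# inside the pattern are ignored, so it behaves like the one-line version.
PATTERN = re.compile(
    r"""
    (\d\W+)            # group 1: a digit, then 1+ non-word chars (space, colon, ...)
    (h\.?s\.?\s+)?     # group 2 (optional): h, optional dot, s, optional dot, whitespace
    a\s+m\b            # "a", 1+ whitespace chars, then "m" at a word boundary
    """,
    re.VERBOSE,
)

# Spellings from the question's list of cases, with "5" standing in for X.
CASES = [
    "5 a m", "5: a m", "5: hs a m", "5: h.s. a m", "5: h.s a m", "5: hs. a m",
    "5 : hs a m", "5 : h.s. a m", "5 : h.s a m", "5 : hs. a m",
    "5 hs a m", "5 h.s. a m", "5 h.s a m", "5 hs. a m",
]

for case in CASES:
    result = PATTERN.sub(r"\1\2am", case)
    # Everything before "a m" is kept; only "a m" becomes "am".
    assert result == case[: -len("a m")] + "am", result
    print(result)

# Without a preceding digit, nothing is changed.
assert PATTERN.sub(r"\1\2am", "esas h.s. a m") == "esas h.s. a m"
```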