I need to normalize some texts (product descriptions) with regard to the correct usage of the `.`, `,` and `:` symbols (no space before and exactly one space after).
The regex I’ve come up with is this:
$variation['DESCRIPTION'] = preg_replace('#\s*([:,.])\s*(?!<br />)#', '$1 ', $variation['DESCRIPTION']);
The problem is that this matches three cases it shouldn’t touch:
- Any decimal number, like 5.5
- Any thousand separator, like 4,500
- A "fixed" phrase in Greek: ό,τι
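To see the three failures concretely, here is a minimal PHP sketch that runs the pattern above over a few inputs. The sample strings are illustrative assumptions, not the contents of the fiddle, and the sketch assumes PHP 7+ with a UTF-8 encoded source file.

```php
<?php
// The question's original pattern: optional whitespace, one of : , .
// then optional whitespace, as long as "<br />" does not follow.
$pattern = '#\s*([:,.])\s*(?!<br />)#';

// Illustrative inputs for the three cases that should stay untouched.
$samples = ['5.5', '4,500', 'ό,τι'];

foreach ($samples as $text) {
    // Each sample gets a space injected after its punctuation mark:
    // "5. 5", "4, 500" and "ό, τι".
    echo $text, ' => ', preg_replace($pattern, '$1 ', $text), PHP_EOL;
}
```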
Especially for the numeric exceptions, I know this can be achieved with negative lookaheads/lookbehinds, but unfortunately I can’t work out how to combine them with my current pattern.
Here is a fiddle to check against (the cases that shouldn’t be matched are on lines 2, 3 and 4).
Any help will be very much appreciated! TIA
>Solution :
You can add two lookaheads containing lookbehinds:
\s*([:,.])(?!(?<=ό,)τι)(?!(?<=\d.)\d)(?!\s*<br\s*/>)\s*
See the regex demo. Note that I also added `\s*` to the last lookahead and moved that lookahead in front of the consuming `\s*`, so that the match fails if `<br/>` (or `<br />`) follows the `:`, `,` or `.` after zero or more whitespace characters.
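A small PHP comparison of the two placements of the `<br />` check, offered as a sketch: the sample string is made up, and the `u` flag on the second pattern is an addition (not part of the answer) so that the Greek literals are matched as UTF-8 characters rather than raw bytes.

```php
<?php
// Question's pattern: the consuming \s* sits before the <br /> lookahead,
// so it can backtrack to zero characters and let the lookahead pass.
$old = '#\s*([:,.])\s*(?!<br />)#';

// Answer's pattern: the <br> lookahead (with its own \s*) runs before the
// trailing \s* consumes anything. The /u flag is an assumption added here.
$new = '#\s*([:,.])(?!(?<=ό,)τι)(?!(?<=\d.)\d)(?!\s*<br\s*/>)\s*#u';

$text = 'Line one. <br />Line two';

// Old pattern matches the "." alone and inserts a second space before <br />.
echo preg_replace($old, '$1 ', $text), PHP_EOL; // Line one.  <br />Line two
// New pattern refuses to match, leaving the text unchanged.
echo preg_replace($new, '$1 ', $text), PHP_EOL; // Line one. <br />Line two
```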
Details
- `\s*`: zero or more whitespace characters
- `([:,.])`: Group 1: a `:`, `,` or `.`
- `(?!(?<=ό,)τι)`: fail the match if the next two chars are `τι` preceded with `ό,`
- `(?!(?<=\d.)\d)`: fail the match if the next char is a digit preceded with a digit and any char (note that a `.` is enough since `[:,.]` has already matched the allowed/required char; here we just need to "jump" over that matched char)
- `(?!\s*<br\s*/>)`: a negative lookahead that fails the match if there are zero or more whitespaces, `<br`, zero or more whitespaces and `/>` immediately to the right of the current location
- `\s*`: zero or more whitespace characters.
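Putting the pattern into a reusable function, as a minimal sketch: `normalizePunctuation()` is a hypothetical name, and the `u` flag and the `null` fallback are additions that are not part of the answer. The sample descriptions are illustrative.

```php
<?php
/**
 * Hypothetical helper wrapping the answer's pattern (plus the /u flag).
 * Normalizes spacing around ":", "," and "." (no space before, one after),
 * but skips digit.digit / digit,digit, the Greek "ό,τι", and punctuation
 * followed by <br/> or <br />.
 */
function normalizePunctuation(string $text): string
{
    $pattern = '#\s*([:,.])(?!(?<=ό,)τι)(?!(?<=\d.)\d)(?!\s*<br\s*/>)\s*#u';
    // preg_replace() returns null on error (e.g. invalid UTF-8 under /u),
    // so fall back to the untouched input in that case.
    return preg_replace($pattern, '$1 ', $text) ?? $text;
}

$samples = [
    'Color :red ,size:XL', // becomes "Color: red, size: XL"
    'Weight 5.5 kg',       // decimal, left as-is
    'Price 4,500 EUR',     // thousands separator, left as-is
    'ό,τι θέλεις',         // fixed Greek phrase, left as-is
];

foreach ($samples as $text) {
    echo normalizePunctuation($text), PHP_EOL;
}
```

A `:`, `,` or `.` at the very end of a string also gets a space appended, so calling `rtrim()` on the result may be worth it if trailing whitespace matters.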