I have a bunch of sentences with extra whitespace inside every pair of brackets/parentheses/braces. Some of the brackets/parentheses/braces overlap or are nested inside each other, which is giving me problems, e.g.:
[in]: sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb'
[in]: pattern = r'(\s([?,.!"]))|(?<=\{|\[|\()(.*?)(?=\)|\]|\})'
[in]: re.sub(pattern, lambda x: x.group().strip(), sentence)
[out]: '{ia} ({ fascia} antebrachii ). Genom att aponeurosen fäster i armb'
As shown here, my pattern fails to remove the unnecessary whitespace inside the overlapping brackets/parentheses/braces. How do I cover these overlapping or nested cases? Thanks.
Expected output:
'{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'
>Solution :
You can match any whitespace that follows an opening bracket or precedes a closing bracket with this regex, and replace it with an empty string:
(?<=[\[{(])\s+|\s+(?=[\]})])
(?<=[\[{(])\s+ – looks for spaces preceded by one of [{(
\s+(?=[\]})]) – looks for spaces followed by one of ]})
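To see what each alternative contributes on its own, here is a minimal sketch that applies the two halves separately to a small nested sample. The sample string and the variable names are made up for illustration.

```python
import re

# Illustrative sample with nested brackets; not taken from the question.
text = '( { a } )'

# First alternative: whitespace that comes right after an opening bracket.
after_opening = r'(?<=[\[{(])\s+'
# Second alternative: whitespace that comes right before a closing bracket.
before_closing = r'\s+(?=[\]})])'

only_opening = re.sub(after_opening, '', text)    # '({a } )'
only_closing = re.sub(before_closing, '', text)   # '( { a})'
both = re.sub(after_opening + '|' + before_closing, '', text)  # '({a})'

print(only_opening, only_closing, both, sep='\n')
```

Because each lookaround only inspects the single character next to the whitespace, nesting depth does not matter: every padded bracket is handled independently.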
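In Python (note that the lookbehind has to come before \s+, otherwise it is tested against the whitespace itself and never matches):
import re
sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb'
re.sub(r'(?<=[\[{(])\s+|\s+(?=[\]})])', '', sentence)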
Output:
{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb
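If you need to clean many sentences, one possible way to package this is to compile the pattern once and wrap it in a small function. The names BRACKET_PADDING and strip_bracket_padding are hypothetical, chosen only for this sketch.

```python
import re

# Hypothetical names; the pattern itself is the one given in the answer.
BRACKET_PADDING = re.compile(r'(?<=[\[{(])\s+|\s+(?=[\]})])')

def strip_bracket_padding(text):
    """Remove whitespace directly after an opening bracket
    or directly before a closing bracket."""
    return BRACKET_PADDING.sub('', text)

sentences = [
    '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb',
    '[ ( x ) ]',        # nested pairs
    '(   padded   )',   # runs of several spaces
]
for s in sentences:
    print(strip_bracket_padding(s))
```

Whitespace between a closing bracket and the next opening bracket, as in `} (`, is left alone, since neither lookaround applies there.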