I basically want to ‘join’ numbers that should clearly go together. I want to replace the regex match with itself but without any spaces.
I have:
df
a
'Fraxiparine 9 500 IU (anti-Xa)/1 ml'
'Colobreathe 1 662 500 IU inhalačný prášok v tvrdej kapsule'
I want to have:
df
a
'Fraxiparine 9500 IU (anti-Xa)/1 ml'
'Colobreathe 1662500 IU inhalačný prášok v tvrdej kapsule'
I’m using r'\d+\s+\d+\s*\d+' to match the numbers, and I’ve created the following function to remove the spaces within the string:
def spaces(x):
    match = re.findall(r'\d+\s+\d+\s*\d+', x)
    return match.replace(" ","")
Now I’m having trouble applying that function to the full dataframe, but I also don’t know exactly how to replace the original match with the string without any spaces.
>Solution :
Try using the following code:
import re

def spaces(s):
    return re.sub(r'(?<=\d) (?=\d)', '', s)

df['a'] = df['a'].apply(spaces)
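The regex matches:
- a single space
- preceded by a digit: (?<=\d)
- and followed by a digit: (?=\d)
Both conditions are zero-width lookarounds, so only the space itself is removed and the digits on either side stay in place.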
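To check the pattern without a dataframe, here is a minimal sketch that runs it on the two strings from the question. The names samples and cleaned are only for illustration.

```python
import re

def spaces(s):
    # Remove a single space only when a digit sits on both sides of it
    return re.sub(r'(?<=\d) (?=\d)', '', s)

samples = [
    'Fraxiparine 9 500 IU (anti-Xa)/1 ml',
    'Colobreathe 1 662 500 IU inhalačný prášok v tvrdej kapsule',
]

cleaned = [spaces(s) for s in samples]
for line in cleaned:
    print(line)
```

Note that the space in "1 ml" is kept because it is not followed by a digit. On the other hand, any two numbers separated by a single space will be joined, so this assumes such pairs always belong together in your data.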
Then pandas.Series.apply calls the function on every value in column a, and the result is assigned back to that column.
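As an alternative to apply, the same substitution can probably be written with the vectorized string accessor. This is a sketch that rebuilds the question's dataframe by hand and assumes a pandas version where Series.str.replace accepts regex=True.

```python
import pandas as pd

# Sample dataframe rebuilt from the question
df = pd.DataFrame({
    'a': [
        'Fraxiparine 9 500 IU (anti-Xa)/1 ml',
        'Colobreathe 1 662 500 IU inhalačný prášok v tvrdej kapsule',
    ]
})

# Same lookaround pattern, applied to the whole column at once
df['a'] = df['a'].str.replace(r'(?<=\d) (?=\d)', '', regex=True)
print(df['a'].tolist())
```

This avoids defining a helper function. Missing values (NaN) are passed through unchanged by the .str accessor, whereas the apply version would raise on them.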
Output:
0 Fraxiparine 9500 IU (anti-Xa)/1 ml
1 Colobreathe 1662500 IU inhalačný prášok v tvrd...
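If you would rather do literally what the question describes, that is, replace each match with itself minus the whitespace, re.sub also accepts a function as the replacement. The sketch below is one way to do that; the pattern is a loosened variant of the one in the question (digit groups separated by any whitespace), and the name join_digit_groups is made up for this example.

```python
import re

# Digit groups separated by one or more whitespace characters,
# e.g. "1 662 500" or "9  500"
NUMBER_WITH_GAPS = re.compile(r'\d+(?:\s+\d+)+')

def join_digit_groups(s):
    # For every match, return the matched text with all whitespace stripped
    return NUMBER_WITH_GAPS.sub(lambda m: re.sub(r'\s+', '', m.group()), s)

result_a = join_digit_groups('Fraxiparine 9 500 IU (anti-Xa)/1 ml')
result_b = join_digit_groups('Colobreathe 1  662\t500 IU')
print(result_a)
print(result_b)
```

Unlike the single-space lookaround pattern, this version also handles tabs and repeated spaces between digit groups. It can be applied to the column the same way: df['a'].apply(join_digit_groups).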