Capture first ocurrence of a price like string

Advertisements

I have the following series

s = pd.Series({0: 'R$ 3,74\n-10%\nR$ 4,15',
 1: 'R$ 12,25',
 2: 'R$ 14,65',})

What I want to do is extract the numbers in their current format and then substitute the ‘,’ for a ".", so I can convert to float values. Unfortunately, I am having trouble with my current capture group, which is basically capturing everything following that format(number,number) a infinite amount of times. Including on the first string.

(\d+\.?\,?\d*)

I would like to limit my capture group, to only capture the first occurrence of such event. I would appreciate other approach’s too.

Wanted result is as follows:

s = pd.Series({0:3.74,
1:12.25,
2:14.65})

>Solution :

You can use:

s.str.extract('(\d+,?\d*)', expand=False).str.replace(',', '.').astype(float)

Or:

import re
s.str.replace('.*(\d+),(\d+).*', r'\1.\2', regex=True, flags=re.DOTALL
              ).astype(float)

Output:

0     3.74
1    12.25
2    14.65
dtype: float64

timings

On 300K items:

s.str.extract('(\d+,?\d*)', expand=False).str.replace(',', '.').astype(float)
# 206 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

s.str.replace('.*(\d+),(\d+).*', r'\1.\2', regex=True, flags=re.DOTALL).astype(float)
# 607 ms ± 34.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Leave a ReplyCancel reply