Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Capture first ocurrence of a price like string

I have the following series

s = pd.Series({0: 'R$ 3,74\n-10%\nR$ 4,15',
 1: 'R$ 12,25',
 2: 'R$ 14,65',})

What I want to do is extract the numbers in their current format and then substitute the ‘,’ for a ".", so I can convert to float values. Unfortunately, I am having trouble with my current capture group, which is basically capturing everything following that format(number,number) a infinite amount of times. Including on the first string.

(\d+\.?\,?\d*)

I would like to limit my capture group, to only capture the first occurrence of such event. I would appreciate other approach’s too.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Wanted result is as follows:

s = pd.Series({0:3.74,
1:12.25,
2:14.65})

>Solution :

You can use:

s.str.extract('(\d+,?\d*)', expand=False).str.replace(',', '.').astype(float)

Or:

import re
s.str.replace('.*(\d+),(\d+).*', r'\1.\2', regex=True, flags=re.DOTALL
              ).astype(float)

Output:

0     3.74
1    12.25
2    14.65
dtype: float64

timings

On 300K items:

s.str.extract('(\d+,?\d*)', expand=False).str.replace(',', '.').astype(float)
# 206 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

s.str.replace('.*(\d+),(\d+).*', r'\1.\2', regex=True, flags=re.DOTALL).astype(float)
# 607 ms ± 34.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading