How to correctly extract numbers from a string using regex in Python?

I am trying to extract the number from each string, but only when the string ends with an integer or a decimal.

import pandas as pd

df = pd.DataFrame({'Names': ["Absolute Neutrophil Count",
                             "Absolute Lymphocyte Count 2.9",
                             "Absolute Neutrophil Count 10.2",
                             "ESR (Modified Westergren) 8",
                             "Free Triiodothyronine (FT3) 3.59",
                             "Free Triiodothyronine FT4 4.53"]})

df

                         Names
0   Absolute Neutrophil Count
1   Absolute Lymphocyte Count 2.9
2   Absolute Neutrophil Count 10.2
3   ESR (Modified Westergren) 8
4   Free Triiodothyronine (FT3) 3.59
5   Free Triiodothyronine FT4 4.53

Desired Extraction Results:

0  Missing/None
1  2.9
2  10.2
3  8
4  3.59
5  4.53

I tried the code below, but it does not give the desired results:

df.iloc[:,0].str.extract(r'^(.*?)\s*(\d\.?\d*)?$') #  '\d+\.\d+'
                           0      1
0   Absolute Neutrophil Count     NaN
1   Absolute Lymphocyte Count     2.9
2   Absolute Neutrophil Count 1   0.2
3   ESR (Modified Westergren)     8
4   Free Triiodothyronine (FT3)   3.59
5   Free Triiodothyronine FT4     4.53
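A quick way to see why row 2 comes out split is to run the same pattern with plain `re.match` on that one string. This is only a diagnostic sketch of the failure, not part of the fix.

```python
import re

# The original pattern from the question, applied to the row-2 string only
pattern = r'^(.*?)\s*(\d\.?\d*)?$'
m = re.match(pattern, "Absolute Neutrophil Count 10.2")

# \d allows a single digit before the optional '.', so "10.2" cannot fit in
# group 2; the lazy (.*?) keeps growing until the remainder is "0.2", which can.
print(repr(m.group(1)))  # 'Absolute Neutrophil Count 1'
print(repr(m.group(2)))  # '0.2'
```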

Please answer using the DataFrame and .str.extract. Answers that use re on plain strings sometimes work, but when I apply them to the DataFrame the results are different.

Solution:

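The original attempt fails on 10.2 because \d\.?\d* allows only a single digit before the decimal point, so the lazy (.*?) ends up absorbing the leading 1. The following regex pattern should work here: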

(\d+(?:\.\d+)?)?$

This matches an optional integer or decimal at the end of the string. The updated pandas code would look like this:

df.iloc[:,0].str.extract(r'(\d+(?:\.\d+)?)?$')
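For reference, here is a self-contained version of the same call that rebuilds the question's DataFrame. Two details are my own optional additions rather than part of the answer: expand=False, which makes .str.extract return a Series when the pattern has a single capture group, and pd.to_numeric, since the extracted values are strings. The column name Value is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Names': ["Absolute Neutrophil Count",
                             "Absolute Lymphocyte Count 2.9",
                             "Absolute Neutrophil Count 10.2",
                             "ESR (Modified Westergren) 8",
                             "Free Triiodothyronine (FT3) 3.59",
                             "Free Triiodothyronine FT4 4.53"]})

# expand=False returns a Series (one capture group) instead of a one-column DataFrame;
# rows with no trailing number get NaN
values = df['Names'].str.extract(r'(\d+(?:\.\d+)?)?$', expand=False)

# The extracted values are strings; convert them if numeric work follows
df['Value'] = pd.to_numeric(values)
print(df)
```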

Here is an explanation of the regex pattern being used:

  • ( capture
    • \d+ an integer
    • (?:\.\d+)? followed by an optional decimal component
  • )? close capture; the trailing ? makes the whole number optional, so rows without one give NaN
  • $ end of the string
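One caveat, in case it matters for your data: the pattern anchors only on $, so a name that itself ends in a digit but has no value (a hypothetical row such as "Free Triiodothyronine FT4") would yield 4. A minimal sketch of a stricter variant, assuming the value is always separated by whitespace, adds the lookbehind (?<!\S) so the number must start at the beginning of the string or after whitespace.

```python
import pandas as pd

# Hypothetical rows: the second ends in a code ("FT4") but has no value
s = pd.Series(["Free Triiodothyronine FT4 4.53",
               "Free Triiodothyronine FT4"])

loose = s.str.extract(r'(\d+(?:\.\d+)?)?$', expand=False)
# (?<!\S): the number may not be preceded by a non-whitespace character
strict = s.str.extract(r'(?<!\S)(\d+(?:\.\d+)?)$', expand=False)

print(loose.tolist())   # ['4.53', '4']
print(strict.tolist())  # ['4.53', nan]
```

If every value in your data is space-separated, the strict version gives the same results as the original pattern on the question's six rows.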