
How to extract only unique values from a string using regex in Python?

I have this string "Desirable: < 200 Borderline HIgh: 200 - 240 High: > 240", from which I want to extract only the unique integer or decimal values.

To extract numbers, decimals and "-", I was using the regex r'[^0-9.-]+' with re.sub() to strip out every other character, but it doesn't return unique values:

import re

check = "Desirable: < 200 Borderline HIgh: 200 - 240 High: > 240"
re.sub(r'[^0-9.-]+', '',check)

Output:
200200-240240

Desired output:
200-240

Please note: it's important to be able to extract numbers, decimals and "-" from the string.

Solution:

You can extract all the integers and decimal numbers (including a leading minus sign, if present) with:

re.findall(r'-?\d+\.?\d*', check)
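A quick way to see what this pattern picks up is to run it on the question's string and on a second, made-up string containing decimals and a negative value (the second string is only an illustration, not from the question). Note that the standalone " - " between 200 and 240 is not matched, because the optional minus sign must be followed directly by a digit; the hyphen in the desired output comes from the join, not from the regex.

```python
import re

# Pattern from the answer: optional minus sign, one or more digits,
# an optional decimal point, then any further digits.
NUMBER_RE = re.compile(r'-?\d+\.?\d*')

check = "Desirable: < 200 Borderline HIgh: 200 - 240 High: > 240"
print(NUMBER_RE.findall(check))   # ['200', '200', '240', '240']

# Hypothetical extra input showing decimals and a negative value.
sample = "Range: -1.5 to 3.25, step 0.5"
print(NUMBER_RE.findall(sample))  # ['-1.5', '3.25', '0.5']
```

Precompiling with re.compile() is just a convenience here; re.findall() with the raw pattern string returns the same list.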

Then remove the duplicates with set() and join the remaining values with "-".join().

Your desired code:

"-".join(set(re.findall(r'-?\d+\.?\d*', check)))
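If you always want the values in ascending numeric order, one option (a sketch, not part of the original answer) is to sort the set with key=float before joining. This assumes every match can be parsed by float(), which holds for strings produced by this pattern. Sorting by value also avoids the string-comparison pitfall shown at the end.

```python
import re

check = "Desirable: < 200 Borderline HIgh: 200 - 240 High: > 240"
matches = re.findall(r'-?\d+\.?\d*', check)

# set() removes duplicates, but its iteration order is arbitrary.
# Sorting with key=float orders the strings by numeric value,
# so the output is the same on every run.
result = "-".join(sorted(set(matches), key=float))
print(result)  # 200-240

# Plain string sorting compares character by character, so "1000" < "240";
# key=float compares the numeric values instead.
print(sorted({"1000", "240"}))             # ['1000', '240']
print(sorted({"1000", "240"}, key=float))  # ['240', '1000']
```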

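One drawback of this code is that set() does not preserve the order of the numbers, so the result may come out as 240-200 instead of 200-240. numpy.unique() does not preserve the original order either, but it returns the unique values sorted, which gives 200-240 for this input. Because the matches are strings, they are sorted lexicographically rather than numerically (for example, "1000" sorts before "240"):

import numpy as np

"-".join(np.unique(re.findall(r'-?\d+\.?\d*', check)))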
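If "order matters" means the order in which the numbers first appear in the string, a standard-library approach (not from the original answer) is dict.fromkeys(): dictionaries preserve insertion order since Python 3.7, so it drops repeats while keeping first-seen order, with no NumPy dependency. The helper name unique_numbers and the second input string are hypothetical.

```python
import re

def unique_numbers(text: str) -> list[str]:
    """Return numeric tokens in the order they first appear, without duplicates."""
    # dict keys keep insertion order (guaranteed since Python 3.7),
    # so dict.fromkeys() drops repeats while keeping first-seen order.
    return list(dict.fromkeys(re.findall(r'-?\d+\.?\d*', text)))

check = "Desirable: < 200 Borderline HIgh: 200 - 240 High: > 240"
print("-".join(unique_numbers(check)))  # 200-240

# Hypothetical input where first-seen order differs from sorted order.
print(unique_numbers("max 240, min 200, again 240"))  # ['240', '200']
```

The list[str] annotation needs Python 3.9 or later; drop it on older versions.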