How to extract numerical value and corresponding qualifier using Python/RegEx

I have a dataframe with strings containing numeric and qualifiers (>, <, <=…). I would like to extract in 2 new columns the numeric value and the qualifier string.
Exception: when the initial string is a range, I would like to extract the Max value in the range.

Please see below my example:

| ID      | initial_value | qualifier | num_value |
|-------- |---------------|-----------|-----------|
| Sample1 | 25.5          |=          | 25.5      |
| Sample2 | = 25.5        |=          | 25.5      |
| Sample3 | >35.5         |>          | 35.5      |
| Sample4 | > 45.5        |>          | 45.5      |
| Sample5 | <=55.5        |<=         | 55.5      |
| Sample6 | <= 65.5       |<=         | 65.5      |
| Sample7 | >>55.5        |>>         | 55.5      |
| Sample8 | 25.0-75.0     |-          | 75.0      |
| Sample9 | 25.0 - 75.0   |-          | 75.0      |

In advance thank you so much for your help.

Cheers

>Solution :

You can use the ([^\s\d.]+)?\s*([\d.]+)$ regex and fillna:

df[['qualifier', 'num_value']] = df['initial_value'].str.extract('([^\s\d.]+)?\s*([\d.]+)$').fillna('=')

output:

        ID initial_value qualifier num_value
0  Sample1          25.5         =      25.5
1  Sample2        = 25.5         =      25.5
2  Sample3         >35.5         >      35.5
3  Sample4        > 45.5         >      45.5
4  Sample5        <=55.5        <=      55.5
5  Sample6       <= 65.5        <=      65.5
6  Sample7        >>55.5        >>      55.5
7  Sample8     25.0-75.0         -      75.0
8  Sample9   25.0 - 75.0         -      75.0

Leave a Reply