Extract Numeric info from Pandas column using regex

Advertisements

I am trying to extract the highlighted "numeric information" from a Pandas DataFrame column:

Text
Dimensions: 23"/60
Dimensions: 23" / 60
Dimensions: 48"
Dimensions: 22.5X8.25
Dimensions: 80IN
Dimensions: 567 S
Dimensions: 22.5X8.25
Dimensions: 26INNP
Dimensions: 24" x 55" with pipe 16 x 7

I am using regex and is as follows: regex = r"(\d([^\s]*)\s.\s\d*[^\s])|(\d([^\s])*)"

I am using to the below script to create a new column with all the numeric information extracted from each sentence and combined with ‘_’

df2['Numeric_Info'] = df2['Text'].apply(lambda x: '_'.join([i[0] for i in re.findall(regex, str(x))]))

Can someone tell me what I am doing wrong?

>Solution :

You can use

regex = r'\d+(?:\.\d+)?"?(?:\s*[Xx/.]\s*\d+(?:\.\d+)?"?)?(?:\s*[A-Z]+\b)?'
df2['Numeric_Info'] = df2['Text'].str.findall(regex).str.join('_')

See the regex demo.

Details:

  • \d+(?:\.\d+)? – an int or float value
  • "? – an optional " char
  • (?:\s*[Xx/.]\s*\d+(?:\.\d+)?"?)? – an optional sequence of
    • \s*[Xx/.]\s*X, x, / or . enclosed with zero or more whitespaces
    • \d+(?:\.\d+)? – an int or float value
    • "? – an optional " char
  • (?:\s*[A-Z]+\b)? – an optional sequence of
    • \s* – zero or more whitespaces
    • [A-Z]+ – one or more ASCII uppercase letters
    • \b – a word boundary.

Here is a Pandas test:

import pandas as pd
df = pd.DataFrame({'Text':['No data here','Dimensions: 23"/60', 'Dimensions: 23" / 60', 'Dimensions: 48"', 'Dimensions: 22.5X8.25', 'Dimensions: 80IN', 'Dimensions: 567 S','Dimensions: 22.5X8.25', 'Dimensions: 26INNP','Dimensions: 24" x 55" with pipe 16 x 7']})
regex = r'\d+(?:\.\d+)?"?(?:\s*[Xx/.]\s*\d+(?:\.\d+)?"?)?(?:\s*[A-Z]+\b)?'
df['Numeric_Info'] = df['Text'].str.findall(regex).str.join('_')

Output:

>>> df
                                     Text      Numeric_Info
0                            No data here                  
1                      Dimensions: 23"/60            23"/60
2                    Dimensions: 23" / 60          23" / 60
3                         Dimensions: 48"               48"
4                   Dimensions: 22.5X8.25         22.5X8.25
5                        Dimensions: 80IN              80IN
6                       Dimensions: 567 S             567 S
7                   Dimensions: 22.5X8.25         22.5X8.25
8                      Dimensions: 26INNP            26INNP
9  Dimensions: 24" x 55" with pipe 16 x 7  24" x 55"_16 x 7

Leave a ReplyCancel reply