Remove numbers only if they have more than two digits

I would like to remove all numbers that have more than two digits from the text column of a DataFrame. What is the best way to do this in pandas?

   customerId                            text
0           1  Hello you should call 46232348
1           2                      What is 42
2           3       Is this a number or 23213
3           4               1 person is there
4           5                    It is 4x4 cm
import pandas as pd
d = {
    "customerId": [1, 2, 3, 4, 5],
    "text": ["Hello you should call 46232348",
             "What is 42",
             "Is this a number or 23213",
             '1 person is there',
             'It is 4x4 cm'],
}
df = pd.DataFrame(data=d)
print(df)
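# Raw string plus regex=True: pandas 2.0+ treats the pattern as a literal string by default
df['text_without_number'] = df['text'].str.replace(r'\d+', '', regex=True)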

print(df)

What I got

   customerId                            text     text_without_number
0           1  Hello you should call 46232348  Hello you should call 
1           2                      What is 42                What is 
2           3       Is this a number or 23213    Is this a number or 
3           4               1 person is there         person is there
4           5                    It is 4x4 cm              It is x cm

What I want

   customerId                            text     text_without_number
0           1  Hello you should call 46232348  Hello you should call 
1           2                      What is 42             What is 42  
2           3       Is this a number or 23213    Is this a number or 
3           4               1 person is there      1 person is there
4           5                    It is 4x4 cm           It is 4x4 cm

>Solution :

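You can use \d{3,} to match runs of three or more digits. The leading \s* also removes any whitespace in front of each match, and regex=True makes pandas treat the pattern as a regular expression: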

df['text_without_number'] = df['text'].str.replace(r'\s*\d{3,}', '', regex=True)

output:

   customerId                            text    text_without_number
0           1  Hello you should call 46232348  Hello you should call
1           2                      What is 42             What is 42
2           3       Is this a number or 23213    Is this a number or
3           4               1 person is there      1 person is there
4           5                    It is 4x4 cm           It is 4x4 cm
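
Note that \d{3,} matches any run of three or more digits, including runs embedded in a longer token. If only standalone numbers should be removed, one possible variant (an assumption about the intent, not part of the original answer) wraps the digits in \b word boundaries. The sketch below compares both patterns; the "abc12345" row is a hypothetical addition that does not appear in the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "Hello you should call 46232348",
        "What is 42",
        "It is 4x4 cm",
        "Part id abc12345 here",  # hypothetical extra row, not in the question
    ]
})

# Pattern from the answer: optional leading whitespace, then 3+ digits.
plain = df["text"].str.replace(r"\s*\d{3,}", "", regex=True)

# Assumed variant: \b restricts matches to numbers that stand alone,
# so digits glued to letters (abc12345) are left untouched.
bounded = df["text"].str.replace(r"\s*\b\d{3,}\b", "", regex=True)

print(pd.DataFrame({"plain": plain, "bounded": bounded}))
```

With the plain pattern the last row becomes "Part id abc here", while the bounded pattern leaves it unchanged. Both patterns give the same result on the rows taken from the question.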