How do I replace and add specific numbers in a string in a Pandas DataFrame?

I am currently trying to clean a column of data, which contains the phone numbers of users. The phone numbers are not consistent in their format and need to be standardised.

For example:

import pandas as pd

data = {'Name': ['John', 'Dom', 'Jack', 'Sam', 'Fred', 'Harvey', 'Toby'],
        'Phone': ['+49(0) 047905356', '(0161) 496 0674', '239.711.3836', '02984 08192', 
        '(0306) 999 0871', '0121x496x0225', '+44047905356']}

df = pd.DataFrame(data)

Now I’ve tried to use the following code to remove the special characters:

df['Phone'] = df['Phone'].replace(r'\W', '', regex=True)

This works. However, for numbers that start with a + sign followed by a country code, I want to replace that prefix with ‘0’ to achieve the following:

Example of expected outputs:

Input: '+49(0) 047905356' | Expected: '047905356'

Input: '+44047905356' | Expected: '047905356'

But then I also want numbers without a ‘0’ at the beginning to include one, for example:

Input: '239.711.3836' | Expected: '02397113836'
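
To make the effect of the `\W` attempt concrete, here is a small sketch run on three of the sample numbers, comparing `\W` (non-word characters) with `\D` (non-digits). The variable names are illustrative only. It suggests two things: `\W` leaves the letter `x` in place, and both patterns remove the `+`, so after this step a country code can no longer be told apart from ordinary digits.

```python
import pandas as pd

phones = pd.Series(['+49(0) 047905356', '0121x496x0225', '+44047905356'])

# \W removes everything that is not a letter, digit or underscore,
# so '+' disappears but the letter 'x' stays.
non_word_removed = phones.str.replace(r'\W', '', regex=True)

# \D removes everything that is not a digit, so 'x' is removed as well.
non_digit_removed = phones.str.replace(r'\D', '', regex=True)

print(non_word_removed.tolist())
# ['490047905356', '0121x496x0225', '44047905356']
print(non_digit_removed.tolist())
# ['490047905356', '01214960225', '44047905356']
```

Any handling of the `+` prefix therefore has to happen before the special characters are stripped.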

>Solution :

You can use regular expressions to achieve the desired result.

import pandas as pd

data = {'Name': ['John', 'Dom', 'Jack', 'Sam', 'Fred', 'Harvey', 'Toby'],
        'Phone': ['+49(0) 047905356', '(0161) 496 0674', '239.711.3836', '02984 08192',
                  '(0306) 999 0871', '0121x496x0225', '+44047905356']}
df = pd.DataFrame(data)

# Remove every non-digit character. This also removes a leading '+',
# so no separate startswith('+') step is needed afterwards.
df['Phone'] = df['Phone'].replace(r'\D', '', regex=True)

# Prepend '0' to numbers that do not already start with one.
df.loc[~df['Phone'].str.startswith('0'), 'Phone'] = '0' + df['Phone']

# Insert a dot after the second and the fourth digit.
df['Phone'] = df['Phone'].str[:2] + '.' + df['Phone'].str[2:4] + '.' + df['Phone'].str[4:]

print(df)

Output:

     Name            Phone
0    John  04.90.047905356
1     Dom    01.61.4960674
2    Jack    02.39.7113836
3     Sam     02.98.408192
4    Fred    03.06.9990871
5  Harvey    01.21.4960225
6    Toby   04.40.47905356
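
Note that the output above keeps the country-code digits (for example `04.90.047905356`), whereas the question expected `047905356` for the first and last rows. A minimal sketch that reproduces the expected values from the question could strip the `+` prefix first and only then remove the remaining non-digits. It rests on an assumption that is not in the original answer: every country code is exactly two digits long and may be followed by an optional `(0)`, as in the sample data. The function name `standardise_phone` is hypothetical.

```python
import pandas as pd

data = {'Name': ['John', 'Dom', 'Jack', 'Sam', 'Fred', 'Harvey', 'Toby'],
        'Phone': ['+49(0) 047905356', '(0161) 496 0674', '239.711.3836', '02984 08192',
                  '(0306) 999 0871', '0121x496x0225', '+44047905356']}
df = pd.DataFrame(data)


def standardise_phone(phone: pd.Series) -> pd.Series:
    """Hypothetical helper: strip '+CC' prefixes, keep digits, ensure a leading '0'."""
    # ASSUMPTION: country codes are exactly two digits (true for the sample
    # data only); an optional '(0)' and surrounding spaces are dropped too.
    phone = phone.str.replace(r'^\+\d{2}\s*(?:\(0\))?\s*', '', regex=True)
    # Remove everything that is not a digit (spaces, dots, brackets, 'x').
    phone = phone.str.replace(r'\D', '', regex=True)
    # Keep numbers that already start with '0'; prepend '0' to the rest.
    return phone.where(phone.str.startswith('0'), '0' + phone)


df['Phone'] = standardise_phone(df['Phone'])
print(df)
```

With the sample data this gives `047905356` for John and Toby and `02397113836` for Jack, matching the expected values in the question. Real-world country codes vary between one and three digits, so the two-digit pattern would need replacing with a proper lookup for production data.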