I have a list of phone numbers in pandas like this:
| Phone Number |
|---|
| 923********* |
| 0923******** |
| 03********** |
| 0923******** |
I want to clean the phone numbers based on two rules
- If the length of string is 11, number should start with ’03’
- If the length of string is 12, number should start with ‘923’
I want to discard all other numbers.
So far I have tried creating two seperate columns by following code:
before_cust['digits'] = before_cust['Information'].str.len()
before_cust['starting'] = before_cust['Information'].astype(str).str[:3]
before_cust.loc[((before_cust['digits'] == 11) & before_cust[before_cust['starting'].str.contains("03")==True]) | ((before_cust['digits'] == 12) & (before_cust[before_cust['starting'].str.contains("923")==True]))]
However this code doesn’t work. Is there a more efficient way to do this?
>Solution :
Create 2 boolean masks for each condition then filter out your dataframe:
# If the length of string is 11, number should start with '03'
m1 = df['Information'].str.len().eq(11) & df['Information'].str.startswith('03')
# If the length of string is 12, number should start with '923'
m2 = df['Information'].str.len().eq(12) & df['Information'].str.startswith('923')
out = df.loc[m1|m2]
print(out)
# Output:
Information
0 923*********
Note: I think it doesn’t work because you use str.contains rather than str.startswith.