Pandas – Replace cell values using a conditional (normalising string input for gender)

Example data

id Gender Age
1 F 22
2 Fem 18
3 male 45
4 She/Her 30
5 Male 25
6 Non-bianary 26
7 M 18
8 female 20
9 Male 56

I want to standardise this somewhat by replacing every cell that contains an ‘F’ with ‘Female’ and every cell that contains an ‘M’ with ‘Male’. I know the first step is to normalise the case of the whole column, so that each value starts with a capital letter and the rest is lower case:

df.Gender = df.Gender.str.capitalize()

and I know that I can do it value-by-value with

df['Gender'] = df['Gender'].replace(['F', 'Fem', 'Female'], 'Female')

but is there a way to do this somewhat programmatically, such as:

df.Gender = df.Gender.str.capitalize()

for i in df.Gender:
    if 'F' in str(i):
        #pd.replace call something like...
        df[df.Gender == i] = 'Female'
        #I know that line is very wrong
    elif 'M' in str(i)...

Any help would be much appreciated.

Solution:

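Try using a regex. The pattern ^F\S*$ combined with the re.I flag matches any whole value that starts with F or f and contains no whitespace: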

import re

df["Gender"] = df["Gender"].str.replace(
    r"^F\S*$", "Female", flags=re.I, regex=True
)
print(df)

Prints:

   id       Gender  Age
0   1       Female   22
1   2       Female   18
2   3         male   45
3   4      She/Her   30
4   5         Male   25
5   6  Non-bianary   26
6   7            M   18
7   8       Female   20
8   9         Male   56
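
The output above only normalises the values starting with F. Here is a minimal sketch of how the same idea might be extended to the M values, assuming a small pattern-to-label mapping (the name `patterns` is hypothetical, not part of the original answer) and the sample data from the question. Note that any whitespace-free value starting with F or M would be rewritten, so the patterns may need tightening for messier data.

```python
import re

import pandas as pd

# Sample data from the question, rebuilt so the example is self-contained.
df = pd.DataFrame(
    {
        "id": range(1, 10),
        "Gender": ["F", "Fem", "male", "She/Her", "Male",
                   "Non-bianary", "M", "female", "Male"],
        "Age": [22, 18, 45, 30, 25, 26, 18, 20, 56],
    }
)

# Hypothetical mapping: regex pattern -> standardised label.
# Each pattern matches a whole value that starts with the given letter
# and contains no whitespace.
patterns = {
    r"^F\S*$": "Female",
    r"^M\S*$": "Male",
}

# Apply each pattern in turn, ignoring case.
for pattern, label in patterns.items():
    df["Gender"] = df["Gender"].str.replace(
        pattern, label, flags=re.I, regex=True
    )

print(df)
```

With this data, rows such as She/Her and Non-bianary are left untouched because they start with neither F nor M.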