Python 3.9 and Pandas 1.3.4
So here’s the df:
1 First Name Last Name fullname
2 Freddie Mercury Freddie Mercury
3 John Lennon John Lennon
4 David Bowie David Bowie
5 John Doe
6 Joseph Joseph
7 Jovi Jovi
My piece of code currently just finds the fullname column is just First Name + Last Name.
I’m currently trying to filter for blank entries in the First Name column, Last Name column, and any "John Does" in the fullname column.
Current code:
import pandas as pd
df = pd.read_csv('file.csv', dtype=str, header=0)
df2 = pd.DataFrame(df, columns=['First Name', 'Last Name', 'fullname'])
df['fullname'] = (df[['First Name', 'Last Name']].fillna('').agg(' '.join, axis=1).str.strip().replace('', 'John Doe'))
df_sort = df2.loc[df2['First Name'] == " "] | df2.loc[df2['Last Name'] == " "] | df2.loc[df2['fullname'] == "John Doe"]
df.to_csv('file.csv', index=False)
df_sort.to_csv('missing names.csv', index=False)
Currently I am having the missing names write to a new file and outputs only this:
First Name Last Name fullname
Everything is empty under.
I would like for the output to be:
First Name Last Name fullname
John Doe
Joseph
Jovi
>Solution :
Replace possible missing values to empty string, compare and test if at least one value match in DataFrame.any:
df_sort = df2[df2[['First Name', 'Last Name']].fillna('').eq('').any(axis=1)]
Or if there are missing values use:
df_sort = df2[df2[['First Name', 'Last Name']].isna().any(axis=1)]