I have the following DataFrame:
column1 column2 columns3 column4
0 A 1 2 3.0
1 B 1 2 3.0
2 B 1 2 NaN
3 B 1 2 NaN
I’m trying to delete all rows that have the value "B" in column1 and a blank cell (or a NaN value) in column4.
This does not work:
for row in df.iterrows():
    if (df.column1.items() == "B"):
        if (df.column4.isnull()):
            df.drop()
And this does not work either:
for row in df.iterrows():
    if (df.column1.items() == "B") & (df.column4.isna()):
        df.drop()
I don't get an error when I run this, but the DataFrame is unchanged when I print it.
>Solution :
Use multiple conditions and boolean indexing:
out = df[df['column1'].ne('B') | df['column4'].notna()]
which, by De Morgan's law, is equivalent to:
out = df[~(df['column1'].eq('B') & df['column4'].isna())]
Output:
column1 column2 columns3 column4
0 A 1 2 3.0
1 B 1 2 3.0
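To check this locally, here is a minimal self-contained sketch. The way the DataFrame is built is an assumption (the question only shows the printed frame); the column names are copied as shown, including `columns3`. It builds both masks and confirms they keep the same rows.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (assumed construction).
df = pd.DataFrame({
    "column1": ["A", "B", "B", "B"],
    "column2": [1, 1, 1, 1],
    "columns3": [2, 2, 2, 2],
    "column4": [3.0, 3.0, np.nan, np.nan],
})

# Keep rows where column1 is not "B" OR column4 is present.
keep = df["column1"].ne("B") | df["column4"].notna()
out = df[keep]

# De Morgan form: select rows where column1 is "B" AND column4 is missing,
# then invert that mask to keep everything else.
to_drop = df["column1"].eq("B") & df["column4"].isna()
out2 = df[~to_drop]

print(out)
print(out.equals(out2))
```

Note that `df` itself is not modified; the filtered result lives in `out`. Reassign it (`df = out`) if you want to replace the original.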
Intermediates for the first approach:
  column1  column2  columns3  column4  col1 ≠ B  col4.notna()  (col1 ≠ B) OR col4.notna()
0       A        1         2      3.0      True          True                        True
1       B        1         2      3.0     False          True                        True
2       B        1         2      NaN     False         False                       False
3       B        1         2      NaN     False         False                       False
Intermediates for the second approach:
  column1  column2  columns3  column4  col1 == B  col4.isna()  (col1 == B) AND col4.isna()      ~
0       A        1         2      3.0      False        False                        False   True
1       B        1         2      3.0       True        False                        False   True
2       B        1         2      NaN       True         True                         True  False
3       B        1         2      NaN       True         True                         True  False
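If you would rather keep the `df.drop` idea from the question, one option is to pass the index labels of the matching rows to `DataFrame.drop`. This is only a sketch under the same assumed DataFrame construction as the question's printout. Two things to keep in mind: `drop` returns a new DataFrame, so the result has to be assigned back, and in the first attempt `df.column1.items() == "B"` compares an iterator with a string, which is never true, so the `drop` line is never reached.

```python
import numpy as np
import pandas as pd

# Same frame as in the question (assumed construction).
df = pd.DataFrame({
    "column1": ["A", "B", "B", "B"],
    "column2": [1, 1, 1, 1],
    "columns3": [2, 2, 2, 2],
    "column4": [3.0, 3.0, np.nan, np.nan],
})

# Rows to delete: column1 is "B" AND column4 is missing.
mask = df["column1"].eq("B") & df["column4"].isna()

# drop() takes index labels and returns a new frame, so reassign the result.
df = df.drop(index=df[mask].index)

print(df)
```

No explicit loop is needed: the comparison and the null check are applied to whole columns at once.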