Given the following dataframe:
| name | aaa | bbb |
|---|---|---|
| Mick | None | None |
| Ivan | A | C |
| Ivan-Peter | 1 | None |
| Juli | 1 | P |
I want to get two dataframes:
- One with the rows where column `aaa` and/or `bbb` is None, named `filter_nulls` in my code.
- One with the rows that contain no None at all, named `df_out` in my code.
This is what I have tried and it does not produce the required dataframes.
```python
import pandas as pd
df_out = {
'name': [ 'Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
'aaa': [None, 'A', '1', '1'],
'bbb': [None, 'C', None, 'P'],
}
print(df_out)
filter_nulls = df_out[df_out['aaa'].isnull()|(df_out['bbb'] is None)]
print(filter_nulls)
df_out = df_out.loc[filter_nulls].reset_index(level=0, drop=True)
print(df_out)
```
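Two things break the attempt above: `df_out` is still a plain dict, so `df_out['aaa']` is a list with no `.isnull()` method, and `df_out['bbb'] is None` is an identity check on the whole column rather than a per-row test. A minimal illustration of the second point, using a standalone Series (an assumption standing in for the `bbb` column):

```python
import pandas as pd

# Stand-in for the 'bbb' column from the question
s = pd.Series([None, 'C', None, 'P'])

# `is` compares object identity: the Series object itself is not None,
# so this gives one bool for the whole column, not a per-row mask.
print(s is None)            # False

# .isna() (alias .isnull()) tests every element and returns a boolean Series
print(s.isna().tolist())    # [True, False, True, False]
```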
Solution:
Use:
```python
# Build a DataFrame from the sample dict
df_out = pd.DataFrame(df_out)

# Select the columns by list and flag rows with NaN or None in at least one of them
m = df_out[['aaa', 'bbb']].isna().any(axis=1)
# OR test both columns separately
m = df_out['aaa'].isna() | df_out['bbb'].isna()

# Split into matched and non-matched rows
df1 = df_out[m].reset_index(drop=True)
df2 = df_out[~m].reset_index(drop=True)

print(df1)
#          name   aaa   bbb
# 0        Mick  None  None
# 1  Ivan-Peter     1  None

print(df2)
#    name aaa bbb
# 0  Ivan   A   C
# 1  Juli   1   P
```
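If the split is needed in several places, the mask approach can be wrapped in a small function. This is a sketch only; `split_by_missing` and `df_clean` are hypothetical names, not part of pandas or the original answer:

```python
import pandas as pd

def split_by_missing(df, cols):
    """Hypothetical helper: return (rows with a missing value in any of `cols`,
    rows with no missing value in `cols`), each with a fresh 0..n-1 index."""
    mask = df[cols].isna().any(axis=1)
    with_missing = df[mask].reset_index(drop=True)
    complete = df[~mask].reset_index(drop=True)
    return with_missing, complete

df_out = pd.DataFrame({
    'name': ['Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
})

filter_nulls, df_clean = split_by_missing(df_out, ['aaa', 'bbb'])
print(filter_nulls['name'].tolist())  # ['Mick', 'Ivan-Peter']
print(df_clean['name'].tolist())      # ['Ivan', 'Juli']
```

Because both frames come from the same mask and its negation, every input row lands in exactly one of them.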
Another idea is to use `DataFrame.dropna` for the complete rows, then select the index labels that do not exist in `df2`:
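```python
# Rows without any missing value
df2 = df_out.dropna()
# Remaining index labels are the rows that had at least one missing value
df1 = df_out.loc[df_out.index.difference(df2.index)].reset_index(drop=True)
df2 = df2.reset_index(drop=True)
```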
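Plain `dropna()` checks every column, including `name`. If only `aaa` and `bbb` should decide the split, the `subset` parameter restricts the check. A sketch, assuming the same sample data:

```python
import pandas as pd

df_out = pd.DataFrame({
    'name': ['Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
})

# Keep rows where both 'aaa' and 'bbb' are present; other columns are ignored
df2 = df_out.dropna(subset=['aaa', 'bbb'])
# Index labels missing from df2 belong to rows with at least one missing value
df1 = df_out.loc[df_out.index.difference(df2.index)]

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print(df1['name'].tolist())  # ['Mick', 'Ivan-Peter']
print(df2['name'].tolist())  # ['Ivan', 'Juli']
```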