I have a specific dataframe (df1) with some values for a variable called "MP10".
df = data = {
"day": [1, 2, 3, 4, 5, 6, 7],
"month": [1, 1, 4, 4, 7, 7, 7],
"year": [2015, 2015, 2016, 2017, 2017, 2019, 2019,],
"mp10": [10, 15, 20, 5, 7, 14, 18]
}
df1 = pd.DataFrame(df)
df1
But, I have to change the values for NaN for specific days.
And I created another dataframe (df2), to find these specific days.
df = data = {
"day": [2, 5, 7],
"month": [1, 7, 7],
"year": [2015, 2017, 2019,],
"mp10": [np.NaN, np.NaN, np.NaN]
}
df2 = pd.DataFrame(df)
df2
I thought that using pd.merge, I could change the values, but it didn’t work.
How could I pick the NaN from df2 and change them in df1?
>Solution :
You could use a merge with indicator=True to identify the indices to remove:
idx = (df1.reset_index()
.merge(df2, on=['day', 'month', 'year'],
how='left', indicator=True)
.query('_merge == "both"')['index']
)
df1.loc[idx, 'mp10'] = None
Variant using boolean indexing:
mask = (df1.merge(df2, on=['day', 'month', 'year'],
how='left', indicator=True)
['_merge'].eq('both')
.tolist() # merge doesn't maintain the index
)
df1.loc[mask, 'mp10'] = None
Output:
day month year mp10
0 1 1 2015 10.0
1 2 1 2015 NaN
2 3 4 2016 20.0
3 4 4 2017 5.0
4 5 7 2017 NaN
5 6 7 2019 14.0
6 7 7 2019 NaN
Or, if you want a new DataFrame:
out = (df1.merge(df2.assign(mp10=1), on=['day', 'month', 'year'],
how='left', suffixes=(None, '_mask'))
.assign(mp10=lambda x: x['mp10'].mask(x.pop('mp10_mask').eq(1)))
)