In my dataframe (df), I need to:
- For all rows where the value in ‘Column A’ == Yes
- The value in ‘Column B’ should be:
- The value of ‘Column E’ in dataframe ‘df2’ of the row where ‘df2’ ‘Column D’ == ‘df’ ‘Column C’
Currently I have this:
df.loc[df['Column A'] == 'Yes', 'Column B'] = df2.loc[(df2['Column D'] == df['Column C']), 'Column E']
However this results in:
ValueError: Can only compare identically-labeled Series objects
Example of my dataframes:
df:
Column A Column B Column C
Yes 2554072
No 2557766
No 261191
Yes 2600355
Yes 2601388
... ...
df2:
Column D Column E
2554072 8
2601388 4
2604142 6
...
Required results:
df:
Column A Column B Column C
Yes 8 2554072
No 2557766
No 261191
Yes NoMatch 2600355
Yes 4 2601388
... ...
I have tried to change the values in all of these columns to string and fill NaN values in Column C, however the ValueError is persistent. Any suggestions?
>Solution :
You can map and mask with where:
df['Column B'] = (df['Column C']
.map(df2.set_index('Column D')['Column E'])
.where(df['Column A'].eq('Yes'))
)
Or, with boolean indexing:
m = df['Column A'].eq('Yes')
df.loc[m, 'Column B'] = (df.loc[m, 'Column C']
.map(df2.set_index('Column D')['Column E'])
)
Example:
# df
Column A Column C
0 Yes A
1 No B
2 Yes C
3 No D
4 Yes E
5 No F
# df2
Column D Column E
0 D 0
1 E 1
2 F 2
3 A 3
4 B 4
5 C 5
# Output
Column A Column C Column B
0 Yes A 3.0
1 No B NaN
2 Yes C 5.0
3 No D NaN
4 Yes E 1.0
5 No F NaN