I have two dataframes:
mapping = pd.DataFrame({'idx': ['a','b','c'], 'val': [1, 2, 3]})
obs = pd.DataFrame({'obs': ['a','c','a','a','b','d']})
I would like for all observations in obs, that are present in mappings idx column, to get the value of mappings val column. If the value does not exist in mappings idx column, it should be discarded.
That is, I would like to end up with the following dataframe:
obs_mapped = pd.DataFrame({'obs': [1,3,1,1,2]})
Is there a handy way to do this with pandas?
>Solution :
Use Series.map with DataFrame.dropna:
out = (obs.assign(obs = obs['obs'].map(mapping.set_index('idx')['val']))
.dropna(subset=['obs'])
.astype(mapping.val.dtype))
print (out)
obs
0 1
1 3
2 1
3 1
4 2
Or if use DataFrame.merge need multiple rename with sorting by original indices converted to column:
out = (obs.rename(columns={'obs':'idx'}).reset_index()
.merge(mapping)
.rename(columns={'val':'obs'})
.sort_values('index', ignore_index=True)[['obs']])
print (out)
obs
0 1
1 3
2 1
3 1
4 2