Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merging dataframes on one column while replacing values with another column

I have two dataframes:

mapping = pd.DataFrame({'idx': ['a','b','c'], 'val': [1, 2, 3]})
obs = pd.DataFrame({'obs': ['a','c','a','a','b','d']})

I would like for all observations in obs, that are present in mappings idx column, to get the value of mappings val column. If the value does not exist in mappings idx column, it should be discarded.

That is, I would like to end up with the following dataframe:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

obs_mapped = pd.DataFrame({'obs': [1,3,1,1,2]})

Is there a handy way to do this with pandas?

>Solution :

Use Series.map with DataFrame.dropna:

out = (obs.assign(obs = obs['obs'].map(mapping.set_index('idx')['val']))
          .dropna(subset=['obs'])
          .astype(mapping.val.dtype))
print (out)
   obs
0    1
1    3
2    1
3    1
4    2

Or if use DataFrame.merge need multiple rename with sorting by original indices converted to column:

out = (obs.rename(columns={'obs':'idx'}).reset_index()
          .merge(mapping)
          .rename(columns={'val':'obs'})
          .sort_values('index', ignore_index=True)[['obs']])
print (out)
   obs
0    1
1    3
2    1
3    1
4    2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading