Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Adding specific values from one dataframe into another

I have two separate data frames, where I’d like to add specific columns from the second data frame to the first one but I want their values to match specific rows in the first one, mainly whenever its says ‘red’ in the ‘descriptor’ column and also match for ‘date’ and ‘product’.

Data frame 1:

date   product    descriptor        
jan    product1   blue
feb    product2   green
mar    product1   red
apr    product2   blue
may    product1   red       
jun    product2   red        

Data frame 2:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

date   product    val        
mar    product1   x
may    product1   y
jun    product2   z

I want to achieve something like this:

date   product    descriptor  val       
jan    product1   blue
feb    product2   green
mar    product1   red          x
apr    product2   blue
may    product1   red          y
jun    product2   red          z

I’ve tried looking for something similar but couldn’t find anything that exactly matches my question.

>Solution :

You can merge, mask and combine_first:

out = df1.combine_first(df1.merge(df2.drop_duplicates(['date', 'product']),
                                  on=['date', 'product'], how='left')
                           .where(df1['descriptor'].eq('red'))
                       )

Or, align with a MultiIndex:

cols = ['date', 'product']

df1.loc[m, ['val']] = (df2.set_index(cols)
                          .reindex(pd.MultiIndex.from_frame(df1[cols]))
                          .set_axis(df1.index)
                          .where(df1['descriptor'].eq('red'))
                      )

Or perform a classical merge, then fix the values:

out = df1.merge(df2, how='left')
out.loc[df1['descriptor'].ne('red'), ['val']] = float('nan')

Output:

  date descriptor   product  val
0  jan       blue  product1  NaN
1  feb      green  product2  NaN
2  mar        red  product1    x
3  apr       blue  product2  NaN
4  may        red  product1    y
5  jun        red  product2    z
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading