Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merge should adopt Nan values if not existing value

However, I have the following problem:

  • If a year or date does not exist in df2 then a price and a listing_id is automatically added during the merge. But that should be NaN

  • The second problem is when merging, as soon as I have multiple data that were on the same day and year then the temperature is also merged to the second, for example:

    MEDevel.com: Open-source for Healthcare and Education

    Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

    Visit Medevel

d = {'id': [1], 'day': [1], 'temperature': [20], 'year': [2001]}
df = pd.DataFrame(data=d)
print(df)

   id  day  temperature  year
0   1    1           20  2001

d2 = {'id': [122, 244], 'day': [1, 1],
     'listing_id': [2, 4], 'price': [20, 440], 'year': [2001, 2001]}
df2 = pd.DataFrame(data=d2)
print(df2)

    id  day  listing_id  price  year
0  122    1           2     20  2001
1  244    1           4    440  2001

df3 = pd.merge(df,df2[['day', 'listing_id', 'price']],
              left_on='day', right_on = 'day',how='left')
print(df3)

   id  day  temperature  year  listing_id  price
0   1    1           20  2001           2     20
1   1    1           20  2001           4    440 # <-- The second temperature is wrong :/

This should not be so, because if I later still have a date from year 2002 which was in day 1 with a temperature of 30 and I want to calculate the average. Then I get the following formula: 20 + 20 + 30 = 23.3. The formula should be 20 + 30 = 25. Therefore, if a value has already been filled, there should be a NaN value in it.

Code Snippet

d = {'id': [1, 2, 3, 4, 5], 'day': [1, 2, 3, 4, 2], 
     'temperature': [20, 40, 50, 60, 20], 'year': [2001, 2002, 2004, 2005, 1999]}
df = pd.DataFrame(data=d)
print(df)

   id  day  temperature  year
0   1    1           20  2001
1   2    2           40  2002
2   3    3           50  2004
3   4    4           60  2005
4   5    2           20  1999


d2 = {'id': [122, 244, 387, 4454, 521], 'day': [1, 2, 3, 4, 2],
     'listing_id': [2, 4, 5, 6, 7], 'price': [20, 440, 500, 6600, 500], 
      'year': [2001, 2002, 2004, 2005, 2005]}
df2 = pd.DataFrame(data=d2)
print(df2)

     id  day  listing_id  price  year
0   122    1           2     20  2001
1   244    2           4    440  2002
2   387    3           5    500  2004
3  4454    4           6   6600  2005
4   521    2           7    500  2005


df3 = pd.merge(df,df2[['day','listing_id', 'price']],
              left_on='day', right_on = 'day',how='left').drop('day',axis=1)
print(df3)

   id  day  temperature  year  listing_id  price
0   1    1           20  2001           2     20
1   2    2           40  2002           4    440
2   2    2           40  2002           7    500
3   3    3           50  2004           5    500
4   4    4           60  2005           6   6600
5   5    2           20  1999           4    440
6   5    2           20  1999           7    500


What I want

   id  day  temperature  year  listing_id  price
0   1    1           20  2001           2     20
1   2    2           40  2002           4    440
2   2    2          NaN  2005           7    500
3   3    3           50  2004           5    500
4   4    4           60  2005           6   6600
5   5    2           20  1999         NaN    NaN

>Solution :

IIUC:

>>> df1.merge(df2[['day', 'listing_id', 'price', 'year']],
              on=['day', 'year'], how='outer')

    id  day  temperature  year  listing_id   price
0  1.0    1         20.0  2001         2.0    20.0
1  2.0    2         40.0  2002         4.0   440.0
2  3.0    3         50.0  2004         5.0   500.0
3  4.0    4         60.0  2005         6.0  6600.0
4  5.0    2         20.0  1999         NaN     NaN
5  NaN    2          NaN  2005         7.0   500.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading