Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: Retain column entries after inner join even if there are no common values

I have 3 dataframes. I merge df1 and df2 through a common column. However, I need to use df3 to find what values are allowed for pairs seen in groupby created. I could get this part done too using 2-column merge through inner join, but I also need to se the entries that did not have any common elements. I tried variously but failed. So far what I could do is represented with a model problem here:

ch = {'country':['India','India','India','USA','USA','Italy','Italy'],'hotel':['Taj','Oberoi','Hilton','Taj','Hilton','Oberoi','Marriott']}
ch_df = pd.DataFrame.from_dict(ch)

hm = {'hotel':['Taj','Taj','Taj','Oberoi','Oberoi','Marriott','Marriott','Marriott','Hilton','Hilton'],'menu':['ildi','dosa','soup','soup','ildi','soup','pasta','pizza','pizza','burger']}
hm_df = pd.DataFrame.from_dict(hm)

cm = {'country':['India','India','India','USA','USA','USA','Italy','Italy'],'menu':['ildi','dosa','soup','dosa','burger','pizza','pizza','pasta']}
cm_df = pd.DataFrame.from_dict(cm)

chm_df = pd.merge(ch_df, hm_df, left_on='hotel', right_on='hotel')

pd.merge(left=chm_df, right=cm_df, on=['country','menu'], how='inner').groupby(['country','hotel'])['menu'].apply(list).reset_index(name='menu items')
  country     hotel          menu items
0   India    Oberoi        [ildi, soup]
1   India       Taj  [ildi, dosa, soup]
2   Italy  Marriott      [pasta, pizza]
3     USA    Hilton     [pizza, burger]
4     USA       Taj              [dosa]

What I need are entries such as:

5   Italy  Oberoi                    []
...

One inefficient way is to add to each pair in hm_df an allowed menu item and remove it after groupby. But it looks ugly. Could you please suggest a more elegant method please?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Thanks in advance.

>Solution :

If need all possible combinations is possible use DataFrame.unstack withDataFrame.stack, for replace non exist values to empty lists use fill_value=[] parameter:

df = pd.merge(chm_df, cm_df, on=['country','menu']).groupby(['country','hotel'])['menu'].apply(list).unstack(fill_value=[]).stack().reset_index(name='menu items')
    
print (df)
   country     hotel          menu items
0    India    Hilton                  []
1    India  Marriott                  []
2    India    Oberoi        [ildi, soup]
3    India       Taj  [ildi, dosa, soup]
4    Italy    Hilton                  []
5    Italy  Marriott      [pasta, pizza]
6    Italy    Oberoi                  []
7    Italy       Taj                  []
8      USA    Hilton     [pizza, burger]
9      USA  Marriott                  []
10     USA    Oberoi                  []
11     USA       Taj              [dosa]

For completness if need only non exist values from chm_df convert to empty lists:

df = pd.merge(chm_df, cm_df, on=['country','menu']).groupby(['country','hotel'])['menu'].apply(list).reindex(pd.MultiIndex.from_frame(ch_df), fill_value=[]).reset_index(name='menu items')
    
print (df)
  country     hotel          menu items
0   India       Taj  [ildi, dosa, soup]
1   India    Oberoi        [ildi, soup]
2   India    Hilton                  []
3     USA       Taj              [dosa]
4     USA    Hilton     [pizza, burger]
5   Italy    Oberoi                  []
6   Italy  Marriott      [pasta, pizza]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading