Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Fill merged columns with 0 instead of NaN

I have a problem. I want to merge two dataframes, but instead of NaN it should be filled with 0. But only the "new" columns. How could I do that?

What I tried

df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left', fill = 0)
[OUT]
TypeError: merge() got an unexpected keyword argument 'fill'
d = {'host_id': [1, 1, 2],
     'id': [10, 11, 20],
     'value': ["Hot Water,Cold Water,Kitchen,Coffee", 
               "Hot Water,Coffee,Something",
               "Hot Water,Coffee"]}
df = pd.DataFrame(data=d)
print(df)


d2 = {'host_id': [1, 1, 2, 3],
     'id': [10, 11, 20, 30],
     'some': ['test1', "test2", "test3", np.nan]}
df2 = pd.DataFrame(data=d2)
print(df2)

df_path = df.copy()
df_path.index = pd.MultiIndex.from_arrays(df_path[['host_id', 'id']].values.T, names=['host_id', 'id'])
df_path = df_path['value'].str.split(',', expand=True)
df_path = df_path.melt(ignore_index=False).dropna()
df_path.reset_index(inplace=True)

one_hot = pd.get_dummies(df_path['value'])
df_one = df_path.drop('value',axis = 1)
df_one = df_path.join(one_hot)

grouped_df_one = df_one.groupby(['id']).max()
grouped_df_one = grouped_df_one.drop(columns=['value', 'variable']).reset_index()

df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left')
df3
   host_id  id                                value
0        1  10  Hot Water,Cold Water,Kitchen,Coffee
1        1  11           Hot Water,Coffee,Something
2        2  20                     Hot Water,Coffee

   host_id  id   some
0        1  10  test1
1        1  11  test2
2        2  20  test3
3        3  30    NaN

What I got

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   host_id  id   some  Coffee  Cold Water  Hot Water  Kitchen  Something
0        1  10  test1     1.0         1.0        1.0      1.0        0.0
1        1  11  test2     1.0         0.0        1.0      0.0        1.0
2        2  20  test3     1.0         0.0        1.0      0.0        0.0
3        3  30    NaN     NaN         NaN        NaN      NaN        NaN

What I want

   host_id  id   some  Coffee  Cold Water  Hot Water  Kitchen  Something
0        1  10  test1     1.0         1.0        1.0      1.0        0.0
1        1  11  test2     1.0         0.0        1.0      0.0        1.0
2        2  20  test3     1.0         0.0        1.0      0.0        0.0
3        3  30    NaN       0           0          0        0          0

>Solution :

You can fill specific columns using

df[list_cols] = df[list_cols].fillna(0)

where list_cols is e.g.

list_cols = ["Coffee", "Cold Water", "Hot Water", "Kitchen", "Something"]

See: Pandas fill multiple columns with 0 when null

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading