Combine duplicated rows by unique values in column to fill blank columns in another value

April 18, 2023

Imagine I have the following data (slice as example):

ID      Salary Component 1    Value1      Salary Component 2       Value2
10000   Basic Salary          22000     
10000                                     Housing Allowance        13200

How can I combine the rows for each ID so that there is only one row per ID, filling the blank columns with the values from the other rows where they are filled? The result would be:

ID      Salary Component 1    Value1      Salary Component 2       Value2
10000   Basic Salary          22000       Housing Allowance        13200

Thank you for the help!

>Solution :

If you need the first non-missing value per group, you can use:

import numpy as np
df = df.replace('', np.nan).groupby('ID', as_index=False).first()
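
To make the behaviour concrete, here is a minimal sketch run on the two sample rows from the question. It assumes the blank text cells are empty strings and the blank numeric cells are already NaN; the frame below is built by hand for illustration and the name `out` is arbitrary.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample rows (assumed layout: "" for blank text, NaN for blank numbers)
df = pd.DataFrame({
    "ID": [10000, 10000],
    "Salary Component 1": ["Basic Salary", ""],
    "Value1": [22000.0, np.nan],
    "Salary Component 2": ["", "Housing Allowance"],
    "Value2": [np.nan, 13200.0],
})

# Turn empty strings into NaN so that first() can skip them,
# then keep the first non-missing value of every column per ID
out = df.replace("", np.nan).groupby("ID", as_index=False).first()
print(out)
```

The result has a single row for ID 10000, with both salary components and both values filled in.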

If a group can contain multiple values and you need to aggregate them, e.g. depending on the column type:

f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else ','.join(x.dropna().unique())

df = df.replace('', np.nan).groupby('ID', as_index=False).agg(f)
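
A short sketch of this type-based aggregation, under the assumption that one ID can carry several components in the same column. The third row ("Bonus", 1000) is made up for illustration, and the helper `combine` is a hypothetical name; it uses `pd.api.types.is_numeric_dtype` for the type check instead of `np.issubdtype`, and drops NaN before joining so that only strings reach `','.join`.

```python
import numpy as np
import pandas as pd

# Illustrative data: the "Bonus" row is invented to show several values per ID
df = pd.DataFrame({
    "ID": [10000, 10000, 10000],
    "Salary Component 1": ["Basic Salary", "", "Bonus"],
    "Value1": [22000.0, np.nan, 1000.0],
})

def combine(col):
    # Numeric columns: average the non-missing values
    if pd.api.types.is_numeric_dtype(col):
        return col.mean()
    # Other columns: join the distinct non-missing values with commas
    return ",".join(col.dropna().unique())

out = df.replace("", np.nan).groupby("ID", as_index=False).agg(combine)
print(out)
```

Here the text column becomes "Basic Salary,Bonus" and the numeric column becomes the mean of the two filled values. Swap `col.mean()` for `col.sum()` or another reducer if an average is not what you want.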