Combine duplicate rows per unique ID to fill blank columns from other rows

Imagine I have the following data (slice as example):

ID      Salary Component 1    Value1      Salary Component 2       Value2
10000   Basic Salary          22000     
10000                                     Housing Allowance        13200

How can I combine the rows for each ID so that I end up with only one row per ID, with the blank columns filled in from whichever row has a value? The result should look like this:

ID      Salary Component 1    Value1      Salary Component 2       Value2
10000   Basic Salary          22000       Housing Allowance        13200

Thank you for the help!

Solution:

If you need the first non-missing value per group, you can use:

import numpy as np

df = df.replace('', np.nan).groupby('ID', as_index=False).first()
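As a quick check, here is a minimal sketch that applies the line above to the two sample rows from the question. It assumes the blank cells are stored as empty strings; if they are already NaN, the `replace` step is simply a no-op.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; blanks are assumed to be empty strings.
df = pd.DataFrame({
    "ID": [10000, 10000],
    "Salary Component 1": ["Basic Salary", ""],
    "Value1": [22000, ""],
    "Salary Component 2": ["", "Housing Allowance"],
    "Value2": ["", 13200],
})

# Turn blanks into NaN, then keep the first non-missing value
# of every column within each ID group.
result = df.replace("", np.nan).groupby("ID", as_index=False).first()
print(result)
```

`first()` skips missing values column by column, which is why each blank is filled from whichever row in the group has a value.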

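If a group can contain multiple values in the same column and you need to aggregate them, e.g. by dtype (mean for numeric columns, comma-joined unique values otherwise):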

f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else ','.join(x.dropna().unique())

df = df.replace('', np.nan).groupby('ID', as_index=False).agg(f)
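The aggregation variant can be sketched on a small example as follows. The data is hypothetical (the `Bonus` row and ID `10001` are not in the question), and the helper name `combine` is made up for illustration. This sketch uses `pd.api.types.is_numeric_dtype` instead of `np.issubdtype` and drops missing values before joining, since `','.join` fails on NaN. Note that a column that originally mixed numbers with empty strings may still have `object` dtype after `replace`, depending on the pandas version, so converting it with `pd.to_numeric` first may be needed for the numeric branch to apply.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ID 10000 has two different values in the same columns.
df = pd.DataFrame({
    "ID": [10000, 10000, 10001],
    "Salary Component 1": ["Basic Salary", "Bonus", "Basic Salary"],
    "Value1": [22000.0, 2000.0, 18000.0],
    "Salary Component 2": [np.nan, "Housing Allowance", np.nan],
})

def combine(col: pd.Series):
    """Mean for numeric columns; comma-joined unique non-missing values otherwise."""
    if pd.api.types.is_numeric_dtype(col):
        return col.mean()
    return ",".join(col.dropna().astype(str).unique())

# The function is called once per column within each ID group.
result = df.groupby("ID", as_index=False).agg(combine)
print(result)
```

A group whose text column is entirely missing ends up with an empty string here; replace it with NaN afterwards if that distinction matters.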