Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

concatenate dataframe with overlapping keys non overlapping

I have a number of client rating flat files, one per business unit/division and it contains the rating for that specific client in that business unit. Some clients can cross multiple business units so could have one rating in one unit and a different rating in a different file or none at all.

I want a single data structure where I can look up a particular client and get all applicable ratings for that client.

example data

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1 = pandas.DataFrame({'client': ['Client_A', 'Client_B', 'Client_C'], 'bank_rating': ['A', 'A', 'B']})
df1.set_index('client', inplace=True)

df2 = pandas.DataFrame({'client': ['Client_A', 'Client_C'], 'insurance_rating': ['B', 'A']})
df2.set_index('client', inplace=True)

df3 = pandas.DataFrame({'client': ['Client_B', 'Client_D'], 'wealth_rating': ['D', 'A']})
df3.set_index('client', inplace=True)

frames = [df1, df2, df3] #in reality frames is a dynamic list built from x number of files in a folder

attempted solution

df = pandas.concat(frames)

          bank_rating  insurance_rating  wealth_rating
client
Client_A  A            NaN               NaN
Client_B  A            NaN               NaN
Client_C  B            NaN               NaN
Client_A  NaN          B                 NaN
Client_C  NaN          A                 NaN
Client_B  NaN          NaN               D
Client_D  NaN          NaN               A

what i want is a unique row per customer with each business unit rating as column

          bank_rating  insurance_rating  wealth_rating
client    
Client_A  A            B                 NaN
Client_B  A            NaN               D
Client_C  B            A                 NaN
Client_D  NaN          NaN               A

>Solution :

You are almost there, it’s just that concat assumes that you want to add new rows to the dataframes (the 0’th axis). So writing

df = pandas.concat(frames, axis=1)

should do the trick for you.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading