Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split pandas column by separator and merge to a column of another dataframe

I have following dataframes. One of them have a column with comma separated values (df1: col2) which I want to split and join on a column of another dataframe (df2: col4).

df1

col1   col2
a      abc, df
b      ert
c      xyz, ghi

df2
col3   col4
id1    abc
id2    erg
id3    ghi

In the end, I would like to get this output

col1    col2         col3     col4
a       abc, df      id1      abc      
c       xyz, ghi     id3      ghi
df1 = (df1.assign(col2 = df1['col2'].str.split(','))
          .explode('col2')
          .merge(df1, on=['col4'], how='left')
          .groupby(['col1'], as_index=False, sort=False)
          .agg({'col1':','.join}))

I am getting this error:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

KeyError: 'col4'

However this column exist in the dataframe. Any help will be highly appreciated

>Solution :

You can create helper column col4 for exploding values, use for merge with default inner join and aggregate all columns:

out = (df1.assign(col4 = df1['col2'].str.split(',\s*'))
          .explode('col4')
          .merge(df2, on='col4') 
          .groupby(['col1'], as_index=False, sort=False)
          .agg(', '.join)
          )
print (out)
  col1      col2 col4 col3
0    a   abc, df  abc  id1
1    c  xyz, ghi  ghi  id3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading