I have following dataframes. One of them have a column with comma separated values (df1: col2) which I want to split and join on a column of another dataframe (df2: col4).
df1
col1 col2
a abc, df
b ert
c xyz, ghi
df2
col3 col4
id1 abc
id2 erg
id3 ghi
In the end, I would like to get this output
col1 col2 col3 col4
a abc, df id1 abc
c xyz, ghi id3 ghi
df1 = (df1.assign(col2 = df1['col2'].str.split(','))
.explode('col2')
.merge(df1, on=['col4'], how='left')
.groupby(['col1'], as_index=False, sort=False)
.agg({'col1':','.join}))
I am getting this error:
KeyError: 'col4'
However this column exist in the dataframe. Any help will be highly appreciated
>Solution :
You can create helper column col4 for exploding values, use for merge with default inner join and aggregate all columns:
out = (df1.assign(col4 = df1['col2'].str.split(',\s*'))
.explode('col4')
.merge(df2, on='col4')
.groupby(['col1'], as_index=False, sort=False)
.agg(', '.join)
)
print (out)
col1 col2 col4 col3
0 a abc, df abc id1
1 c xyz, ghi ghi id3