I have the dataframe below:
ID COL1 COL2
0 id001 val1 xxxxx
1 id001 val1 yyyyy
2 id002 val2 yyyyy
3 id003 val3 zzzzz
4 id003 val4 zzzzz
And this is the expected output:
ID COL1 COL2
0 id001 val1 xxxxx|yyyyy
1 id002 val2 yyyyy
2 id003 val3|val4 zzzzz
I made the code below but unfortunately val1 (first row) and zzzzz (last row) are repeated.
df_gr = df[['COL1', 'COL2']].astype(str).groupby(df['ID']).agg('|'.join).reset_index()
ID COL1 COL2
0 id001 val1|val1 xxxxx|yyyyy
1 id002 val2 yyyyy
2 id003 val3|val4 zzzzz|zzzzz
Do you know how to fix this, please ?
>Solution :
Note that I needed to convert an array to a list with string elements before applying join:
df.groupby(['ID']).agg({'COL1': 'unique', 'COL2':'unique'}).applymap(lambda x: '|'.join(map(str, x))).reset_index()