Follow

Follow

Contact

Home How to aggregate only non duplicates values using Pandas

Questions

How to aggregate only non duplicates values using Pandas

byMR

August 29, 2022

I have the dataframe below:

    ID  COL1    COL2
0   id001   val1    xxxxx
1   id001   val1    yyyyy
2   id002   val2    yyyyy
3   id003   val3    zzzzz
4   id003   val4    zzzzz

And this is the expected output:

ID  COL1    COL2
0   id001   val1    xxxxx|yyyyy
1   id002   val2    yyyyy
2   id003   val3|val4   zzzzz

I made the code below but unfortunately val1 (first row) and zzzzz (last row) are repeated.

df_gr = df[['COL1', 'COL2']].astype(str).groupby(df['ID']).agg('|'.join).reset_index()

    ID  COL1    COL2
0   id001   val1|val1   xxxxx|yyyyy
1   id002   val2    yyyyy
2   id003   val3|val4   zzzzz|zzzzz

Do you know how to fix this, please ?

>Solution :

Note that I needed to convert an array to a list with string elements before applying join:

df.groupby(['ID']).agg({'COL1': 'unique', 'COL2':'unique'}).applymap(lambda x: '|'.join(map(str, x))).reset_index()

pandas

byMR

Published August 29, 2022

Add a comment

Leave a ReplyCancel reply

Read more

Questions

Lombok @Builder Model Class with field class and List Class

byMR

August 29, 2022

Questions

Convert value counts of multiple columns to pandas dataframe

byMR

August 29, 2022

Questions

Can't print with colors by using environment variables

byMR

August 29, 2022

Questions

CORS policy: No 'Access-Control-Allow-Origin' – AWS and Vercel

byMR

August 29, 2022

Questions

How to install .exe with enabling "I agree to the license terms and conditions" in windows cmd

byMR

August 29, 2022

Questions

Issue when parsing string date from CSV in Jupyter notebook

byMR

August 29, 2022