I would like to group the following Pandas DataFrame by ID column:
|----+----------------------------------------+-----------------|
| ID | Elements | Colors |
|----+----------------------------------------+-----------------|
| A | '1st element, 2d element, 3d element' | 'red, blue' |
| A | '2d element, 4th element' | 'blue, green' |
| B | '3d element, 5th element, 6th element' | 'white, purple' |
| B | '3d element, 5th element, 7th element' | 'white, teal' |
| B | '3d element, 5th element, 8th element' | 'white, black' |
|----+----------------------------------------+-----------------|
In order to obtain the following Pandas DataFrame:
|----+-----------------------------------------------------------------+------------------------------|
| ID | Elements | Colors |
|----+-----------------------------------------------------------------+------------------------------|
| A | '1st element, 2d element, 3d element, 4th element' | 'red, blue, green' |
| B | '3d element, 5h element, 6th element, 7th element, 8th element' | 'white, purple, teal, black' |
|----+-----------------------------------------------------------------+------------------------------|
>Solution :
To remove the duplicates in the "Elements" and "Colors" columns after grouping by the "ID" column, you can use the following code:
def clean_column(col):
elements = ', '.join([str(x) for x in col if str(x) != 'nan'])
unique_elements = list(set(elements.split(', ')))
sorted_elements = sorted(unique_elements)
return ', '.join(sorted_elements)
grouped_df = df.groupby('ID').agg({'Elements': clean_column, 'Colors': clean_column})
grouped_df