I have the following pandas dataframe:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2], 'col': ['a', 'b', 'c', 'c', 'a', 'b', 'd']})
id col
0 1 a
1 1 b
2 1 c
3 1 c
4 2 a
5 2 b
6 2 d
I would like to add a new column containing the list of unique values of col for each id.
The end dataframe would look like this:
pd.DataFrame({'id': [1,1,1,1,2,2,2], 'col': ['a','b','c','c','a','b','d'],
'col2': [['a','b','c'],['a','b','c'],['a','b','c'],['a','b','c'],
['a','b','d'],['a','b','d'],['a','b','d']]})
id col col2
0 1 a [a, b, c]
1 1 b [a, b, c]
2 1 c [a, b, c]
3 1 c [a, b, c]
4 2 a [a, b, d]
5 2 b [a, b, d]
6 2 d [a, b, d]
How could I do that?
>Solution :
To remove duplicated values while keeping their order, use the dict.fromkeys trick inside a lambda function, then map the aggregated result back onto the new column with Series.map:
df['col2'] = df['id'].map(df.groupby('id')['col'].agg(lambda x: list(dict.fromkeys(x))))
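To see why dict.fromkeys is used here instead of set, a quick standalone sketch: dicts preserve insertion order (guaranteed since Python 3.7), so the first occurrence of each value keeps its position, while set() gives no order guarantee.

```python
# dict.fromkeys keeps first-occurrence order; duplicate keys are dropped
vals = ['a', 'b', 'c', 'c', 'a']
print(list(dict.fromkeys(vals)))  # ['a', 'b', 'c']
```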
Alternatively, remove duplicates first with DataFrame.drop_duplicates and then aggregate to lists:
df['col2'] = df['id'].map(df.drop_duplicates(['id','col']).groupby('id')['col'].agg(list))
If ndarrays are acceptable instead of lists, use SeriesGroupBy.unique:
df['col2'] = df['id'].map(df.groupby('id')['col'].unique())
print(df)
id col col2
0 1 a [a, b, c]
1 1 b [a, b, c]
2 1 c [a, b, c]
3 1 c [a, b, c]
4 2 a [a, b, d]
5 2 b [a, b, d]
6 2 d [a, b, d]
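For completeness, a runnable sketch comparing the three approaches on the sample data. The first two produce identical lists; the third produces the same values per row, but each cell holds a NumPy ndarray rather than a list.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2],
                   'col': ['a', 'b', 'c', 'c', 'a', 'b', 'd']})

# 1) dict.fromkeys trick: lists, first-occurrence order preserved
s1 = df['id'].map(df.groupby('id')['col'].agg(lambda x: list(dict.fromkeys(x))))

# 2) drop duplicates first, then aggregate each group to a list
s2 = df['id'].map(df.drop_duplicates(['id', 'col']).groupby('id')['col'].agg(list))

# 3) SeriesGroupBy.unique: same contents, but each cell is an ndarray
s3 = df['id'].map(df.groupby('id')['col'].unique())

print(s1.tolist() == s2.tolist())                 # True
print(s1.tolist() == [list(a) for a in s3])       # True
```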