Follow

Follow

Contact

Home Pandas groupby create a list of values from several columns

Questions

Pandas groupby create a list of values from several columns

byMR

February 14, 2023

I have a dataframe

df = ID  C1  C2  C3  C4
     1   a   b   r    q
     1   b   e   g    h
     2   p   a   z    p
     1   r   a   n    m

I want to get list of values in C1, C2 per ID.
So I will have:

out = ID  l 
      1  [a,b,r,e]
      2  [p,a]

What is the best way to do so?

>Solution :

Use lambda function with numpy.unique – output is sorted:

df1 = (df.groupby('ID')[['C1','C2']].apply(lambda x: np.unique(x).tolist())
        .reset_index(name='l'))
print (df1)
   ID             l
0   1  [a, b, e, r]
1   2        [a, p]

Or with unique – output is with original order:

df1 = (df.groupby('ID')[['C1','C2']].apply(lambda x: pd.unique(np.ravel(x)).tolist())
        .reset_index(name='l'))
print (df1)
   ID             l
0   1  [a, b, e, r]
1   2        [p, a]

Another idea with remove missing values and duplicates:

df1 = (df.melt(id_vars='ID', value_vars=['C1','C2'], value_name='l')
         .drop_duplicates(['ID','l'])
         .dropna(subset=['l'])
         .groupby('ID')['l']
         .agg(list)
         .reset_index())
print (df1)
   ID             l
0   1  [a, b, r, e]
1   2        [p, a]

group-by

byMR

Published February 14, 2023

Add a comment

Leave a ReplyCancel reply

Read more

Questions

More than one occurrence from table

byMR

February 14, 2023

Questions

setTimeout javascript function 'detonator timer with delay'

byMR

February 14, 2023

Questions

Type guard not being identified when assigned to boolean – strict mode

byMR

February 14, 2023

Questions

Plotting dates on the x-axis and customizing labels

byMR

February 14, 2023

Questions

Navigate into 2 pages in the same test

byMR

February 14, 2023

Questions

Writing a DataFrame to an excel file where items in a list are put into separate cells

byMR

February 14, 2023