Pandas groupby create a list of values from several columns

I have a dataframe

df = ID  C1  C2  C3  C4
     1   a   b   r    q
     1   b   e   g    h
     2   p   a   z    p
     1   r   a   n    m

I want to get list of values in C1, C2 per ID.
So I will have:

out = ID  l 
      1  [a,b,r,e]
      2  [p,a]

What is the best way to do so?

>Solution :

Use lambda function with numpy.unique – output is sorted:

df1 = (df.groupby('ID')[['C1','C2']].apply(lambda x: np.unique(x).tolist())
        .reset_index(name='l'))
print (df1)
   ID             l
0   1  [a, b, e, r]
1   2        [a, p]

Or with unique – output is with original order:

df1 = (df.groupby('ID')[['C1','C2']].apply(lambda x: pd.unique(np.ravel(x)).tolist())
        .reset_index(name='l'))
print (df1)
   ID             l
0   1  [a, b, e, r]
1   2        [p, a]

Another idea with remove missing values and duplicates:

df1 = (df.melt(id_vars='ID', value_vars=['C1','C2'], value_name='l')
         .drop_duplicates(['ID','l'])
         .dropna(subset=['l'])
         .groupby('ID')['l']
         .agg(list)
         .reset_index())
print (df1)
   ID             l
0   1  [a, b, r, e]
1   2        [p, a]

Leave a Reply