Groupby lists in Pandas

January 6, 2022

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'col0':[[0,1],[1,0,0],[1,0],[1,0],[2,0]],
                   'col1':[5,4,3,2,1]})

i.e.:

        col0  col1
0     [0, 1]     5
1  [1, 0, 0]     4
2     [1, 0]     3
3     [1, 0]     2
4     [2, 0]     1

I would like to group by the values in col0 and sum the col1 values within each group. I tried:

df.groupby('col0').col1.sum()

but this raises TypeError: unhashable type: 'list'. I then tried:

df.groupby(df.col0.apply(frozenset)).col1.sum()

which gives:

col0
(0, 1)    14
(0, 2)     1
Name: col1, dtype: int64

That is, the lists were converted into sets (frozensets, to be exact) and then grouped. Neither the number of elements nor their order mattered: [1,0] and [0,1] belong to the same group, and so do [1,0] and [1,0,0].
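
The collapsing described above can be checked in plain Python, without pandas. This is only a small illustrative sketch; the names a, b and c are made up for the example:

```python
# Converting a list to a frozenset discards element order and duplicates.
a = frozenset([0, 1])
b = frozenset([1, 0])
c = frozenset([1, 0, 0])

print(a == b == c)         # True: all three compare equal
print(hash(a) == hash(c))  # True: equal keys hash alike, so they form one group
```

Because the three keys are equal, groupby cannot tell the original lists apart once they have been converted.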

If the order and the number of elements should also matter, how do I group then?

Desired output of grouping by col0 and summing col1 for the above dataframe:

col0
[0, 1]       5
[1, 0, 0]    4
[1, 0]       5
[2, 0]       1
Name: col1, dtype: int64

> Solution:

A tuple is immutable and therefore hashable (as long as its elements are), so it can serve as a groupby key. Unlike a set, it keeps duplicates and preserves order.

df['col0'] = df['col0'].apply(tuple)  # note: this overwrites the lists in col0 with tuples
df.groupby('col0', sort=False).sum()  # sort=False keeps groups in order of first appearance
#            col1
# col0           
# (0, 1)        5
# (1, 0, 0)     4
# (1, 0)        5
# (2, 0)        1
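
If the lists in col0 should stay untouched, the tuple keys can also be built on the fly and passed to groupby directly. This is a sketch of a variant, not part of the original answer; the names keys and result are chosen here for illustration. Selecting col1 before summing returns a Series, which is closer to the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({'col0': [[0, 1], [1, 0, 0], [1, 0], [1, 0], [2, 0]],
                   'col1': [5, 4, 3, 2, 1]})

# Build hashable keys without modifying df; col0 still holds the original lists.
keys = df['col0'].apply(tuple)

# Group by the key Series and sum col1; sort=False keeps first-appearance order.
result = df.groupby(keys, sort=False)['col1'].sum()
print(result)
```

The sums should match the answer above (5, 4, 5, 1), while df['col0'] keeps its list values.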