Groupby lists in Pandas

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'col0': [[0, 1], [1, 0, 0], [1, 0], [1, 0], [2, 0]],
                   'col1': [5, 4, 3, 2, 1]})

i.e.:

        col0  col1
0     [0, 1]     5
1  [1, 0, 0]     4
2     [1, 0]     3
3     [1, 0]     2
4     [2, 0]     1

I would like to group by the values in col0 and sum the col1 values within each group. I tried:

df.groupby('col0').col1.sum()

but this raises TypeError: unhashable type: 'list'. So I tried:

df.groupby(df.col0.apply(frozenset)).col1.sum()

which gives:

col0
(0, 1)    14
(0, 2)     1
Name: col1, dtype: int64

That is, the lists were converted to sets (frozensets, to be exact) and then grouped. The number and order of the elements did not matter: [1, 0] and [0, 1] belong to the same group, and so do [1, 0] and [1, 0, 0].
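A quick way to see why the frozenset keys merge these rows, and why order-preserving keys would not, is to compare the keys directly in plain Python. This is a minimal sketch; the variable names (rows, frozen_keys, tuple_keys) are illustrative only.

```python
rows = [[0, 1], [1, 0, 0], [1, 0], [1, 0], [2, 0]]

# frozenset discards order and duplicates, so [0, 1], [1, 0, 0] and [1, 0]
# all collapse into the same key
frozen_keys = [frozenset(r) for r in rows]
print(len(set(frozen_keys)))  # 2 distinct groups

# tuple keeps order and duplicates, so only the two [1, 0] rows share a key
tuple_keys = [tuple(r) for r in rows]
print(len(set(tuple_keys)))  # 4 distinct groups

# A list cannot be used as a group key at all, because it is not hashable
try:
    hash(rows[0])
except TypeError as exc:
    print(exc)  # unhashable type: 'list'
```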

If the order and number of elements also matter, how do I group by then?

Desired output of grouping by col0 and summing col1 for the dataframe above:

col0
[0, 1]       5
[1, 0, 0]    4
[1, 0]       5
[2, 0]       1
Name: col1, dtype: int64

Solution:

A tuple is immutable (and therefore hashable), can contain duplicates, and preserves order, so it can serve as a group key where a list cannot:

# Convert each list to a tuple so the values become hashable group keys
df['col0'] = df['col0'].apply(tuple)
df.groupby('col0', sort=False).sum()  # sort=False keeps groups in order of first appearance
#            col1
# col0           
# (0, 1)        5
# (1, 0, 0)     4
# (1, 0)        5
# (2, 0)        1
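If you would rather not overwrite col0, one variant is to pass the tuple-converted Series as the grouping key and select col1, which returns a Series named col1 like the desired output. The final step, moving the keys back into a column and turning the tuples into lists with reset_index, is an optional assumption about how you want the result presented; the names result and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col0': [[0, 1], [1, 0, 0], [1, 0], [1, 0], [2, 0]],
                   'col1': [5, 4, 3, 2, 1]})

# Group by a tuple version of col0 without modifying the original column;
# sort=False keeps the groups in order of first appearance
result = df.groupby(df['col0'].apply(tuple), sort=False)['col1'].sum()
print(result)

# Optional: move the tuple keys back into a column and convert them to lists
out = result.reset_index()
out['col0'] = out['col0'].apply(list)
print(out)
```

Keeping the keys as tuples in the index is usually more convenient for further lookups, since lists are not hashable; the list conversion is only for display.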