pandas groupby with length of lists

I need to display both the user_id and the length of content_id (which is a list object) as dataframe columns, but I am struggling to do this with groupby.
Please help with the groupby, and also with the question at the bottom of this post (how do I get the results along with user_id in a dataframe?).

Dataframe types:

df.dtypes

output:

user_id       object
content_id    object
dtype: object

Sample Data:

    user_id     content_id
0   user_18085  [cont_2598_4_4, cont_2738_2_49, cont_4482_2_19...
1   user_16044  [cont_2738_2_49, cont_4482_2_19, cont_4994_18_...
2   user_13110  [cont_2598_4_4, cont_2738_2_49, cont_4482_2_19...
3   user_18909  [cont_3170_2_28]
4   user_15509  [cont_2598_4_4, cont_2738_2_49, cont_4482_2_19...

Pandas query:

df.groupby('user_id')['content_id'].count().reset_index()

df.groupby(['user_id'])['content_id'].apply(lambda x: get_count(x))

output:

    user_id     content_id
0   user_10013  1
1   user_10034  1
2   user_10042  1

When I try it without grouping, I get the expected lengths:

df['content_id'].apply(lambda x: len(x))


0       11
1        9
2       11
3        1

But how do I get these results along with user_id in a dataframe? I want the output in the format below:

user_id   content_id
some xxx  11
some yyy  6
  

Solution:

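A pandas groupby works on the rows of each group, not on the contents of each cell. count() therefore returns the number of non-null rows per user_id, which is why every user shows 1. No built-in groupby aggregation looks inside each list, so the simplest approach is to first overwrite the column with the list lengths (as suggested by @ifly6) and then aggregate.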
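To illustrate the difference between counting rows and counting list elements, here is a minimal sketch on made-up data. The user and content IDs are placeholders, not the data from the question.

```python
import pandas as pd

# Hypothetical sample data: one row per user, content_id holds a list.
df = pd.DataFrame({
    "user_id": ["user_a", "user_b", "user_c"],
    "content_id": [["c1", "c2", "c3"], ["c4"], ["c5", "c6"]],
})

# count() counts the non-null rows in each group, so every user gets 1.
rows_per_user = df.groupby("user_id")["content_id"].count()

# len applied cell by cell counts the elements inside each list.
items_per_row = df["content_id"].apply(len)

print(rows_per_user.tolist())  # [1, 1, 1]
print(items_per_row.tolist())  # [3, 1, 2]
```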

Using

df_agg = df.copy()
df_agg.content_id = df_agg.content_id.apply(len)
df_agg = df_agg.groupby('user_id').sum()

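will give one row per user_id with the summed list lengths, which is the result you described. Note that user_id ends up as the index; append .reset_index() if you want it back as a regular column.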

For completeness, the same result in a single groupby call would be

df.groupby('user_id').agg(lambda x: x.apply(len).sum())
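If user_id should be an ordinary column in the result, as in the format requested in the question, one possible variant is sketched below. The data is made up (user_a appears twice to show the per-user sum), and the use of assign and as_index=False is an illustrative choice rather than part of the original answer.

```python
import pandas as pd

# Hypothetical data; user_a appears in two rows to show the per-user sum.
df = pd.DataFrame({
    "user_id": ["user_a", "user_b", "user_a"],
    "content_id": [["c1", "c2"], ["c3"], ["c4", "c5", "c6"]],
})

# Replace each list by its length, then sum the lengths per user.
# as_index=False keeps user_id as a regular column instead of the index.
result = (
    df.assign(content_id=df["content_id"].apply(len))
      .groupby("user_id", as_index=False)["content_id"]
      .sum()
)

print(result)  # user_a totals 5 items, user_b totals 1
```

Because assign returns a new dataframe, the original df keeps its lists and no explicit copy is needed.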