DataFrame groupby on each item within a column of lists

I have a dataframe (df):

| A   | B     | C                       |
| --- | ----- | ----------------------- |
| CA  | Jon   | [sales, engineering]    |
| NY  | Sarah | [engineering, IT]       |
| VA  | Vox   | [services, engineering] |

I am trying to group by each item in the C column list (sales, engineering, IT, etc.).

Tried:

df.groupby('C')

but got an "unhashable type: 'list'" error, which is expected because lists are not hashable. I came across another post that recommended converting the C column to tuples, which are hashable, but I need to group by each item, not by the combination.

My goal is to count the rows in df that contain each item in the C column's lists. So:

sales: 1
engineering: 3
IT: 1
services: 1

While there is probably a simpler way to obtain this than using groupby, I am still curious if groupby can be used in this case.

>Solution :

You can use explode to give each list item its own row, then value_counts to count the rows per item:

out = df.explode("C").value_counts("C")


Output:

print(out)

C          
engineering    3
IT             1
sales          1
services       1
dtype: int64
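
As for whether groupby can be used here: it should work once the lists are exploded, because each value in C is then a plain string. The snippet below is a minimal sketch of that idea. It rebuilds the example frame from the question, and the names exploded, counts and names are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "A": ["CA", "NY", "VA"],
    "B": ["Jon", "Sarah", "Vox"],
    "C": [["sales", "engineering"], ["engineering", "IT"], ["services", "engineering"]],
})

# One row per list item, so C now holds hashable strings
exploded = df.explode("C")

# Group by each item and count the rows in each group
counts = exploded.groupby("C").size()
print(counts)

# groupby also allows other aggregations, e.g. collecting the B values per item
names = exploded.groupby("C")["B"].agg(list)
print(names)
```

For plain counts, value_counts is shorter. groupby becomes useful when you need something other than a count for each item.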