Pandas: How to use a list variable in groupby?

I have a pandas dataframe df:

state   name    age
WB      Jim     26
CA      John    32
CA      Jason   14

I am trying to group by state and name and find the max() of age:

df2 = df.groupby(['state', 'name'])['age'].max().reset_index()
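For reference, here is a minimal self-contained sketch of this step. It rebuilds the sample frame from the question, and the DataFrame constructor call is an assumption about how df was created:

```python
import pandas as pd

# Sample frame from the question (assumed construction)
df = pd.DataFrame({
    "state": ["WB", "CA", "CA"],
    "name": ["Jim", "John", "Jason"],
    "age": [26, 32, 14],
})

# Group by both columns and take the maximum age in each group;
# reset_index() turns the group keys back into ordinary columns.
df2 = df.groupby(["state", "name"])["age"].max().reset_index()
print(df2)
```

Because groupby sorts the keys by default, the rows come out ordered as (CA, Jason), (CA, John), (WB, Jim).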

The above works, but when I use a list variable instead of hard-coding the column names, like this:

cols = ['state', 'name']
df2 = df.groupby(cols)['age'].max().reset_index()

I get this error:

raise TypeError("You have to supply one of 'by' and 'level'")
TypeError: You have to supply one of 'by' and 'level'

How do I solve this?

Solution:

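The error is not caused by using a list variable. Python passes the same list object to groupby whether you write df.groupby(['state', 'name']) or df.groupby(cols) with cols = ['state', 'name'], so the two calls behave identically. pandas raises TypeError: You have to supply one of 'by' and 'level' when by is None and no level is given, which means cols is None at the point where groupby is called. A common cause is assigning the return value of an in-place list method such as append(), extend() or sort(), all of which modify the list and return None (for example, cols = cols.append('name')).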
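A minimal sketch reproducing the error, assuming (hypothetically) that cols was built with append() and the result was assigned back to the variable:

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["WB", "CA", "CA"],
    "name": ["Jim", "John", "Jason"],
    "age": [26, 32, 14],
})

# Hypothetical mistake: list.append() mutates the list in place and
# returns None, so cols ends up as None instead of a list.
cols = ["state"].append("name")
print(cols)  # None

error = None
try:
    # groupby(by=None) with no level raises a TypeError
    df.groupby(cols)["age"].max()
except TypeError as exc:
    error = exc
    print(exc)
```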

Check where cols is assigned, make sure it still holds the list when groupby is called, and pass it directly, without unpacking:

cols = ['state', 'name']   # a real list; do not assign the result of cols.append(...), which is None
df2 = df.groupby(cols)['age'].max().reset_index()

Passing cols directly groups by both columns. Unpacking it with *cols does not help: df.groupby(*cols) becomes df.groupby('state', 'name'), so 'name' is passed as the second positional parameter of groupby instead of as a second grouping key.
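If the column list has to be built at runtime, here is a sketch of two safe ways to do it, checking that each gives the same result as the hard-coded list. The names base, cols2, literal, from_var and from_var2 are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["WB", "CA", "CA"],
    "name": ["Jim", "John", "Jason"],
    "age": [26, 32, 14],
})

base = ["state"]

# Option 1: concatenation returns a new list, so assigning it is safe
cols = base + ["name"]

# Option 2: append in place, then use the list itself,
# not the None returned by append()
cols2 = ["state"]
cols2.append("name")

literal = df.groupby(["state", "name"])["age"].max().reset_index()
from_var = df.groupby(cols)["age"].max().reset_index()
from_var2 = df.groupby(cols2)["age"].max().reset_index()

# A list variable and a list literal give identical results
print(literal.equals(from_var), literal.equals(from_var2))  # True True
```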
