Pandas: How to use list variable in groupby?

June 16, 2023

I have a pandas dataframe df:

state   name    age
WB      Jim     26
CA      John    32
CA      Jason   14

where I am trying to group by state and name and find the max() of age:

df2 = df.groupby(['state', 'name'])['age'].max().reset_index()

The above is working, but when I use a list variable instead of hardcoding column names like:

cols = ['state', 'name']
df2 = df.groupby(cols)['age'].max().reset_index()

I am getting an error:

raise TypeError("You have to supply one of 'by' and 'level'")
TypeError: You have to supply one of 'by' and 'level'

How do I solve this?

>Solution :

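The error does not come from using a list variable: groupby treats df.groupby(cols) exactly like df.groupby(['state', 'name']). pandas raises "You have to supply one of 'by' and 'level'" only when both by and level are None, so in your real code cols is most likely None at the point of the call. A common cause is assigning the result of an in-place list method such as cols.sort() or cols.append(...), both of which return None, or calling a function that builds the list but never returns it.

Do not unpack the list with *cols. groupby takes all grouping keys as a single argument, so df.groupby(*cols) would pass 'name' as the second positional parameter instead of as a grouping key.

Print cols right before the call to confirm it holds the list, then pass it directly:

cols = ['state', 'name']
print(cols)  # should show ['state', 'name'], not None
df2 = df.groupby(cols)['age'].max().reset_index()

With cols set to the list, this gives the same result as the hardcoded version.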
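To check this yourself, here is a minimal sketch that rebuilds the three-row frame from the question and compares a literal list with a list variable. The `bad_cols` variable is a hypothetical stand-in for whatever ends up as None in the real code; here it is produced by `list.sort()`, which sorts in place and returns None. This is one plausible way to hit the error, not necessarily the one in your code.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame({
    "state": ["WB", "CA", "CA"],
    "name": ["Jim", "John", "Jason"],
    "age": [26, 32, 14],
})

# A list variable behaves exactly like a literal list.
cols = ["state", "name"]
literal = df.groupby(["state", "name"])["age"].max().reset_index()
variable = df.groupby(cols)["age"].max().reset_index()
same = literal.equals(variable)
print(same)

# Hypothetical mistake: list.sort() returns None, so bad_cols is None.
bad_cols = ["state", "name"].sort()
try:
    df.groupby(bad_cols)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If the second print shows the same message as in your traceback, look for the place where cols is assigned and make sure it receives the list itself.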