I have a pandas dataframe df:
state name age
WB Jim 26
CA John 32
CA Jason 14
where I am trying to group by state and name and find the max() of age:
df2 = df.groupby(['state', 'name'])['age'].max().reset_index()
The above is working, but when I use a list variable instead of hardcoding column names like:
cols = ['state', 'name']
df2 = df.groupby(cols)['age'].max().reset_index()
I am getting this error:
raise TypeError("You have to supply one of 'by' and 'level'")
TypeError: You have to supply one of 'by' and 'level'
How do I solve this?
>Solution :
The error is not caused by using a variable: groupby treats a list stored in a variable exactly like a list written inline, so the two snippets in the question are equivalent. pandas raises this TypeError only when both by and level are None, which means cols is most likely None by the time groupby is called. A common cause is assigning the result of an in-place list method, for example cols = cols.append('name') or cols = cols.sort(), both of which return None; another is a function that builds the list but never returns it.
Check the value just before the call (print(cols)) and make sure it is still a list. With that fixed, the original code works unchanged:
cols = ['state', 'name']
df2 = df.groupby(cols)['age'].max().reset_index()
Do not unpack the list with *cols. groupby(*cols) would pass 'state' as by and 'name' as the next positional parameter instead of as a second grouping key, so it would not group by both columns.
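The following is a minimal sketch that reproduces both behaviours with the data from the question. The `broken_cols` line is a hypothetical example of how the list could end up as None; the actual cause in the asker's code is not shown in the question, so treat it as an assumption.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "state": ["WB", "CA", "CA"],
    "name": ["Jim", "John", "Jason"],
    "age": [26, 32, 14],
})

# A list held in a variable behaves exactly like an inline list.
cols = ["state", "name"]
from_variable = df.groupby(cols)["age"].max().reset_index()
from_literal = df.groupby(["state", "name"])["age"].max().reset_index()
same_result = from_variable.equals(from_literal)

# Hypothetical bug: list.append works in place and returns None,
# so broken_cols is None rather than ['state', 'name'].
broken_cols = ["state"].append("name")

try:
    df.groupby(broken_cols)["age"].max()
    error_message = None
except TypeError as exc:
    # Same TypeError as in the question.
    error_message = str(exc)

print(same_result)
print(error_message)
```

If `print(cols)` in your own code shows None right before the groupby call, trace back to where cols was last assigned.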