How do I get the column names for the keys in a DataFrameGroupBy object?

March 9, 2022

Given a grouped DataFrame (obtained by df.groupby([col1, col2])), I would like to obtain the names of the grouping variables (col1 and col2 in this case).

For example, using the data from the GroupBy user guide:

import pandas as pd
import numpy as np
df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0),
        ("bird", "Psittaciformes", 24.0),
        ("mammal", "Carnivora", 80.2),
        ("mammal", "Primates", np.nan),
        ("mammal", "Carnivora", 58),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "order", "max_speed"),
)
grouped = df.groupby(["class", "order"])

Given grouped, I would like to get class and order. However, grouped.indices and grouped.groups contain only the values of the keys, not the column names.

The column names must be in the object somewhere, because if I run grouped.size(), for example, they appear as the names of the index levels:

class   order         
bird    Falconiformes     1
        Psittaciformes    1
mammal  Carnivora         2
        Primates          1
dtype: int64

So I can run grouped.size().index.names, which returns FrozenList(['class', 'order']). But this performs an unnecessary .size() computation just to read off the names. Is there a cleaner way to retrieve them from the object?
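
To make the workaround concrete, here is a minimal, self-contained sketch of what I am doing at the moment. It only uses calls already mentioned in this question; the variable names group_keys and key_names are mine, for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0),
        ("bird", "Psittaciformes", 24.0),
        ("mammal", "Carnivora", 80.2),
        ("mammal", "Primates", np.nan),
        ("mammal", "Carnivora", 58),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "order", "max_speed"),
)
grouped = df.groupby(["class", "order"])

# The keys of .groups are tuples of key *values*; the column names do not appear.
group_keys = list(grouped.groups)

# Current workaround: aggregate first, then read the names off the result's index.
key_names = list(grouped.size().index.names)
print(key_names)  # ['class', 'order']
```

This works, but it aggregates every group only to throw the counts away.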

The ultimate reason I’d like this is so that I can do some processing for a particular group and associate the result with the key-value pairs that define the group. That way I would be able to combine different grouped datasets with arbitrary levels of grouping. For example, I could have:

group                            max_speed
class=bird,order=Falconiformes       389.0
class=bird,order=Psittaciformes       24.0
class=bird                           206.5
foo=bar                               45.5
...
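
As a rough sketch of what I have in mind (not a finished solution), the helper below builds those labels from whatever aggregation I am already computing, so no separate .size() call is needed. The name label_groups is hypothetical, and I am assuming the per-group value is the mean of max_speed, which matches the 206.5 shown for class=bird.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0),
        ("bird", "Psittaciformes", 24.0),
        ("mammal", "Carnivora", 80.2),
        ("mammal", "Primates", np.nan),
        ("mammal", "Carnivora", 58),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "order", "max_speed"),
)


def label_groups(grouped, column):
    """Hypothetical helper: mean of `column` per group, indexed by 'name=value' labels."""
    means = grouped[column].mean()
    names = means.index.names  # key column names, taken from the aggregated result
    labels = []
    for key in means.index:
        # A single grouping key yields scalars; several keys yield tuples.
        values = key if isinstance(key, tuple) else (key,)
        labels.append(",".join(f"{n}={v}" for n, v in zip(names, values)))
    return pd.Series(means.to_numpy(), index=pd.Index(labels, name="group"), name=column)


# Combine two groupings with different numbers of levels into one table.
combined = pd.concat(
    [
        label_groups(df.groupby(["class", "order"]), "max_speed"),
        label_groups(df.groupby(["class"]), "max_speed"),
    ]
)
print(combined)
```

This still relies on the index of an aggregated result to find the names, which is why I am asking whether the grouped object exposes them directly.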