Given a grouped DataFrame (obtained by df.groupby([col1, col2])), I would like to retrieve the names of the grouping variables (col1 and col2 in this case).
For example, using the data from the GroupBy user guide:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0),
        ("bird", "Psittaciformes", 24.0),
        ("mammal", "Carnivora", 80.2),
        ("mammal", "Primates", np.nan),
        ("mammal", "Carnivora", 58),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "order", "max_speed"),
)
grouped = df.groupby(["class", "order"])
Given grouped, I would like to get back the names class and order. However, grouped.indices and grouped.groups contain only the values of the keys, not the column names.
The column names must be stored in the object somewhere, because if I run grouped.size(), for example, they appear as the names of the index levels:
class   order
bird    Falconiformes     1
        Psittaciformes    1
mammal  Carnivora         2
        Primates          1
dtype: int64
I can therefore run grouped.size().index.names, which returns FrozenList(['class', 'order']). But this computes .size() unnecessarily. Is there a nicer way of retrieving the names from the object?
The ultimate reason I'd like this is so that I can do some processing for a particular group and associate the result with the key-value pairs that define the group. That way I would be able to amalgamate different grouped datasets with arbitrary levels of grouping. For example, I could have:
group                              max_speed
class=bird,order=Falconiformes     389.0
class=bird,order=Psittaciformes    24.0
class=bird                         206.5
foo=bar                            45.5
...
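The combined table described above could be built roughly as follows. This is only a sketch: the helper name labelled_mean is hypothetical, and the mean is assumed as the aggregation because the class=bird row (206.5) is the mean of 389.0 and 24.0. The level names are read from the index of the aggregated result, so no extra .size() call is involved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0),
        ("bird", "Psittaciformes", 24.0),
        ("mammal", "Carnivora", 80.2),
        ("mammal", "Primates", np.nan),
        ("mammal", "Carnivora", 58),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "order", "max_speed"),
)


def labelled_mean(frame, by, value="max_speed"):
    """Hypothetical helper: per-group mean, indexed by 'name=value,...' labels."""
    result = frame.groupby(by)[value].mean()
    names = result.index.names  # grouping column names, taken from the result
    labels = []
    for key in result.index:
        # A single grouping column yields scalar keys, several yield tuples.
        values = key if isinstance(key, tuple) else (key,)
        labels.append(",".join(f"{n}={v}" for n, v in zip(names, values)))
    result.index = pd.Index(labels, name="group")
    return result


# Stack results from groupings with different numbers of levels.
combined = pd.concat(
    [labelled_mean(df, ["class", "order"]), labelled_mean(df, ["class"])]
)
print(combined)
```

Because every result ends up with the same flat string index, groupings with any number of levels can be concatenated without aligning MultiIndex levels.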
Solution:
Much like your own suggestion, you can get the names of the grouping columns with:
grouped.dtypes.index.names
It is not shorter, but it avoids the explicit call to .size().
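Another option, offered as a sketch rather than a documented guarantee: the GroupBy object appears to keep whatever was passed as the by argument in its keys attribute, so the names can be read without any aggregation. This assumes the grouping was done by column names (not by arrays or functions), and the attribute is not listed in the public API reference, so check it against your pandas version. It may also be worth preferring over .dtypes, which as far as I know is deprecated on DataFrameGroupBy in recent pandas releases.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "class": ["bird", "bird", "mammal"],
        "order": ["Falconiformes", "Psittaciformes", "Carnivora"],
        "max_speed": [389.0, 24.0, 80.2],
    }
)
grouped = df.groupby(["class", "order"])

# Assumption: `keys` holds the `by` argument exactly as it was passed in.
names = grouped.keys
# A single column name may have been passed as a plain string.
names = [names] if isinstance(names, str) else list(names)

# Pair the names with each group's key values while iterating.
labels = []
for key, group in grouped:
    values = key if isinstance(key, tuple) else (key,)
    labels.append(dict(zip(names, values)))

print(names)
print(labels)
```

Each dictionary in labels is the key-value description of one group, which can be attached to whatever is computed from that group.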