How to get values in a Pandas DataFrame using .loc with a MultiIndex?

May 9, 2023

First time poster here. I’m having some trouble understanding how .loc works for a DataFrame that has a MultiIndex. More specifically, I’m interested in the case where I have a two-level MultiIndex and want to select values by a label in the second level.

For example, using the DataFrame defined in the documentation of loc:

import pandas as pd

tuples = [
   ('cobra', 'mark i'), ('cobra', 'mark ii'),
   ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
   ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
        [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
df

--------------------------------------
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

I want to select the values corresponding to mark ii, regardless of the first level label. I can use xs:

df.xs("mark ii", level=1)

-----------------------------
            max_speed  shield
cobra               0       4
sidewinder          1       4
viper               7       1

And I thought I could use loc:

df.loc[:, "mark ii"]

-------------------
KeyError: 'mark ii'

However, I get the expected result if I do the following:

df.loc[:, "mark ii", :]    # <-- note the added colon after 'mark ii'

-----------------------------
            max_speed  shield
cobra               0       4
sidewinder          1       4
viper               7       1

Can someone explain why df.loc[:, "mark ii"] doesn’t work here?

>Solution :

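You can’t use a bare : directly to skip an index level. On a DataFrame, the first position inside .loc[...] selects rows and the second selects columns, so df.loc[:, "mark ii"] means: select all rows (:) and the column "mark ii" (which doesn’t exist).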
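To make the row/column reading concrete, here is a minimal sketch using the DataFrame from the question. The names shield, is_column and raised are only illustrative and are not part of the original post.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
tuples = [
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=["max_speed", "shield"], index=index)

# The second position in .loc is the column indexer, so a real
# column label works there and returns that column for all rows.
shield = df.loc[:, "shield"]

# "mark ii" is a row label (second index level), not a column label.
is_column = "mark ii" in df.columns

# Hence the lookup from the question fails with a KeyError.
try:
    df.loc[:, "mark ii"]
    raised = False
except KeyError:
    raised = True

print(shield.tolist(), is_column, raised)
```

The same expression with "shield" succeeds, which suggests the problem is the position of the label rather than the MultiIndex itself.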

One correct approach is to use pd.IndexSlice in the row position, so that the whole (level 0, level 1) selection is passed as the row indexer:

df.loc[pd.IndexSlice[:, "mark ii"], :]

# or slice
df.loc[(slice(None), 'mark ii'), :]

You can also specify the axis in loc:

df.loc(axis=0)[:, "mark ii"]

Output:

                    max_speed  shield
cobra      mark ii          0       4
sidewinder mark ii          1       4
viper      mark ii          7       1
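As a quick check, the sketch below runs the three spellings above side by side on the question's DataFrame and compares the results. It also includes xs with drop_level=False, which is not mentioned in the answer but should keep both index levels as well; the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
tuples = [
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=["max_speed", "shield"], index=index)

# Three equivalent ways to select "mark ii" on the second row level.
by_index_slice = df.loc[pd.IndexSlice[:, "mark ii"], :]
by_slice = df.loc[(slice(None), "mark ii"), :]
by_axis = df.loc(axis=0)[:, "mark ii"]

# Addition for comparison (not from the answer): xs keeps both
# index levels when drop_level=False is passed.
by_xs = df.xs("mark ii", level=1, drop_level=False)

for result in (by_slice, by_axis, by_xs):
    # Same row labels and same cell values as the IndexSlice result.
    assert list(result.index) == list(by_index_slice.index)
    assert result.to_numpy().tolist() == by_index_slice.to_numpy().tolist()

print(by_index_slice)
```

Unlike the plain df.xs("mark ii", level=1) call in the question, all four results here keep the "mark ii" level in the index.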