First time poster here. I’m having some trouble understanding how .loc works for a DataFrame that has a MultiIndex. More specifically, I’m interested in the case where I have a two-level MultiIndex and I want to select rows by a label in the second level.
For example, using the DataFrame defined in the documentation of loc:
import pandas as pd

tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
          [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
df
--------------------------------------
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36
I want to select the values corresponding to mark ii, regardless of the first level label. I can use xs:
df.xs("mark ii", level=1)
-----------------------------
            max_speed  shield
cobra               0       4
sidewinder          1       4
viper               7       1
And I thought I could use loc:
df.loc[:, "mark ii"]
-------------------
KeyError: 'mark ii'
However, I get the expected result if I do the following:
df.loc[:, "mark ii", :] # <-- note the added colon after 'mark ii'
-----------------------------
            max_speed  shield
cobra               0       4
sidewinder          1       4
viper               7       1
Can someone explain why df.loc[:, "mark ii"] doesn’t work here?
>Solution :
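You can’t use a bare : like this to skip over an index level. With two entries inside the brackets, .loc reads the first as the row indexer and the second as the column indexer, so df.loc[:, "mark ii"] means: select all rows (:) and the column "mark ii" (which doesn’t exist), hence the KeyError.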
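To make the rows-then-columns reading concrete, here is a minimal sketch that rebuilds the df from the question. The names shield and raised are only there for illustration:

```python
import pandas as pd

tuples = [
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=["max_speed", "shield"], index=index)

# The second entry is looked up among the columns, so an existing
# column name works and returns that column for all rows.
shield = df.loc[:, "shield"]

# "mark ii" is a row label (second index level), not a column,
# so the same pattern raises KeyError.
try:
    df.loc[:, "mark ii"]
    raised = False
except KeyError:
    raised = True

print(shield.tolist())
print(raised)
```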
One correct approach is to use pd.IndexSlice, passing the row indexer and the column indexer explicitly:
df.loc[pd.IndexSlice[:, "mark ii"], :]
# or slice
df.loc[(slice(None), 'mark ii'), :]
You can also specify the axis in loc:
df.loc(axis=0)[:, "mark ii"]
Output:
                    max_speed  shield
cobra      mark ii          0       4
sidewinder mark ii          1       4
viper      mark ii          7       1
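As a quick sanity check, the sketch below rebuilds the question's df and compares the three spellings. The comparisons against xs (with drop_level=False, and with droplevel applied afterwards) are an extra not in the answer above, included only to relate the result back to the xs call in the question:

```python
import pandas as pd

tuples = [
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=["max_speed", "shield"], index=index)

# Three equivalent ways to select "mark ii" on the second index level.
via_index_slice = df.loc[pd.IndexSlice[:, "mark ii"], :]
via_slice_none = df.loc[(slice(None), "mark ii"), :]
via_axis = df.loc(axis=0)[:, "mark ii"]

# .loc keeps both index levels; xs drops the selected level by default.
via_xs_keep = df.xs("mark ii", level=1, drop_level=False)
via_xs_drop = df.xs("mark ii", level=1)

print(via_index_slice.equals(via_slice_none))
print(via_index_slice.equals(via_axis))
print(via_index_slice.equals(via_xs_keep))
print(via_index_slice.droplevel(1).equals(via_xs_drop))
```

If you want the xs-style result (first level only) from one of the .loc spellings, chaining .droplevel(1) as in the last line is one way to get there.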