How to get values in a Pandas DataFrame using .loc with a MultiIndex?

First-time poster here. I’m having some trouble understanding how .loc works for a DataFrame that has a MultiIndex. More specifically, I’m interested in the case where I have a two-level MultiIndex and I want to select rows by a label of the second level.

For example, using the DataFrame defined in the documentation of loc:

import pandas as pd

tuples = [
   ('cobra', 'mark i'), ('cobra', 'mark ii'),
   ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
   ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
        [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
df

--------------------------------------
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

I want to select the rows corresponding to mark ii, regardless of the first-level label. I can use xs:

df.xs("mark ii", level=1)

-----------------------------
            max_speed  shield
cobra               0       4
sidewinder          1       4
viper               7       1

And I thought I could use loc:

df.loc[:, "mark ii"]

-------------------
KeyError: 'mark ii'

However, I get the expected result if I do the following:

df.loc[:, "mark ii", :]    # <-- note the added colon after 'mark ii'

-----------------------------
            max_speed  shield
cobra               0       4
sidewinder          1       4
viper               7       1

Can someone explain why df.loc[:, "mark ii"] doesn’t work here?

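Solution:

You can’t use a bare : inside .loc to skip an index level. With two indexers, .loc treats the first as the row indexer and the second as the column indexer, so df.loc[:, "mark ii"] means: select all rows (:) and the column "mark ii" (which doesn’t exist).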
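
As a minimal sketch of that interpretation (rebuilding the DataFrame from the question so the snippet runs on its own; the variable names speeds and raised are only for illustration), the second position resolves against the columns, so an existing column name works while "mark ii" raises KeyError:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
])
df = pd.DataFrame(
    [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]],
    columns=["max_speed", "shield"],
    index=index,
)

# The second indexer is looked up in the columns: this selects a column.
speeds = df.loc[:, "max_speed"]

# "mark ii" is a row label, not a column, so the same form fails.
try:
    df.loc[:, "mark ii"]
    raised = False
except KeyError:
    raised = True

print(speeds)
print("KeyError raised:", raised)
```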

One correct approach is to use pd.IndexSlice, passing the row indexer and the column indexer explicitly:

df.loc[pd.IndexSlice[:, "mark ii"], :]

# or the equivalent tuple with slice(None)
df.loc[(slice(None), 'mark ii'), :]

You can also specify the axis in loc, so that every indexer applies to the row index levels:

df.loc(axis=0)[:, "mark ii"]

Output (the same for all three forms):

                    max_speed  shield
cobra      mark ii          0       4
sidewinder mark ii          1       4
viper      mark ii          7       1
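
To check that these forms agree, here is a short self-contained sketch (the result variable names are only for illustration). It also compares against xs with drop_level=False, which should keep the second index level the way .loc does, unlike the plain xs call in the question:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
])
df = pd.DataFrame(
    [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]],
    columns=["max_speed", "shield"],
    index=index,
)

# Three ways to select "mark ii" on the second index level with .loc.
by_index_slice = df.loc[pd.IndexSlice[:, "mark ii"], :]
by_slice = df.loc[(slice(None), "mark ii"), :]
by_axis = df.loc(axis=0)[:, "mark ii"]

# xs drops the selected level by default; drop_level=False keeps it.
by_xs = df.xs("mark ii", level=1, drop_level=False)

print(by_index_slice.equals(by_slice))
print(by_index_slice.equals(by_axis))
print(by_xs)
```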