Pandas idxmax() returns 0 if no value matches condition?

I’m trying to understand the behaviour of idxmax().

I’m using idxmax() to get all rows of a dataframe starting from the first row that meets a condition, like this:

df = df[df['A'].gt(0).idxmax():]

I’m then checking if the resulting dataframe is empty. There’s one unit test where I expect an empty dataframe (no row meets the condition), but it was never empty, so I looked into it.

I found that if the condition is NEVER met, idxmax() returns 0 (instead of, say, returning None or raising an exception I could catch), which clashes with the case where the condition IS MET at row 0.

Here’s an example of what I’m seeing:

import pandas as pd

df = pd.DataFrame(data={'A': [0, 0, 0, 0]})  # no element where gt(0) is True
print("Truth values:")
print(df['A'].gt(0))                           # the boolean mask for 'A' > 0
print("Index of first row where 'A' > 0:", df['A'].gt(0).idxmax())

The execution result:

Truth values:
0    False
1    False
2    False
3    False
Name: A, dtype: bool
Index of first row where 'A' > 0: 0   <--- ???

And with a different dataframe:

df2 = pd.DataFrame(data={'A': [1, 0, 0, 0]})
print("Truth values:")
print(df2['A'].gt(0))
print("Index of first row where 'A' > 0:", df2['A'].gt(0).idxmax())

The execution result:

Truth values:
0     True
1    False
2    False
3    False
Name: A, dtype: bool
Index of first row where 'A' > 0: 0

So we end up with the same behaviour for two different inputs.

My current solution: summing over ‘A’ and checking if the sum is 0, and doing something different if that’s the case – which seems a bit overkill.
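The same check can be written more directly with Series.any() on the boolean mask. Testing the mask rather than summing 'A' itself also avoids a false "no match" when positive and negative values in 'A' cancel out. A minimal sketch, assuming the mask is df['A'].gt(0) as in the examples above:

```python
import pandas as pd

df = pd.DataFrame(data={'A': [0, 0, 0, 0]})
mask = df['A'].gt(0)  # boolean Series: True where 'A' > 0

# mask.any() is False when no row meets the condition; this is the same
# test as mask.sum() == 0, but it states the intent directly
if mask.any():
    print("First matching label:", mask.idxmax())
else:
    print("No row meets the condition")
```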

Am I using idxmax() wrong? Could someone shed some light on this? The behaviour seems very counter-intuitive.

Thanks 🙂

Solution:

Series.idxmax() returns the row label of the maximum value. If several values tie for the maximum, it returns the label of the first one.
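The tie-breaking rule is easier to see with non-default labels. In this sketch the Series values and labels are made up for illustration:

```python
import pandas as pd

# Made-up Series with string labels and a tie for the maximum value (5)
s = pd.Series([1, 5, 5, 2], index=['w', 'x', 'y', 'z'])
print(s.idxmax())  # x  (first of the tied labels 'x' and 'y')

# All-False boolean Series: every value ties for the maximum (False),
# so idxmax() returns the first label, which here is 10, not 0
flags = pd.Series([False, False, False], index=[10, 20, 30])
print(flags.idxmax())  # 10
```

So the 0 in the question is not a "not found" sentinel; it is simply the first label of the default RangeIndex.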


Therefore, for the following Series, idxmax() returns index 0: every value is False, so all values tie for the maximum and the first label wins.

0    False
1    False
2    False
3    False

For the following Series, idxmax() also returns index 0, this time because True is the maximum value. (In Python, True is greater than False; you can check this by printing True > False.)
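A quick check of the ordering that makes True the maximum of a boolean Series (the Series here is made up for illustration):

```python
import pandas as pd

# In Python, bool is a subclass of int: False == 0 and True == 1
print(True > False)  # True

# When at least one True is present, it is the maximum,
# so idxmax() points at the label of the first True
mask = pd.Series([False, False, True, True])
print(mask.max())     # True
print(mask.idxmax())  # 2
```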

0    True
1    False
2    False
3    False

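In your df = df[df['A'].gt(0).idxmax():], the plain [] operator with a slice selects rows, but an integer slice is interpreted by position, while idxmax() returns an index label. With the default RangeIndex the two coincide; with any other index they may not. To slice by label, use df.loc[df['A'].gt(0).idxmax():]. Either way, idxmax() still returns the first label when no value is True, so the no-match case has to be detected separately.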
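A minimal sketch putting this together, assuming the goal is "rows from the first match onward, or an empty frame if nothing matches". The helper name rows_from_first_match and the sample frames are hypothetical. The last lines show a different technique, cummax() on the mask, which is not from the answer above but yields an empty result naturally when no row matches:

```python
import pandas as pd

def rows_from_first_match(df: pd.DataFrame, mask: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: rows of df from the first True in mask onward.

    Returns an empty DataFrame (same columns) when no row matches,
    instead of trusting idxmax(), which returns the first label even
    when every value is False.
    """
    if not mask.any():
        return df.iloc[0:0]  # zero-row slice that keeps the columns
    return df.loc[mask.idxmax():]  # label-based slice, start label included

# Hypothetical data; df2 deliberately uses a non-default integer index
df1 = pd.DataFrame(data={'A': [0, 0, 0, 0]})
df2 = pd.DataFrame(data={'A': [0, 0, 3, 0]}, index=[10, 20, 30, 40])

no_match = rows_from_first_match(df1, df1['A'].gt(0))
print(no_match.empty)  # True

from_first = rows_from_first_match(df2, df2['A'].gt(0))
print(from_first.index.tolist())  # [30, 40]

# Plain [] with an integer slice is positional: label 30 is treated as
# position 30, which is past the end of this 4-row frame
print(df2[df2['A'].gt(0).idxmax():].empty)  # True

# Alternative technique: cummax() turns the mask into "True from the
# first match onward", so boolean indexing gives an empty frame if none match
print(df2[df2['A'].gt(0).cummax()].index.tolist())  # [30, 40]
print(df1[df1['A'].gt(0).cummax()].empty)  # True
```

The explicit any() guard keeps the intent readable; the cummax() variant avoids idxmax() entirely and needs no special case.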
