Pandas – Conditionally finding max of row according to column value while maintaining index order

I’m trying to find, ideally, a one-liner to accomplish the following:

I have the following dataframe:

import pandas as pd
import numpy as np

SIZE = 10
df = pd.DataFrame({'col1': np.random.randint(100, size=SIZE),
                   'col2': np.random.randint(100, size=SIZE),
                   'col3': np.random.randint(100, size=SIZE),
                   'col4': np.random.randint(2, size=SIZE)})
print(df)

outputting

   col1  col2  col3  col4
0    55    96    40     0
1    82    59    34     1
2    85    66    25     1
3    90    69    27     0
4    36    32    79     1
5    33    69    80     1
6    11    53    88     0
7    31    51    96     0
8    89    76    88     1
9     4    76    47     0

I’m currently ignoring col4 and calculating the max value of each row as follows:

df[['col1', 'col2', 'col3']].max(axis=1)

resulting in

0    96
1    82
2    85
3    90
4    79
5    80
6    88
7    96
8    89
9    76
dtype: int64

I want to use col4 to choose which value to take for each row. If col4 is 0, take the value of col1; otherwise take the max of ['col2', 'col3']. I also want to keep the same index/order as the original dataframe.

The end result would be

0    55    # col1
1    59    # max(col2, col3)
2    66    # max(col2, col3)
3    90    # col1
4    79    # max(col2, col3)
5    80    # max(col2, col3)
6    11    # col1
7    31    # col1
8    88    # max(col2, col3)
9    4     # col1
dtype: int64

One possibility would be to split the rows into two new dataframes, calculate the max for each, and join them again, but this could mess up the index order (I guess I could save and restore that too). Any better ideas?

Apologies if this question was already asked, but I couldn’t find it with the search terms I tried.

>Solution :

There might be a better option… but this does the job by simply applying your rule as a lambda row-wise:

df.apply(lambda x: x[["col2", "col3"]].max() if x["col4"] else x["col1"], axis=1)
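
If the row-wise apply turns out to be slow on a larger frame, a vectorized variant may be worth trying. The sketch below is not part of the original answer; it assumes col4 only ever holds 0 or 1 and rebuilds the example dataframe from the question with its printed values (rather than random ones) so the result can be compared against the expected output. Series.where keeps the col1 value wherever the condition is true and takes the fallback elsewhere, so the original index and order are preserved:

```python
import pandas as pd

# Example data copied from the question's printed dataframe
df = pd.DataFrame({
    "col1": [55, 82, 85, 90, 36, 33, 11, 31, 89, 4],
    "col2": [96, 59, 66, 69, 32, 69, 53, 51, 76, 76],
    "col3": [40, 34, 25, 27, 79, 80, 88, 96, 88, 47],
    "col4": [0, 1, 1, 0, 1, 1, 0, 0, 1, 0],
})

# Row-wise max of col2 and col3, computed once for every row
fallback = df[["col2", "col3"]].max(axis=1)

# Keep col1 where col4 == 0, otherwise use max(col2, col3);
# the result is aligned on df's index, so the order is unchanged
result = df["col1"].where(df["col4"] == 0, fallback)
print(result)
```

This computes the col2/col3 max for all rows, including those where it is not needed, which is usually a cheap trade for avoiding a Python-level loop over rows.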