Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Get the value of a column by max value in others

I have the df:

import pandas as pd
data = {"id_1":["a","b","c"],
        "id_2":["q","w","e"],
        "val_1":[1,2,3],
        "val_2":[2,0,0]}
df = pd.DataFrame(data)

#   id_1 id_2  val_1  val_2
# 0    a    q      1      2
# 1    b    w      2      0
# 2    c    e      3      0

I want to compare the values of val_1 and val_2 columns.
If val_1 is more than val_2 (they are never the same) I want to populate a new column with the value of id_1, else the value of id_2. To create:

  id_1 id_2  val_1  val_2 max_id
0    a    q      1      2      q
1    b    w      2      0      b
2    c    e      3      0      c

max_id of the first row is q because val_2>val_1

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I can get the max value with:
df[["val_1", "val_2"]].max(axis=1)

And the column name from which the fetch the ids with:

index_map = {0:"id_1", 1:"id_2"}
df.apply(lambda x: [x.val_1, x.val_2].index(max([x.val_1, x.val_2])), axis=1).map(index_map)
# 0    id_2
# 1    id_1
# 2    id_1

But then my ideas run out.

>Solution :

Since you have two columns, a simple approach would be to use where:

df['max_id'] = df['id_1'].where(df['val_1'].gt(df['val_2']), df['id_2'])

If you want to generalize to an arbitrary number of id/values pairs, you could combine idxmax and indexing lookup:

idx, cols = pd.factorize(df.filter(like='val_').idxmax(axis=1)
                           .str.replace('val_', 'id_'))
df['max_id'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

Or with wide_to_long:

tmp = pd.wide_to_long(df.reset_index(), stubnames=['id', 'val'],
                      i='index', j='col', sep='_')
df['max_id'] = (tmp.loc[tmp.groupby(level='index')['val'].idxmax(), 'id']
                   .droplevel('col')
               )

Or, if the columns are already sorted in the correct order (1,2,3… ; 1,2,3…), using :

ids = df.filter(like='id_').to_numpy()
vals = df.filter(like='val_').to_numpy()

df['max_id'] = ids[np.arange(ids.shape[0]), np.argmax(vals, axis=1)]

Output:

  id_1 id_2  val_1  val_2 max_id
0    a    q      1      2      q
1    b    w      2      0      b
2    c    e      3      0      c

Intermediates:

# idxmax + factorize approach:

  id_1 id_2  val_1  val_2 idxmax str.replace
0    a    q      1      2  val_2        id_2
1    b    w      2      0  val_1        id_1
2    c    e      3      0  val_1        id_1

# wide_to_long output:

          id  val
index col        
0     1    a    1
1     1    b    2    # max from index = 1
2     1    c    3    # max from index = 2
0     2    q    2    # max from index = 0
1     2    w    0
2     2    e    0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading