Get the value of a column by max value in others

November 21, 2024

I have the df:

import pandas as pd
data = {"id_1":["a","b","c"],
        "id_2":["q","w","e"],
        "val_1":[1,2,3],
        "val_2":[2,0,0]}
df = pd.DataFrame(data)

#   id_1 id_2  val_1  val_2
# 0    a    q      1      2
# 1    b    w      2      0
# 2    c    e      3      0

I want to compare the values of val_1 and val_2 columns.
If val_1 is more than val_2 (they are never the same) I want to populate a new column with the value of id_1, else the value of id_2. To create:

  id_1 id_2  val_1  val_2 max_id
0    a    q      1      2      q
1    b    w      2      0      b
2    c    e      3      0      c

max_id of the first row is q because val_2>val_1…

I can get the max value with:
df[["val_1", "val_2"]].max(axis=1)

And the column name from which the fetch the ids with:

index_map = {0:"id_1", 1:"id_2"}
df.apply(lambda x: [x.val_1, x.val_2].index(max([x.val_1, x.val_2])), axis=1).map(index_map)
# 0    id_2
# 1    id_1
# 2    id_1

But then my ideas run out.

>Solution :

Since you have two columns, a simple approach would be to use where:

df['max_id'] = df['id_1'].where(df['val_1'].gt(df['val_2']), df['id_2'])

If you want to generalize to an arbitrary number of id/values pairs, you could combine idxmax and indexing lookup:

idx, cols = pd.factorize(df.filter(like='val_').idxmax(axis=1)
                           .str.replace('val_', 'id_'))
df['max_id'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

Or with wide_to_long:

tmp = pd.wide_to_long(df.reset_index(), stubnames=['id', 'val'],
                      i='index', j='col', sep='_')
df['max_id'] = (tmp.loc[tmp.groupby(level='index')['val'].idxmax(), 'id']
                   .droplevel('col')
               )

Or, if the columns are already sorted in the correct order (1,2,3… ; 1,2,3…), using numpy:

ids = df.filter(like='id_').to_numpy()
vals = df.filter(like='val_').to_numpy()

df['max_id'] = ids[np.arange(ids.shape[0]), np.argmax(vals, axis=1)]

Output:

  id_1 id_2  val_1  val_2 max_id
0    a    q      1      2      q
1    b    w      2      0      b
2    c    e      3      0      c

Intermediates:

# idxmax + factorize approach:

  id_1 id_2  val_1  val_2 idxmax str.replace
0    a    q      1      2  val_2        id_2
1    b    w      2      0  val_1        id_1
2    c    e      3      0  val_1        id_1

# wide_to_long output:

          id  val
index col        
0     1    a    1
1     1    b    2    # max from index = 1
2     1    c    3    # max from index = 2
0     2    q    2    # max from index = 0
1     2    w    0
2     2    e    0