Home How to assign scores to each value in pandas columns based on percentile range, getting `Truth value of a Series is ambiguous.` error

Questions

How to assign scores to each value in pandas columns based on percentile range, getting `Truth value of a Series is ambiguous.` error

byMR

November 15, 2024

I need to assign scores to each of the values in many columns of a pandas dataframe, depending on the percentile score range each value falls between.

I have created a function:

import pandas as pd
import numpy as np

def get_percentiles(x, percentile_array):
    percentile_array = np.sort(np.array(percentile_array))
    if x < x.quantile(percentile_array[0]) < 0:
        return 1
    elif (x >= x.quantile(percentile_array[0]) & (x < x.quantile(percentile_array[1]):
        return 2
    elif (x >= x.quantile(percentile_array[1]) & (x < x.quantile(percentile_array[2]):
        return 3
    elif (x >= x.quantile(percentile_array[2]) & (x < x.quantile(percentile_array[3]):
        return 4
    else:
        return 5

Sample data:

df = pd.DataFrame({'col1' : [1,10,5,9,15,4],
                   'col2' : [4,10,15,19,3,2],
                   'col3' : [10,5,6,9,1,24]})

When I try to run the function using apply:

percentile_array = [0.05, 0.25, 0.5, 0.75]

df.apply(lambda x : get_percentiles(x, percentile_array), result_type = 'expand')

I get below error:

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

The expected output is new dataframe with 3 columns that has the scores between 1 and 5 depending on which percentile range each value in each column falls under

>Solution :

IIUC, you could use rank to compute the percentile (per column), then pandas.cut to bin the values to your reference:

percentile_array = [0.05, 0.25, 0.5, 0.75]
bins = [-np.inf]+percentile_array+[np.inf]
labels = [1, 2, 3, 4, 5]

out = (df.rank(pct=True)
         .apply(lambda c: pd.cut(c, bins=bins, labels=labels))
      )

Alternatively, with numpy.searchsorted:

percentile_array = [0.05, 0.25, 0.5, 0.75]
bins = [-np.inf]+percentile_array+[np.inf]

out = pd.DataFrame(np.searchsorted(bins, df.rank(pct=True)),
                   columns=df.columns, index=df.index)

Output:

  col1 col2 col3
0    2    3    5
1    5    4    3
2    3    5    3
3    4    5    4
4    5    3    2
5    3    2    5

Intermediate:

df.rank(pct=True)

       col1      col2      col3
0  0.166667  0.500000  0.833333
1  0.833333  0.666667  0.333333
2  0.500000  0.833333  0.500000
3  0.666667  1.000000  0.666667
4  1.000000  0.333333  0.166667
5  0.333333  0.166667  1.000000