Set value of DataFrame column based on value of other columns & rows

I have a DataFrame df with four columns: Date, Location, Category, and Value. For each Date and Location, I want to replace the Value of every row whose Category is 5 or higher with the Value of that group's row where Category == 5.

df:
Date       Location  Category  Value
20220101   FE        1         0.23
20220101   FE        2         0.24
20220101   FE        3         0.26
20220101   FE        4         0.27
20220101   FE        5         0.28
20220101   FE        6         0.30
20220101   RP        5         0.32
20220101   RP        6         0.35
20220102   FE        1         0.20
20220102   FE        2         0.23
20220102   FE        3         0.25
20220102   FE        4         0.26
20220102   FE        5         0.28
20220102   FE        6         0.32
20220102   RP        5         0.34
20220102   RP        6         0.36
df_new:
Date       Location  Category  Value
20220101   FE        1         0.23
20220101   FE        2         0.24
20220101   FE        3         0.26
20220101   FE        4         0.27
20220101   FE        5         0.28
20220101   FE        6         0.28 <-- changed with value from row with Category == 5
20220101   RP        5         0.32
20220101   RP        6         0.32 <-- changed with value from row with Category == 5
20220102   FE        1         0.20
20220102   FE        2         0.23
20220102   FE        3         0.25
20220102   FE        4         0.26
20220102   FE        5         0.28
20220102   FE        6         0.28 <-- changed with value from row with Category == 5
20220102   RP        5         0.34
20220102   RP        6         0.34 <-- changed with value from row with Category == 5

So far, I have only been able to extract the Value for a specific Date and Location where Category == 5:

df.loc[(df['Date'] == 20220101) & (df['Location'] == 'FE') & (df['Category'] == 5), 'Value'].iloc[0]

Is there an easy and efficient way to update the Value column like this? Many thanks!

For reproducibility:

df = pd.DataFrame({
    'Date':[20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102],
    'Location':['FE', 'FE', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP', 'FE', 'FE', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP'],
    'Category':[1, 2, 3, 4, 5, 6, 5, 6, 1, 2, 3, 4, 5, 6, 5, 6],
    'Value':[0.23, 0.24, 0.26, 0.27, 0.28, 0.3, 0.32, 0.35, 0.2, 0.23, 0.25, 0.26, 0.28, 0.32, 0.34, 0.36]
})

Solution:

Assuming the Categories are in ascending order within each (Date, Location) group, you can mask the values for Category > 5 and forward-fill per group with groupby.ffill:

df['Value'] = (df['Value'].mask(df['Category'].gt(5))
               .groupby([df['Date'], df['Location']])
               .ffill()
              )

output (shown as a new column Value2 for comparison):

        Date Location  Category  Value  Value2
0   20220101       FE         1   0.23    0.23
1   20220101       FE         2   0.24    0.24
2   20220101       FE         3   0.26    0.26
3   20220101       FE         4   0.27    0.27
4   20220101       FE         5   0.28    0.28
5   20220101       FE         6   0.30    0.28
6   20220101       RP         5   0.32    0.32
7   20220101       RP         6   0.35    0.32
8   20220102       FE         1   0.20    0.20
9   20220102       FE         2   0.23    0.23
10  20220102       FE         3   0.25    0.25
11  20220102       FE         4   0.26    0.26
12  20220102       FE         5   0.28    0.28
13  20220102       FE         6   0.32    0.28
14  20220102       RP         5   0.34    0.34
15  20220102       RP         6   0.36    0.34
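If the Categories are not guaranteed to be sorted within each group, a merge-based sketch avoids that assumption: look up each group's Category-5 value explicitly and substitute it wherever Category > 5. The column name `Value5` below is just a temporary helper introduced for the join.

```python
import pandas as pd

# Same data as the reproducible df in the question.
df = pd.DataFrame({
    'Date': [20220101] * 8 + [20220102] * 8,
    'Location': ['FE'] * 6 + ['RP'] * 2 + ['FE'] * 6 + ['RP'] * 2,
    'Category': [1, 2, 3, 4, 5, 6, 5, 6] * 2,
    'Value': [0.23, 0.24, 0.26, 0.27, 0.28, 0.30, 0.32, 0.35,
              0.20, 0.23, 0.25, 0.26, 0.28, 0.32, 0.34, 0.36],
})

# One row per (Date, Location) holding that group's Category-5 value.
cat5 = (df.loc[df['Category'] == 5, ['Date', 'Location', 'Value']]
          .rename(columns={'Value': 'Value5'}))

# Attach the lookup value to every row, substitute where Category > 5,
# then drop the helper column again.
out = df.merge(cat5, on=['Date', 'Location'], how='left')
out['Value'] = out['Value'].mask(out['Category'] > 5, out['Value5'])
out = out.drop(columns='Value5')
```

This is order-independent but assumes exactly one Category-5 row per (Date, Location) group; with duplicates the merge would multiply rows, and a group without a Category-5 row would keep NaN-masked values.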
