Modify rows between two flags (values) in dataframe columns

November 24, 2022

I want to create a new dataframe with the same shape based on two existing dataframes. I have one dataframe that represents the flags and another one with the values I want to replace.

The flag dataframe has only 1, -1 and NaNs, and always after a 1 I’ll have a -1. So basically its a "changing state" kind of dataframe. What I want to do is: in between the interval of 1 and -1, I need to fill in the average of the same interval in the second dataframe, PR.

flag = pd.DataFrame({'col1': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                   'col2': [np.nan,1,-1,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,-1],
                   'col3': [np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,-1],
                   'col4': [np.nan,np.nan,np.nan,np.nan,np.nan,1,np.nan,-1,np.nan,np.nan,np.nan]
                   })

PR = pd.DataFrame({'col1': [81,81.3,80.7,81.5,81,80.4,80.3,81,79.5,80.7],
                   'col2': [80.9,81.6,81.2,81.7,80.9,79.7,79.3,79.1,79,77.5],
                   'col3': [81.1,81.3,81,81.6,80.8,79.5,79.2,78.8,78.8,77.4],
                   'col4': [80.1,80.6,79.9,80.4,80.4,79.3,79,78.8,78.4,77]
                   })

This would have to give me:

result = pd.DataFrame({'col1': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
               'col2': [np.nan,81.4,81.4,79.6,79.6,79.6,79.6,79.6,79.6,79.6,79.6],
               'col3': [np.nan,np.nan,np.nan,79.44,79.44,79.44,79.44,79.44,79.44,79.44,79.44],
               'col4': [np.nan,np.nan,np.nan,np.nan,np.nan,79.03,79.03,79.03,np.nan,np.nan,np.nan]
               })

Any help is much appreciated!

>Solution :

I would use a custom function:

def process(s, ref=flag):
    f = ref[s.name] # get matching flag

    # create group and mask data outside of 1 -> -1
    m = f.eq(1).cummax() & f[::-1].eq(-1).cummax()
    group = f.eq(1).cumsum().where(m)

    # transform to mean
    return s.groupby(group).transform('mean') 

out = PR.apply(process, ref=flag).round(2)

Output:

   col1  col2   col3   col4
0   NaN   NaN    NaN    NaN
1   NaN  81.4    NaN    NaN
2   NaN  81.4    NaN    NaN
3   NaN  79.6  79.44    NaN
4   NaN  79.6  79.44    NaN
5   NaN  79.6  79.44  79.03
6   NaN  79.6  79.44  79.03
7   NaN  79.6  79.44  79.03
8   NaN  79.6  79.44    NaN
9   NaN  79.6  79.44    NaN