Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Modify rows between two flags (values) in dataframe columns

I want to create a new dataframe with the same shape based on two existing dataframes. I have one dataframe that represents the flags and another one with the values I want to replace.

The flag dataframe has only 1, -1 and NaNs, and always after a 1 I’ll have a -1. So basically its a "changing state" kind of dataframe. What I want to do is: in between the interval of 1 and -1, I need to fill in the average of the same interval in the second dataframe, PR.

flag = pd.DataFrame({'col1': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                   'col2': [np.nan,1,-1,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,-1],
                   'col3': [np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,-1],
                   'col4': [np.nan,np.nan,np.nan,np.nan,np.nan,1,np.nan,-1,np.nan,np.nan,np.nan]
                   })

PR = pd.DataFrame({'col1': [81,81.3,80.7,81.5,81,80.4,80.3,81,79.5,80.7],
                   'col2': [80.9,81.6,81.2,81.7,80.9,79.7,79.3,79.1,79,77.5],
                   'col3': [81.1,81.3,81,81.6,80.8,79.5,79.2,78.8,78.8,77.4],
                   'col4': [80.1,80.6,79.9,80.4,80.4,79.3,79,78.8,78.4,77]
                   })

This would have to give me:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

result = pd.DataFrame({'col1': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
               'col2': [np.nan,81.4,81.4,79.6,79.6,79.6,79.6,79.6,79.6,79.6,79.6],
               'col3': [np.nan,np.nan,np.nan,79.44,79.44,79.44,79.44,79.44,79.44,79.44,79.44],
               'col4': [np.nan,np.nan,np.nan,np.nan,np.nan,79.03,79.03,79.03,np.nan,np.nan,np.nan]
               })

Any help is much appreciated!

>Solution :

I would use a custom function:

def process(s, ref=flag):
    f = ref[s.name] # get matching flag

    # create group and mask data outside of 1 -> -1
    m = f.eq(1).cummax() & f[::-1].eq(-1).cummax()
    group = f.eq(1).cumsum().where(m)

    # transform to mean
    return s.groupby(group).transform('mean') 

out = PR.apply(process, ref=flag).round(2)

Output:

   col1  col2   col3   col4
0   NaN   NaN    NaN    NaN
1   NaN  81.4    NaN    NaN
2   NaN  81.4    NaN    NaN
3   NaN  79.6  79.44    NaN
4   NaN  79.6  79.44    NaN
5   NaN  79.6  79.44  79.03
6   NaN  79.6  79.44  79.03
7   NaN  79.6  79.44  79.03
8   NaN  79.6  79.44    NaN
9   NaN  79.6  79.44    NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading