Python Pandas: propagate value on conditions

byMR

September 29, 2023

I try to propagate a value from one row to the next and I’m not sure where to start (pandas).

For one user,

if some values come from the source B, I want to replace it with the values from the date just before.
If there is no date before, I want to keep the value in B.
Lines from source A should not change

Source data:

USER	Date	Value	Source
USER1	01/01/2023	1000	A
USER1	01/02/2023	1200	A
USER1	02/02/2023	1300	A
USER1	20/02/2023	1000	B
USER1	01/07/2023	1400	A
USER2	01/01/2023	1000	A
USER2	01/02/2023	1200	A
USER2	02/02/2023	1250	B
USER2	20/02/2023	1300	A
USER2	01/07/2023	1400	A

Target data:

USER	Date	Value	Source	Comment
USER1	01/01/2023	1000	A
USER1	01/02/2023	1200	A
USER1	02/02/2023	1300	A
USER1	20/02/2023	1300	B	get its value "1300" from the previous date of the same user
USER1	01/07/2023	1400	A
USER2	01/01/2023	1000	A
USER2	01/02/2023	1200	A
USER2	02/02/2023	1200	B	get its value "1200" from the previous date of the same user
USER2	20/02/2023	1300	A
USER2	01/07/2023	1400	A

df = pd.DataFrame({"USER" : ["USER1", "USER1", "USER1", "USER1", "USER1", "USER2", "USER2", "USER2", "USER2", "USER2"],
    "Date" :["01/01/2023", "01/02/2023", "02/02/2023", "20/02/2023", "01/07/2023", "01/01/2023", "01/02/2023", "02/02/2023", "20/02/2023", "01/07/2023"],
    "Value" :["1000", "1200", "1300", "1000", "1400", "1000", "1200", "1250", "1300", "1400"],
    "Source" :["A", "A", "A", "B", "A", "A", "A", "B", "A", "A"]}
)

>Solution :

Assuming the dates are already sorted, you could mask, groupby.ffill, and fillna in case there was no previous value:

m = df['Source'].eq('B')
df['Value'] = (df['Value'].mask(m)          # mask B values
               .groupby(df['USER']).ffill() # replace with previous non-B
               .fillna(df['Value'])         # if still NaN, replace with original
               #.mask(df['Value'].isna())
              )

NB. if you can have NaNs in the original Series, uncomment the last line.

Output:

    USER        Date Value Source
0  USER1  01/01/2023  1000      A
1  USER1  01/02/2023  1200      A
2  USER1  02/02/2023  1300      A
3  USER1  20/02/2023  1300      B
4  USER1  01/07/2023  1400      A
5  USER2  01/01/2023  1000      A
6  USER2  01/02/2023  1200      A
7  USER2  02/02/2023  1200      B
8  USER2  20/02/2023  1300      A
9  USER2  01/07/2023  1400      A