Python Pandas: propagate value on conditions

I try to propagate a value from one row to the next and I’m not sure where to start (pandas).

For one user,

  • if some values come from the source B, I want to replace it with the values from the date just before.
  • If there is no date before, I want to keep the value in B.
  • Lines from source A should not change

Source data:

USER Date Value Source
USER1 01/01/2023 1000 A
USER1 01/02/2023 1200 A
USER1 02/02/2023 1300 A
USER1 20/02/2023 1000 B
USER1 01/07/2023 1400 A
USER2 01/01/2023 1000 A
USER2 01/02/2023 1200 A
USER2 02/02/2023 1250 B
USER2 20/02/2023 1300 A
USER2 01/07/2023 1400 A

Target data:

USER Date Value Source Comment
USER1 01/01/2023 1000 A
USER1 01/02/2023 1200 A
USER1 02/02/2023 1300 A
USER1 20/02/2023 1300 B get its value "1300" from the previous date of the same user
USER1 01/07/2023 1400 A
USER2 01/01/2023 1000 A
USER2 01/02/2023 1200 A
USER2 02/02/2023 1200 B get its value "1200" from the previous date of the same user
USER2 20/02/2023 1300 A
USER2 01/07/2023 1400 A
df = pd.DataFrame({"USER" : ["USER1", "USER1", "USER1", "USER1", "USER1", "USER2", "USER2", "USER2", "USER2", "USER2"],
    "Date" :["01/01/2023", "01/02/2023", "02/02/2023", "20/02/2023", "01/07/2023", "01/01/2023", "01/02/2023", "02/02/2023", "20/02/2023", "01/07/2023"],
    "Value" :["1000", "1200", "1300", "1000", "1400", "1000", "1200", "1250", "1300", "1400"],
    "Source" :["A", "A", "A", "B", "A", "A", "A", "B", "A", "A"]}
)

>Solution :

Assuming the dates are already sorted, you could mask, groupby.ffill, and fillna in case there was no previous value:

m = df['Source'].eq('B')
df['Value'] = (df['Value'].mask(m)          # mask B values
               .groupby(df['USER']).ffill() # replace with previous non-B
               .fillna(df['Value'])         # if still NaN, replace with original
               #.mask(df['Value'].isna())
              )

NB. if you can have NaNs in the original Series, uncomment the last line.

Output:

    USER        Date Value Source
0  USER1  01/01/2023  1000      A
1  USER1  01/02/2023  1200      A
2  USER1  02/02/2023  1300      A
3  USER1  20/02/2023  1300      B
4  USER1  01/07/2023  1400      A
5  USER2  01/01/2023  1000      A
6  USER2  01/02/2023  1200      A
7  USER2  02/02/2023  1200      B
8  USER2  20/02/2023  1300      A
9  USER2  01/07/2023  1400      A

Leave a Reply