Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python Pandas: propagate value on conditions

I try to propagate a value from one row to the next and I’m not sure where to start (pandas).

For one user,

  • if some values come from the source B, I want to replace it with the values from the date just before.
  • If there is no date before, I want to keep the value in B.
  • Lines from source A should not change

Source data:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

USER Date Value Source
USER1 01/01/2023 1000 A
USER1 01/02/2023 1200 A
USER1 02/02/2023 1300 A
USER1 20/02/2023 1000 B
USER1 01/07/2023 1400 A
USER2 01/01/2023 1000 A
USER2 01/02/2023 1200 A
USER2 02/02/2023 1250 B
USER2 20/02/2023 1300 A
USER2 01/07/2023 1400 A

Target data:

USER Date Value Source Comment
USER1 01/01/2023 1000 A
USER1 01/02/2023 1200 A
USER1 02/02/2023 1300 A
USER1 20/02/2023 1300 B get its value "1300" from the previous date of the same user
USER1 01/07/2023 1400 A
USER2 01/01/2023 1000 A
USER2 01/02/2023 1200 A
USER2 02/02/2023 1200 B get its value "1200" from the previous date of the same user
USER2 20/02/2023 1300 A
USER2 01/07/2023 1400 A
df = pd.DataFrame({"USER" : ["USER1", "USER1", "USER1", "USER1", "USER1", "USER2", "USER2", "USER2", "USER2", "USER2"],
    "Date" :["01/01/2023", "01/02/2023", "02/02/2023", "20/02/2023", "01/07/2023", "01/01/2023", "01/02/2023", "02/02/2023", "20/02/2023", "01/07/2023"],
    "Value" :["1000", "1200", "1300", "1000", "1400", "1000", "1200", "1250", "1300", "1400"],
    "Source" :["A", "A", "A", "B", "A", "A", "A", "B", "A", "A"]}
)

>Solution :

Assuming the dates are already sorted, you could mask, groupby.ffill, and fillna in case there was no previous value:

m = df['Source'].eq('B')
df['Value'] = (df['Value'].mask(m)          # mask B values
               .groupby(df['USER']).ffill() # replace with previous non-B
               .fillna(df['Value'])         # if still NaN, replace with original
               #.mask(df['Value'].isna())
              )

NB. if you can have NaNs in the original Series, uncomment the last line.

Output:

    USER        Date Value Source
0  USER1  01/01/2023  1000      A
1  USER1  01/02/2023  1200      A
2  USER1  02/02/2023  1300      A
3  USER1  20/02/2023  1300      B
4  USER1  01/07/2023  1400      A
5  USER2  01/01/2023  1000      A
6  USER2  01/02/2023  1200      A
7  USER2  02/02/2023  1200      B
8  USER2  20/02/2023  1300      A
9  USER2  01/07/2023  1400      A
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading