Remove dips of data in Pandas dataframe

I have some time series data in a pandas dataframe that I know should always increase, but it has some incorrect low values, like below.

22-01-17   0
22-01-18   45
22-01-19   78
22-01-20   98
22-01-21   6            // bad
22-01-22   7            // bad
22-01-23   4            // bad
22-01-24   101

How can I remove the regions of the data that are less than the previous good value?

I don’t mind if we remove those values or replace them with the last good value.

So, using the example above, how could I get either of the following?

22-01-17   0
22-01-18   45
22-01-19   78
22-01-20   98
22-01-21   98
22-01-22   98
22-01-23   98
22-01-24   101

or

22-01-17   0
22-01-18   45
22-01-19   78
22-01-20   98
22-01-21   NaN
22-01-22   NaN
22-01-23   NaN
22-01-24   101

Thanks

Solution:

Assuming s is your Series.

To get the first option:

s.cummax()

output:

22-01-17      0
22-01-18     45
22-01-19     78
22-01-20     98
22-01-21     98
22-01-22     98
22-01-23     98
22-01-24    101
dtype: int64
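
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It builds the Series by hand from the question's sample data (the string index labels are an assumption; the real data may use a DatetimeIndex) and applies cummax(), which replaces each value with the running maximum seen so far.

```python
import pandas as pd

# Sample data copied from the question. The string index is an assumption;
# a DatetimeIndex would behave the same way here.
s = pd.Series(
    [0, 45, 78, 98, 6, 7, 4, 101],
    index=[
        "22-01-17", "22-01-18", "22-01-19", "22-01-20",
        "22-01-21", "22-01-22", "22-01-23", "22-01-24",
    ],
)

# cummax() returns the running maximum, so each dip is replaced
# by the highest value seen before it.
filled = s.cummax()
print(filled)
```

Note that this relies on the series being truly non-decreasing: a single bad value that is too high would raise the running maximum and overwrite every smaller value after it.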

For the second option:

s.mask(s.lt(s.cummax()))

output:

22-01-17      0.0
22-01-18     45.0
22-01-19     78.0
22-01-20     98.0
22-01-21      NaN
22-01-22      NaN
22-01-23      NaN
22-01-24    101.0
dtype: float64
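
Since the question mentions a dataframe, the same idea can be sketched on a column. The column name "value" and the string index below are hypothetical, chosen only for illustration. The sketch shows both variants the question allows: masking the dips with NaN, and dropping the rows entirely.

```python
import pandas as pd

# Hypothetical dataframe holding the question's sample data;
# the column name "value" is an assumption.
df = pd.DataFrame(
    {"value": [0, 45, 78, 98, 6, 7, 4, 101]},
    index=[
        "22-01-17", "22-01-18", "22-01-19", "22-01-20",
        "22-01-21", "22-01-22", "22-01-23", "22-01-24",
    ],
)

# A row is a dip when its value is below the running maximum.
is_dip = df["value"] < df["value"].cummax()

# Variant 1: keep the rows but replace dips with NaN.
# The column becomes float64 because NaN is a float.
masked = df["value"].mask(is_dip)

# Variant 2: drop the dip rows altogether.
dropped = df[~is_dip]

print(masked)
print(dropped)
```

The mask is computed once and reused, so both variants agree on which rows count as dips.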