How to only return consecutive values in a pandas dataframe

I have filtered a dataframe so that it only includes rows whose value is over a certain threshold. Now I want to filter this new dataframe further so that it only includes rows that are consecutive. For example, given this dataframe, with the index followed by the Timestamp and Value columns:

                       Timestamp     Value
316 2022-03-01T16:28:50.4580078Z    36.000
344 2022-03-01T17:26:25.8170166Z    37.500
345 2022-03-01T17:26:45.8480072Z    41.000
346 2022-03-01T17:27:05.8800048Z    35.250
374 2022-03-01T18:23:10.9630126Z    36.000
387 2022-03-01T18:36:12.1380004Z    42.000
388 2022-03-01T18:36:32.1690063Z    37.000
391 2022-03-01T18:37:52.2950134Z    34.500

the result should be a new dataframe containing only rows 344, 345, 346, 387, and 388, because these are the only rows that have a consecutive neighbor in the index:

                       Timestamp     Value
344 2022-03-01T17:26:25.8170166Z    37.500
345 2022-03-01T17:26:45.8480072Z    41.000
346 2022-03-01T17:27:05.8800048Z    35.250
387 2022-03-01T18:36:12.1380004Z    42.000
388 2022-03-01T18:36:32.1690063Z    37.000

I have tried iterating over the rows and appending to a new list, comparing df.iloc[i] with df.iloc[i-1], but I still can't get it to work properly.

Solution:

Based on the time

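You can compute the time difference between each row and the previous one and compare it to a threshold (here 1 minute). This gives a boolean mask that is True for every row that closely follows its predecessor. Keep each row whose own mask value is True, or whose following row's mask value is True: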

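import pandas as pd

thresh = pd.Timedelta('1min')

# True where the gap to the previous row is below the threshold
m = (pd.to_datetime(df['Timestamp'])
       .diff().lt(thresh)
    )

# fill_value=False keeps the shifted mask boolean (a plain shift(-1) leaves NaN in the last row)
out = df[m | m.shift(-1, fill_value=False)]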
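
For anyone who wants to run the time-based version end to end, here is a minimal sketch that rebuilds the sample data from the question. The names `close_to_prev` and `close_to_next` are only illustrative, and the sketch assumes the rows are already sorted by timestamp.

```python
import pandas as pd

# Sample data copied from the question; the index holds the original row labels.
df = pd.DataFrame(
    {
        "Timestamp": [
            "2022-03-01T16:28:50.4580078Z",
            "2022-03-01T17:26:25.8170166Z",
            "2022-03-01T17:26:45.8480072Z",
            "2022-03-01T17:27:05.8800048Z",
            "2022-03-01T18:23:10.9630126Z",
            "2022-03-01T18:36:12.1380004Z",
            "2022-03-01T18:36:32.1690063Z",
            "2022-03-01T18:37:52.2950134Z",
        ],
        "Value": [36.0, 37.5, 41.0, 35.25, 36.0, 42.0, 37.0, 34.5],
    },
    index=[316, 344, 345, 346, 374, 387, 388, 391],
)

thresh = pd.Timedelta("1min")

# True where the gap to the previous row is under the threshold.
# The first row has no predecessor (NaT), which compares as False.
close_to_prev = pd.to_datetime(df["Timestamp"]).diff().lt(thresh)

# True where the *next* row is close to this one.
close_to_next = close_to_prev.shift(-1, fill_value=False)

out = df[close_to_prev | close_to_next]
print(out.index.tolist())  # [344, 345, 346, 387, 388]
```

Row 391 is dropped here because it comes 80 seconds after row 388, which is above the 1 minute threshold.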

Based on the index

Same logic, but checking for a difference of exactly 1 between successive index values (this assumes an integer index sorted in ascending order).

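# True where the index is exactly 1 more than the previous row's index
m = df.index.to_series().diff().eq(1)

# fill_value=False keeps the shifted mask boolean
out = df[m | m.shift(-1, fill_value=False)]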
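
If you need this in more than one place, the index-based check could be wrapped in a small helper. The sketch below is only one possible way to do it: `keep_consecutive` and its `step` parameter are hypothetical names, not part of pandas, and it assumes a unique integer index sorted in ascending order.

```python
import pandas as pd


def keep_consecutive(frame: pd.DataFrame, step: int = 1) -> pd.DataFrame:
    """Keep rows whose index label is `step` away from a neighbouring row's label."""
    idx = frame.index.to_series()
    # True where this label equals the previous label + step.
    follows_prev = idx.diff().eq(step)
    # True where the next label equals this label + step.
    precedes_next = follows_prev.shift(-1, fill_value=False)
    return frame[follows_prev | precedes_next]


# Index and values taken from the question (Timestamp column omitted for brevity).
df = pd.DataFrame(
    {"Value": [36.0, 37.5, 41.0, 35.25, 36.0, 42.0, 37.0, 34.5]},
    index=[316, 344, 345, 346, 374, 387, 388, 391],
)

out = keep_consecutive(df)
print(out.index.tolist())  # [344, 345, 346, 387, 388]
```

A frame with a single row comes back empty, since that row has no neighbour to be consecutive with.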

Output:


                              Timestamp  Value
344 2022-03-01 17:26:25.817016600+00:00  37.50
345 2022-03-01 17:26:45.848007200+00:00  41.00
346 2022-03-01 17:27:05.880004800+00:00  35.25
387 2022-03-01 18:36:12.138000400+00:00  42.00
388 2022-03-01 18:36:32.169006300+00:00  37.00

Intermediates:

                              Timestamp  Value      m m.shift(-1)
316 2022-03-01 16:28:50.458007800+00:00  36.00  False       False
344 2022-03-01 17:26:25.817016600+00:00  37.50  False        True
345 2022-03-01 17:26:45.848007200+00:00  41.00   True        True
346 2022-03-01 17:27:05.880004800+00:00  35.25   True       False
374 2022-03-01 18:23:10.963012600+00:00  36.00  False       False
387 2022-03-01 18:36:12.138000400+00:00  42.00  False        True
388 2022-03-01 18:36:32.169006300+00:00  37.00   True       False
391 2022-03-01 18:37:52.295013400+00:00  34.50  False         NaN