Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas – Rolling window count with condition

How is it possible to select only some rows, based on a given condition, in a Pandas rolling window count ?
I have not find any solution in the documentation, or in other questions.

With the following dataframe:

random.seed(42)
date_0 = datetime.datetime(2020, 1, 1, 0, 0, 0, 0)
dates = [date_0 + datetime.timedelta(seconds=random.uniform(0, 120)) for i in range(500)]
dates.sort()
speeds = [random.uniform(1, 10) for i in range(500)]
speeds.sort()
pressures = [i**2 + random.normalvariate(0, 1) for i in speeds]
data = [speeds, pressures]
df = pd.DataFrame(data=list(zip(speeds, pressures)), columns=['speed', 'pressure'], index=dates)
dates_1 = random.sample(dates, int(len(dates) * 0.6))
df.loc[:, 'controlled'] = False
df.loc[dates_1, 'controlled'] = True
df.loc[:, 'rolling_obs_count'] = df.loc[:, 'speed'].rolling(window=str(1) + 's').count()
print('df: \n', df)
df: 
                                speed    pressure  controlled  rolling_obs_count
2020-01-01 00:00:00.048713  1.024082    4.491483        True                1.0
2020-01-01 00:00:00.068628  1.084577    0.773474        True                2.0
2020-01-01 00:00:00.202953  1.091360    0.584872        True                3.0
2020-01-01 00:00:00.425483  1.125268    2.361184       False                4.0
2020-01-01 00:00:00.665378  1.127335    2.050226        True                5.0
...                              ...         ...         ...                ...
2020-01-01 00:01:59.531574  9.945263   98.811644        True                5.0
2020-01-01 00:01:59.534566  9.976833   99.481287       False                6.0
2020-01-01 00:01:59.704513  9.990121   99.452698       False                6.0
2020-01-01 00:01:59.814533  9.996152   99.479074        True                6.0
2020-01-01 00:01:59.913896  9.999170  100.584748        True                7.0

the count function counts all rows within the rolling window, and I need to count only the rows where "controlled" column is "True". How can I do that ?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Perhaps simply:

>>> df.rolling('1s')['controlled'].sum()
2020-01-01 00:00:00.048713    1.0
2020-01-01 00:00:00.068628    2.0
2020-01-01 00:00:00.202953    3.0
2020-01-01 00:00:00.425483    3.0
2020-01-01 00:00:00.665378    4.0
                             ... 
2020-01-01 00:01:59.531574    5.0
2020-01-01 00:01:59.534566    5.0
2020-01-01 00:01:59.704513    4.0
2020-01-01 00:01:59.814533    4.0
2020-01-01 00:01:59.913896    5.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading