Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to identify nth row that satisfies condition

I have a dataframe with months and sales. If somebody does 250 sales for 3 consecutive months, they get a bonus.

  1. Once the bonus month is identified, the counter resets and starts to count the next 3 consecutive months with 250
  2. if there is less than 250 sales, then the counter resets
Month sales 3 month consistency
Month 1 250
Month 2 250
Month 3 250 1
Month 4 250
Month 5 250
Month 6 250 1
Month 7 0
Month 8 250
Month 9 250
Month 10 250 1
Month 11 250
Month 12 250
Month 13 0
Month 14 250
Month 15 250
Month 16 250 1

I want to populate the 3 month consistency column

Tried using cum count with conditions but it did not work and I could not find such an example anywhere. What I tried was to use rolling window to identify every 3rd consecutive record with 250. But then it does not reset after the each set of 3 consecutive 250s.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df.groupby('id').apply(lambda x: x['gt_250'].rolling(min_periods=1, window=3).sum())
#-- grouped by id because this has to be computed for each salesperson_id

The result I got from it was,

Month sales 3 month consistency
Month 1 250
Month 2 250
Month 3 250 3
Month 4 250 3
Month 5 250 3

>Solution :

You could use a custom groupby.cumcount and mod:

N = 3 # number of consecutive months

# identify target months
m = df['sales'].ge(250)
# for groups of consecutive target months
group = (~m).cumsum()

# count the Nth repeat for each group
df['3 month consistency'] = (df[m].groupby(group).cumcount().mod(N).eq(N-1)
                             .map({True: 1})
                            )

NB. if you want a boolean column replace .map({True: 1}) by .reindex(df.index).eq(True).

Output:

       Month  sales  3 month consistency
0    Month 1    250                  NaN
1    Month 2    250                  NaN
2    Month 3    250                  1.0
3    Month 4    250                  NaN
4    Month 5    250                  NaN
5    Month 6    250                  1.0
6    Month 7      0                  NaN
7    Month 8    250                  NaN
8    Month 9    250                  NaN
9   Month 10    250                  1.0
10  Month 11    250                  NaN
11  Month 12    250                  NaN
12  Month 13      0                  NaN
13  Month 14    250                  NaN
14  Month 15    250                  NaN
15  Month 16    250                  1.0

Intermediates:

       Month  sales      m  group  cumcount  mod  3 month consistency
0    Month 1    250   True      0       0.0  0.0                  NaN
1    Month 2    250   True      0       1.0  1.0                  NaN
2    Month 3    250   True      0       2.0  2.0                  1.0
3    Month 4    250   True      0       3.0  0.0                  NaN
4    Month 5    250   True      0       4.0  1.0                  NaN
5    Month 6    250   True      0       5.0  2.0                  1.0
6    Month 7      0  False      1       NaN  NaN                  NaN
7    Month 8    250   True      1       0.0  0.0                  NaN
8    Month 9    250   True      1       1.0  1.0                  NaN
9   Month 10    250   True      1       2.0  2.0                  1.0
10  Month 11    250   True      1       3.0  0.0                  NaN
11  Month 12    250   True      1       4.0  1.0                  NaN
12  Month 13      0  False      2       NaN  NaN                  NaN
13  Month 14    250   True      2       0.0  0.0                  NaN
14  Month 15    250   True      2       1.0  1.0                  NaN
15  Month 16    250   True      2       2.0  2.0                  1.0

Variant (with a boolean output):

N = 3
m = df['sales'].ge(250)
group = m.ne(m.shift()).cumsum()
df['3 month consistency'] = df.groupby(group).cumcount().mod(N).eq(N-1)

Intermediates:

       Month  sales      m  group  cumcount  mod  3 month consistency
0    Month 1    250   True      1         0    0                False
1    Month 2    250   True      1         1    1                False
2    Month 3    250   True      1         2    2                 True
3    Month 4    250   True      1         3    0                False
4    Month 5    250   True      1         4    1                False
5    Month 6    250   True      1         5    2                 True
6    Month 7      0  False      2         0    0                False
7    Month 8    250   True      3         0    0                False
8    Month 9    250   True      3         1    1                False
9   Month 10    250   True      3         2    2                 True
10  Month 11    250   True      3         3    0                False
11  Month 12    250   True      3         4    1                False
12  Month 13      0  False      4         0    0                False
13  Month 14    250   True      5         0    0                False
14  Month 15    250   True      5         1    1                False
15  Month 16    250   True      5         2    2                 True

per ID

N = 3
m = df['sales'].ge(250)
group = m.ne(m.shift()).cumsum()
df['3 month consistency'] = df.groupby(['ID', group]).cumcount().mod(N).eq(N-1)

Example:

   ID     Month  sales  3 month consistency
0   A   Month 1    250                False
1   A   Month 2    250                False
2   A   Month 3    250                 True
3   A   Month 4    250                False
4   A   Month 5    250                False
5   A   Month 6    250                 True
6   A   Month 7      0                False
7   A   Month 8    250                False
8   A   Month 9    250                False
9   A  Month 10    250                 True
10  A  Month 11    250                False
11  A  Month 12    250                False
12  A  Month 13      0                False
13  A  Month 14    250                False
14  A  Month 15    250                False
15  B   Month 1    250                False  # independent from A
16  B   Month 2    250                False
17  B   Month 3    250                 True
18  B   Month 4    250                False
19  B   Month 5    250                False
20  B   Month 6    250                 True
21  B   Month 7      0                False
22  B   Month 8    250                False
23  B   Month 9    250                False
24  B  Month 10    250                 True
25  B  Month 11    250                False
26  B  Month 12    250                False
27  B  Month 13      0                False
28  B  Month 14    250                False
29  B  Month 15    250                False
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading