I am looping through my DataFrame to build a running count of consecutive rows where Price is above 1500:
import pandas as pd

df = pd.DataFrame({
    'Price': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400],
    'Count': [0, 0, 0, 0, 0, 0, 0, 0, 0]
})

for idx in df.index:
    # Extend the running count while Price stays above 1500
    if df.loc[idx, 'Price'] > 1500:
        if idx > 0:
            # A single .loc call avoids chained assignment
            df.loc[idx, 'Count'] = df.loc[idx - 1, 'Count'] + 1
Resulting in:
| | Price | Count |
|---|---|---|
| 0 | 1000 | 0 |
| 1 | 1000 | 0 |
| 2 | 1000 | 0 |
| 3 | 2000 | 1 |
| 4 | 2000 | 2 |
| 5 | 2000 | 3 |
| 6 | 2000 | 4 |
| 7 | 1400 | 0 |
| 8 | 1400 | 0 |
Is there a more efficient way to do this?
>Solution :
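Create pseudo-groups using Series.cumsum, then use groupby.cumcount to number the rows within each group. Every row with Price <= 1500 starts a new group, so each run of rows above 1500 shares a group with the low row just before it and gets counted 1, 2, 3, and so on: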
groups = df.Price.le(1500).cumsum()
df['Count'] = df.Price.gt(1500).groupby(groups).cumcount()
# Price Count
# 0 1000 0
# 1 1000 0
# 2 1000 0
# 3 2000 1
# 4 2000 2
# 5 2000 3
# 6 2000 4
# 7 1400 0
# 8 1400 0
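
To see why this works, it may help to look at the intermediate Series. The following is a minimal sketch on the same data; the names `is_low` and `steps` are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400],
})

# True for every row at or below the threshold.
is_low = df['Price'].le(1500)

# Each True bumps the running total, so every low row opens a new
# pseudo-group and the high rows that follow inherit its group id.
groups = is_low.cumsum()
# groups: 1, 2, 3, 3, 3, 3, 3, 4, 5

# cumcount numbers the rows inside each group starting from 0, so the
# low row that opens a group gets 0 and the high rows get 1, 2, 3, ...
count = df.groupby(groups).cumcount()

# Side-by-side view of each step (illustrative only).
steps = pd.DataFrame({
    'Price': df['Price'],
    'is_low': is_low,
    'group': groups,
    'Count': count,
})
print(steps)
```

Because cumcount only looks at row positions within a group, it does not matter which column or Series is grouped, only that `groups` lines up with the DataFrame's index.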