I have a pandas dataframe, like this:
| ID | date | status |
|---|---|---|
| 10 | 2022-01-01 | 0 |
| 10 | 2022-01-02 | 0 |
| 10 | 2022-01-03 | 1 |
| 10 | 2022-01-04 | 1 |
| 10 | 2022-01-05 | 1 |
| 23 | 2022-02-02 | 0 |
| 23 | 2022-02-03 | 0 |
| 23 | 2022-02-04 | 1 |
| 23 | 2022-02-05 | 1 |
| 23 | 2022-02-06 | 1 |
I would like to group per ID and the first date on status is equal 1.
Expected output:
| ID | date | status | first_status |
|---|---|---|---|
| 10 | 2022-01-03 | 1 | 2022-01-03 |
| 23 | 2022-02-03 | 1 | 2022-02-03 |
afteer this I will merge this new DF with previous DF. Final DF:
| ID | date | status | first_status |
|---|---|---|---|
| 10 | 2022-01-01 | 0 | 2022-01-03 |
| 10 | 2022-01-02 | 0 | 2022-01-03 |
| 10 | 2022-01-03 | 1 | 2022-01-03 |
| 10 | 2022-01-04 | 1 | 2022-01-03 |
| 10 | 2022-01-05 | 1 | 2022-01-03 |
| 23 | 2022-02-02 | 0 | 2022-02-04 |
| 23 | 2022-02-03 | 0 | 2022-02-04 |
| 23 | 2022-02-04 | 1 | 2022-02-04 |
| 23 | 2022-02-05 | 1 | 2022-02-04 |
| 23 | 2022-02-06 | 1 | 2022-02-04 |
I tried many ways to do this, but unsuccessful
>Solution :
You can filter the status 1 rows, and get the first (or min depending on the use case) per group, then merge to the orginal dataframe:
df2 = (df[df['status'].eq(1)]
.groupby('ID', as_index=False)
['date'].first() # could also use "min()"
.rename(columns={'date': 'first_status'})
)
df.merge(df2, on='ID')
output:
ID date status first_status
0 10 2022-01-01 0 2022-01-03
1 10 2022-01-02 0 2022-01-03
2 10 2022-01-03 1 2022-01-03
3 10 2022-01-04 1 2022-01-03
4 10 2022-01-05 1 2022-01-03
5 23 2022-02-02 0 2022-02-04
6 23 2022-02-03 0 2022-02-04
7 23 2022-02-04 1 2022-02-04
8 23 2022-02-05 1 2022-02-04
9 23 2022-02-06 1 2022-02-04
intermediate df2:
ID first_status
0 10 2022-01-03
1 23 2022-02-04