I am doing cohort analysis and the dataset I’m using has 15 months as the name as columns with revenue and around 7k user_id rows.
I need to get a new column with the month when the user was last time active.
| 2021-01-01 | 2021-02-01 |
|---|---|
| 3456. | Nan |
| Nan. | 8679 |
Result should be like this
| 2021-01-01 | 2021-02-01 | Last_month |
|---|---|---|
| 3456. | Nan | 2021-01-01 |
| Nan. | 8679 | 2021-02-01 |
I have tried few options but it didnt work
users.apply(pd.Series.last_valid_index)
>Solution :
Example
data = {'2021-01-01': {0: 3456, 1: None}, '2021-02-01': {0: None, 1: 8679}}
df = pd.DataFrame(data)
df
2021-01-01 2021-02-01
0 3456.0 NaN
1 NaN 8679.0
Code
df.apply(lambda x: x.last_valid_index(), axis=1)
output:
0 2021-01-01
1 2021-02-01
dtype: object
full Code
df.assign(Last_month=df.apply(lambda x: x.last_valid_index(), axis=1))
result:
2021-01-01 2021-02-01 Last_month
0 3456.0 NaN 2021-01-01
1 NaN 8679.0 2021-02-01