Given a polars DataFrame:
import polars as pl
data = pl.DataFrame({"user_id": [1, 1, 1, 2, 2, 2], "login": [False, True, False, False, False, True]})
How can I add a column giving the number of rows until the user's next login, with any rows after that user's last login set to None? The expected output for the data above is:
[1, 0, None, 2, 1, 0]
I have tried adapting the answer from here with a backward_fill(), but I cannot get it working.
>Solution :
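IIUC, you have to use backward_fill, so that each row picks up the index of the next login in its user_id group, and invert the subtraction (next login index minus current row index):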
(data
 .with_row_index()
 .with_columns(
     distance=pl.when("login").then("index").backward_fill().over("user_id") - pl.col.index
 )
)
Output:
┌───────┬─────────┬───────┬──────────┐
│ index ┆ user_id ┆ login ┆ distance │
│ ---   ┆ ---     ┆ ---   ┆ ---      │
│ u32   ┆ i64     ┆ bool  ┆ u32      │
╞═══════╪═════════╪═══════╪══════════╡
│ 0     ┆ 1       ┆ false ┆ 1        │
│ 1     ┆ 1       ┆ true  ┆ 0        │
│ 2     ┆ 1       ┆ false ┆ null     │
│ 3     ┆ 2       ┆ false ┆ 2        │
│ 4     ┆ 2       ┆ false ┆ 1        │
│ 5     ┆ 2       ┆ true  ┆ 0        │
└───────┴─────────┴───────┴──────────┘