Python Polars: Number of Rows until the next value in a group

Given a polars DataFrame:
import polars as pl
data = pl.DataFrame({"user_id": [1, 1, 1, 2, 2, 2], "login": [False, True, False, False, False, True]})

How can I add a column giving the number of rows until the user's next login, with any rows after that user's last login set to None? The expected output for the data above is:
[1, 0, None, 2, 1, 0]

I have tried adapting the answer from here with a backward_fill(), but cannot get it working.

>Solution :

IIUC, you have to use backward_fill and invert the subtraction:

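(data
   .with_row_index()  # adds a u32 "index" column holding each row's position
   .with_columns(distance =
      # keep the index only on login rows, pull it backward within each user,
      # then subtract the current row's index
      pl.when("login").then("index").backward_fill().over("user_id") - pl.col.index
   )
)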

Output:

┌───────┬─────────┬───────┬──────────┐
│ index ┆ user_id ┆ login ┆ distance │
│ ---   ┆ ---     ┆ ---   ┆ ---      │
│ u32   ┆ i64     ┆ bool  ┆ u32      │
╞═══════╪═════════╪═══════╪══════════╡
│ 0     ┆ 1       ┆ false ┆ 1        │
│ 1     ┆ 1       ┆ true  ┆ 0        │
│ 2     ┆ 1       ┆ false ┆ null     │
│ 3     ┆ 2       ┆ false ┆ 2        │
│ 4     ┆ 2       ┆ false ┆ 1        │
│ 5     ┆ 2       ┆ true  ┆ 0        │
└───────┴─────────┴───────┴──────────┘