How to create a conditional incremented column in polars?

March 12, 2024

I’d like to create a conditionally incremented column in polars.
It should start from 1 and increment only when a certain condition (pl.col('code') == 'L') is met.

import polars as pl
df = pl.DataFrame({'file': ['a.txt','a.txt','a.txt','a.txt','b.txt','b.txt','c.txt','c.txt','c.txt','c.txt','c.txt'],
                   'code': ['X','Y','Z','L','A','A','B','L','C','L','X']
                   })
df.with_columns(pl.int_range(start=1, end=pl.len()+1).over('file').alias('rrr')
                )

This produces a simple unconditional increment. But how do I add conditions?

Solution:

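I’m not sure exactly which output you’re expecting, but here’s one way to increment the counter only on rows that meet the condition: flag matching rows with 1 (and all others with 0), then take cum_sum() of that flag within each file and add 1 so the counter starts at 1: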

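df.with_columns(
    # 1 where the condition holds, 0 elsewhere
    pl.when(pl.col('code') == 'L').then(pl.lit(1)).otherwise(pl.lit(0)).alias('rrr')
).with_columns(
    # running total of the flag within each file, shifted so it starts at 1
    pl.col('rrr').cum_sum().over('file') + 1
)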

┌───────┬──────┬─────┐
│ file  ┆ code ┆ rrr │
│ ---   ┆ ---  ┆ --- │
│ str   ┆ str  ┆ i32 │
╞═══════╪══════╪═════╡
│ a.txt ┆ X    ┆ 1   │
│ a.txt ┆ Y    ┆ 1   │
│ a.txt ┆ Z    ┆ 1   │
│ a.txt ┆ L    ┆ 2   │
│ b.txt ┆ A    ┆ 1   │
│ b.txt ┆ A    ┆ 1   │
│ c.txt ┆ B    ┆ 1   │
│ c.txt ┆ L    ┆ 2   │
│ c.txt ┆ C    ┆ 2   │
│ c.txt ┆ L    ┆ 3   │
│ c.txt ┆ X    ┆ 3   │
└───────┴──────┴─────┘