I’d like to create a conditionally incremented column in Polars.
It should start from 1 and increment only when a certain condition (pl.col('code') == 'L') is met.
import polars as pl

df = pl.DataFrame({
    'file': ['a.txt', 'a.txt', 'a.txt', 'a.txt', 'b.txt', 'b.txt', 'c.txt', 'c.txt', 'c.txt', 'c.txt', 'c.txt'],
    'code': ['X', 'Y', 'Z', 'L', 'A', 'A', 'B', 'L', 'C', 'L', 'X'],
})

df.with_columns(
    pl.int_range(start=1, end=pl.len() + 1).over('file').alias('rrr')
)
This produces a simple unconditional increment within each file. How do I make the increment conditional?
Solution:
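It’s not clear exactly which output you expect, but here’s an example that increments the counter only at rows that meet the condition, using cum_sum(). First flag the matching rows with 1 (and all other rows with 0), then take the cumulative sum of that flag per file and add 1 so the counter starts from 1: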
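df.with_columns(
    # 1 on rows where code == 'L', 0 everywhere else
    pl.when(pl.col('code') == 'L').then(pl.lit(1)).otherwise(pl.lit(0)).alias('rrr')
).with_columns(
    # running total of the flag within each file, offset so it starts at 1
    pl.col('rrr').cum_sum().over('file') + 1
)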
┌───────┬──────┬─────┐
│ file  ┆ code ┆ rrr │
│ ---   ┆ ---  ┆ --- │
│ str   ┆ str  ┆ i32 │
╞═══════╪══════╪═════╡
│ a.txt ┆ X    ┆ 1   │
│ a.txt ┆ Y    ┆ 1   │
│ a.txt ┆ Z    ┆ 1   │
│ a.txt ┆ L    ┆ 2   │
│ b.txt ┆ A    ┆ 1   │
│ b.txt ┆ A    ┆ 1   │
│ c.txt ┆ B    ┆ 1   │
│ c.txt ┆ L    ┆ 2   │
│ c.txt ┆ C    ┆ 2   │
│ c.txt ┆ L    ┆ 3   │
│ c.txt ┆ X    ┆ 3   │
└───────┴──────┴─────┘