Is there a way to rewrite this:
import numpy
import polars

df = (
    polars.DataFrame(dict(
        j=numpy.random.randint(10, 99, 20),
    ))
    .with_row_count()
    .select(
        g=polars.col('row_nr') // 3,
        j='j',
    )
    .with_columns(rn=1)
    .with_columns(
        rn=polars.col('rn').shift().fill_null(0).cumsum().over('g')
    )
)
print(df)
g (u32) j (i64) rn (i32)
0 47 0
0 22 1
0 82 2
1 19 0
1 85 1
1 15 2
2 89 0
2 74 1
2 26 2
3 11 0
3 86 1
3 81 2
4 16 0
4 35 1
4 60 2
5 30 0
5 28 1
5 94 2
6 21 0
6 38 1
shape: (20, 3)
so that it adds the rn column without first having to add a column full of 1s? I.e., can this part be rewritten:
    .with_columns(rn=1)
    .with_columns(
        rn=polars.col('rn').shift().fill_null(0).cumsum().over('g')
    )
so that:
.with_columns(rn=1)
is not required? Basically reduce two expressions to one.
Or is there another, better way to add a row count per group?
>Solution :
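What you're doing is also known as a cumulative count, which Polars provides as .cumcount(). Applied per group with .over("g"), and with polars imported as pl:
df.with_columns(rn = pl.col("j").cumcount().over("g"))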
shape: (20, 3)
┌─────┬─────┬─────┐
│ g ┆ j ┆ rn │
│ --- ┆ --- ┆ --- │
│ u32 ┆ i64 ┆ u32 │
╞═════╪═════╪═════╡
│ 0 ┆ 92 ┆ 0 │
│ 0 ┆ 24 ┆ 1 │
│ 0 ┆ 45 ┆ 2 │
│ 1 ┆ 78 ┆ 0 │
│ … ┆ … ┆ … │
│ 5 ┆ 68 ┆ 1 │
│ 5 ┆ 59 ┆ 2 │
│ 6 ┆ 38 ┆ 0 │
│ 6 ┆ 83 ┆ 1 │
└─────┴─────┴─────┘
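
To make clear what a per-group cumulative count computes, here is a minimal sketch in plain Python, without Polars. The spelling of this method has changed across Polars releases (cumcount was later renamed cum_count, as far as I know), so the sketch deliberately avoids the library. The helper name row_number_per_group is hypothetical, introduced only for illustration, and the group keys are rebuilt the same way as in the question (row index // 3 over 20 rows).

```python
from collections import defaultdict


def row_number_per_group(groups):
    """Return a 0-based running count of each element within its group.

    Hypothetical helper for illustration; not part of Polars.
    """
    seen = defaultdict(int)  # number of rows already passed for each group key
    out = []
    for key in groups:
        out.append(seen[key])  # rows seen so far in this group = 0-based row number
        seen[key] += 1
    return out


# Same grouping as the question: 20 rows, group key = row index // 3.
g = [i // 3 for i in range(20)]
rn = row_number_per_group(g)
print(list(zip(g, rn)))
```

The counter restarts at 0 whenever a new group key is first seen, which is the same pattern as the rn column in the output above. Unlike the shift/fill_null/cumsum version in the question, it needs no helper column of 1s.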