Add row count per group in polars

August 11, 2023

Is there a way to rewrite this:

import numpy
import polars

df = (polars
  .DataFrame(dict(
    j=numpy.random.randint(10, 99, 20),
    ))
  .with_row_count()
  .select(
    g=polars.col('row_nr') // 3,
    j='j'
    )
  .with_columns(rn=1)
  .with_columns(
    rn=polars.col('rn').shift().fill_null(0).cumsum().over('g')
    )
  )
print(df)

 g (u32)  j (i64)  rn (i32)
 0        47       0
 0        22       1
 0        82       2
 1        19       0
 1        85       1
 1        15       2
 2        89       0
 2        74       1
 2        26       2
 3        11       0
 3        86       1
 3        81       2
 4        16       0
 4        35       1
 4        60       2
 5        30       0
 5        28       1
 5        94       2
 6        21       0
 6        38       1
shape: (20, 3)

so that it adds the rn column without first having to add a column full of 1s? That is, can this part be rewritten:

  .with_columns(rn=1)
  .with_columns(
    rn=polars.col('rn').shift().fill_null(0).cumsum().over('g')
    )

so that:

  .with_columns(rn=1)

is not required? Basically, reduce the two expressions to one.

Or is there another, better way to add a row count per group?

Solution:

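What you're doing is known as a cumulative count, which Polars exposes as .cumcount(). Combined with .over("g"), it restarts at 0 for each group, so no helper column of 1s is needed: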

import polars as pl  # the question imports the library as `polars`

df.with_columns(rn = pl.col("j").cumcount().over("g"))

shape: (20, 3)
┌─────┬─────┬─────┐
│ g   ┆ j   ┆ rn  │
│ --- ┆ --- ┆ --- │
│ u32 ┆ i64 ┆ u32 │
╞═════╪═════╪═════╡
│ 0   ┆ 92  ┆ 0   │
│ 0   ┆ 24  ┆ 1   │
│ 0   ┆ 45  ┆ 2   │
│ 1   ┆ 78  ┆ 0   │
│ …   ┆ …   ┆ …   │
│ 5   ┆ 68  ┆ 1   │
│ 5   ┆ 59  ┆ 2   │
│ 6   ┆ 38  ┆ 0   │
│ 6   ┆ 83  ┆ 1   │
└─────┴─────┴─────┘
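
To make the semantics concrete, here is a plain-Python sketch of what a per-group cumulative count computes: each row gets the number of earlier rows that share its group key. This only illustrates the idea and is not how Polars implements it. The function name cumcount_per_group is hypothetical, and the sample keys mirror the g = row_nr // 3 column from the question.

```python
from collections import defaultdict

def cumcount_per_group(keys):
    """Return, for each key, how many times that key appeared earlier."""
    seen = defaultdict(int)  # group key -> number of rows seen so far
    counts = []
    for key in keys:
        counts.append(seen[key])  # 0 for the first row of each group
        seen[key] += 1
    return counts

# Group keys as in the question: row number floor-divided by 3, for 20 rows.
g = [row_nr // 3 for row_nr in range(20)]
rn = cumcount_per_group(g)
print(rn)
```

The counter restarts whenever a new group key appears, which is the same effect the shift/fill_null/cumsum chain in the question achieves with a column of 1s.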