Polars equivalent of Pandas groupby transform

byMR

November 21, 2022

‘Is there a way to replicate the groupby -> transform functionality in Polars?

I use this for normalising over groups (make groups sum to one) i.e.

df['normalised'] = df.groupby[*groupcols*].transform(lambda x: x/x.sum())

which I did a work around in polars by adding a column for the sum then dividing by that column:

df.join((df.groupby(by=[*groupcols*]).agg((pl.col('VOL').sum()).alias('VOLSUM'))),
        left_on = [*groupcols*],
        right_on = [*groupcols*]).with_column(
        (pl.col('VOL') / pl.col('VOLSUM')).alias('VOLNORM')
    ).drop('VOLSUM')

However, another one I need is cummulative sum within groups:

df['cummulativesum'] = df.groupby[*groupcols*][col].transform('cumsum')

2 questions for Polars people: is my normalisation over groups the best way? is there a way to do a polars equivalent of cumsum within groups in polars??

Thanks in advance!

>Solution :

You can use Window Expressions

DataFrame:

>>> df = pl.DataFrame(dict(GROUP=[1, 2, 2, 1, 3], VOL=[1, 2, 1, 5, 1]))
>>> df
shape: (5, 2)
┌───────┬─────┐
│ GROUP | VOL │
│ ---   | --- │
│ i64   | i64 │
╞═══════╪═════╡
│ 1     | 1   │
├───────┼─────┤
│ 2     | 2   │
├───────┼─────┤
│ 2     | 1   │
├───────┼─────┤
│ 1     | 5   │
├───────┼─────┤
│ 3     | 1   │
└─//────┴─//──┘

.with_columns() and .over():

>>> df.with_columns([
...    (pl.col("VOL") / pl.col("VOL").sum()).over("GROUP").alias("VOLNORM"),
...    pl.col("VOL").cumsum().over("GROUP").alias("VOLCUMSUM")
... ])
shape: (5, 4)
┌───────┬─────┬──────────┬───────────┐
│ GROUP | VOL | VOLNORM  | VOLCUMSUM │
│ ---   | --- | ---      | ---       │
│ i64   | i64 | f64      | i64       │
╞═══════╪═════╪══════════╪═══════════╡
│ 1     | 1   | 0.166667 | 1         │
├───────┼─────┼──────────┼───────────┤
│ 2     | 2   | 0.666667 | 2         │
├───────┼─────┼──────────┼───────────┤
│ 2     | 1   | 0.333333 | 3         │
├───────┼─────┼──────────┼───────────┤
│ 1     | 5   | 0.833333 | 6         │
├───────┼─────┼──────────┼───────────┤
│ 3     | 1   | 1.0      | 1         │
└─//────┴─//──┴─//───────┴─//────────┘