‘Is there a way to replicate the groupby -> transform functionality in Polars?
I use this for normalising over groups (make groups sum to one) i.e.
df['normalised'] = df.groupby[*groupcols*].transform(lambda x: x/x.sum())
which I did a work around in polars by adding a column for the sum then dividing by that column:
df.join((df.groupby(by=[*groupcols*]).agg((pl.col('VOL').sum()).alias('VOLSUM'))),
left_on = [*groupcols*],
right_on = [*groupcols*]).with_column(
(pl.col('VOL') / pl.col('VOLSUM')).alias('VOLNORM')
).drop('VOLSUM')
However, another one I need is cummulative sum within groups:
df['cummulativesum'] = df.groupby[*groupcols*][col].transform('cumsum')
2 questions for Polars people: is my normalisation over groups the best way? is there a way to do a polars equivalent of cumsum within groups in polars??
Thanks in advance!
>Solution :
You can use Window Expressions
DataFrame:
>>> df = pl.DataFrame(dict(GROUP=[1, 2, 2, 1, 3], VOL=[1, 2, 1, 5, 1]))
>>> df
shape: (5, 2)
βββββββββ¬ββββββ
β GROUP | VOL β
β --- | --- β
β i64 | i64 β
βββββββββͺββββββ‘
β 1 | 1 β
βββββββββΌββββββ€
β 2 | 2 β
βββββββββΌββββββ€
β 2 | 1 β
βββββββββΌββββββ€
β 1 | 5 β
βββββββββΌββββββ€
β 3 | 1 β
ββ//βββββ΄β//βββ
.with_columns() and .over():
>>> df.with_columns([
... (pl.col("VOL") / pl.col("VOL").sum()).over("GROUP").alias("VOLNORM"),
... pl.col("VOL").cumsum().over("GROUP").alias("VOLCUMSUM")
... ])
shape: (5, 4)
βββββββββ¬ββββββ¬βββββββββββ¬ββββββββββββ
β GROUP | VOL | VOLNORM | VOLCUMSUM β
β --- | --- | --- | --- β
β i64 | i64 | f64 | i64 β
βββββββββͺββββββͺβββββββββββͺββββββββββββ‘
β 1 | 1 | 0.166667 | 1 β
βββββββββΌββββββΌβββββββββββΌββββββββββββ€
β 2 | 2 | 0.666667 | 2 β
βββββββββΌββββββΌβββββββββββΌββββββββββββ€
β 2 | 1 | 0.333333 | 3 β
βββββββββΌββββββΌβββββββββββΌββββββββββββ€
β 1 | 5 | 0.833333 | 6 β
βββββββββΌββββββΌβββββββββββΌββββββββββββ€
β 3 | 1 | 1.0 | 1 β
ββ//βββββ΄β//βββ΄β//ββββββββ΄β//βββββββββ