I would like to add a column that is the sum of all columns but some id columns with polars. This can be done using polars.DataFrame.sum(axis=1):
import polars as pl
df = pl.DataFrame(
{
"id": [1, 2],
"cat_a": [2, 7],
"cat_b": [5, 1],
"cat_c": [0, 3]
}
)
df["cat_total"] = df.select(pl.all().exclude("id")).sum(axis=1)
df
However, this really feels like pandas style. I would prefer to be able to have this inside a longer sequence of calls inside a select or with_column call:
# Throws TypeError: sum() got an unexpected keyword argument 'axis'
# because polars.Expr.sum does not support choosing an axis
(df
# [...]
.with_column(pl.all().exclude("id").sum(axis=1).alias("cat_total"))
# [...]
)
How can this be done (without explicitly identifying the column names)?
>Solution :
You can use a fold expression, which takes an accumulator: acc, a binary function Fn(acc, Series) -> Series and one or more expression to apply the fold on.
df.with_column(
pl.fold(0, lambda acc, s: acc + s, pl.all().exclude("id")).alias("horizontal_sum")
)
This would output:
shape: (2, 5)
┌─────┬───────┬───────┬───────┬────────────────┐
│ id ┆ cat_a ┆ cat_b ┆ cat_c ┆ horizontal_sum │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═══════╪═══════╪═══════╪════════════════╡
│ 1 ┆ 2 ┆ 5 ┆ 0 ┆ 7 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ 1 ┆ 3 ┆ 11 │
└─────┴───────┴───────┴───────┴────────────────┘