I have a dictionary with strings as keys and Polars expressions as values.
How can I do something like this in a concise way:
df = df.with_columns(
pl.when(condition_1)
.then(pl.lit(key_1))
.when(condition_2)
.then(pl.lit(key_2))
...
.otherwise(None)
.alias("new_column")
)
Solution:
Consider the following example data.
import polars as pl
df = pl.DataFrame({
"num": list(range(6)),
})
shape: (6, 1)
┌─────┐
│ num │
│ --- │
│ i64 │
╞═════╡
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
└─────┘
In general, pl.when().then().otherwise() constructs can be nested to get the effect of a switch statement, which is what your question seems to outline.
df.with_columns(
pl.when(
pl.col("num") < 2
).then(
pl.lit("small")
).otherwise(
pl.when(
pl.col("num") > 3
).then(
pl.lit("large")
).otherwise(
pl.lit("medium")
)
)
)
shape: (6, 2)
┌─────┬─────────┐
│ num ┆ literal │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════════╡
│ 0 ┆ small │
│ 1 ┆ small │
│ 2 ┆ medium │
│ 3 ┆ medium │
│ 4 ┆ large │
│ 5 ┆ large │
└─────┴─────────┘
Nesting becomes tedious when there are many conditions. In that case, pl.coalesce can help, because a pl.when().then() construct without .otherwise() evaluates to null wherever the condition in pl.when() is not satisfied.
df.with_columns(
pl.coalesce(
pl.when(pl.col("num") < 2).then(pl.lit("small")),
pl.when(pl.col("num") > 3).then(pl.lit("large")),
pl.lit("medium")
)
)
shape: (6, 2)
┌─────┬─────────┐
│ num ┆ literal │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════════╡
│ 0 ┆ small │
│ 1 ┆ small │
│ 2 ┆ medium │
│ 3 ┆ medium │
│ 4 ┆ large │
│ 5 ┆ large │
└─────┴─────────┘
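If you have a dictionary with the labels as keys and the conditions as values, it can be used as follows. Since pl.coalesce takes the first non-null value, the order of the dictionary entries matters: "medium" only applies to rows that did not already match "small".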
d = {
"small": pl.col("num") < 2,
"medium": pl.col("num") < 4,
"large": pl.col("num") >= 4,
}
df.with_columns(
pl.coalesce(
pl.when(cond).then(pl.lit(val)) for val, cond in d.items()
)
)