I am trying to create a new column in a Polars DataFrame based on a comparison of two existing columns:
import polars as pl
data = {"a": [2, 30], "b": [20, 3]}
df = pl.DataFrame(data)
df
Out[4]:
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2   ┆ 20  │
│ 30  ┆ 3   │
└─────┴─────┘
When I do:
df.with_columns(pl.map(["a", "b"], lambda s: "+" if s[0] < s[1] else "-").alias("strand"))
I am getting an error:
thread '<unnamed>' panicked at 'python apply failed: The truth value of a Series is ambiguous. Hint: use '&' or '|' to chain Series boolean results together, not and/or; to check if a Series contains any values, use 'is_empty()'', src/lazy/apply.rs:185:19
I am able to create a boolean column:
df.with_columns(pl.map(["a", "b"], lambda s: s[0] < s[1] ).alias("strand"))
so with extra steps I could get a column with the desired "+" and "-" values, but is there a simpler way?
Thank you for your help
DK
>Solution :
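The lambda receives whole Series, so s[0] < s[1] is a boolean Series with one value per row, and a plain if/else cannot reduce it to a single truth value. You can use Polars expressions instead, e.g. when/then/otherwise: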
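df.with_columns(
    # pl.lit marks "+" and "-" as literal strings; recent Polars releases
    # parse bare strings passed to then/otherwise as column names.
    pl.when(pl.col("a") < pl.col("b")).then(pl.lit("+")).otherwise(pl.lit("-"))
    .alias("strand")
)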
shape: (2, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ strand │
│ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ str    │
╞═════╪═════╪════════╡
│ 2   ┆ 20  ┆ +      │
│ 30  ┆ 3   ┆ -      │
└─────┴─────┴────────┘
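Or map the boolean result with .map_dict (later Polars releases deprecate this method in favor of .replace / .replace_strict):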
df.with_columns(
    (pl.col("a") < pl.col("b"))
    .map_dict({True: "+", False: "-"})
    .alias("strand")
)
shape: (2, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ strand │
│ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ str    │
╞═════╪═════╪════════╡
│ 2   ┆ 20  ┆ +      │
│ 30  ┆ 3   ┆ -      │
└─────┴─────┴────────┘