Polars Case Statement

I am trying to pick up the Polars package in Python.
I come from an R background, so I appreciate this might be an incredibly easy question.

I want to implement a case statement: if any of the conditions below is true, the row should be flagged as 1; otherwise it should be 0. My new column will be called 'my_new_column_flag'.

However, I am getting the following error message:

Traceback (most recent call last):
  File "", line 2, in
  File "C:\Users\foo\Miniconda3\envs\env\lib\site-packages\polars\internals\lazy_functions.py", line 204, in col
    return pli.wrap_expr(pycol(name))
TypeError: argument 'name': 'int' object cannot be converted to 'PyString'

import polars as pl
import numpy as np

np.random.seed(12)

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
print(df)

df.with_column(
    pl.when(pl.col('nrs') == 1).then(pl.col(1))
    .when(pl.col('names') == 'ham').then(pl.col(1))
    .when(pl.col('random') == 0.014575).then(pl.col(1))
    .otherwise(pl.col(0))
    .alias('my_new_column_flag')
)

Can anyone help?

Solution:

pl.col selects a column by its name (a string), which is why pl.col(1) raises the TypeError above: 1 is not a column name. What you want is an expression holding the literal value 1, which is pl.lit(1):

df.with_columns(
    pl.when(pl.col('nrs') == 1).then(pl.lit(1))
    .when(pl.col('names') == 'ham').then(pl.lit(1))
    .when(pl.col('random') == 0.014575).then(pl.lit(1))
    .otherwise(pl.lit(0))
    .alias('my_new_column_flag')
)

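PS: it may be more natural to use the boolean predicate directly as your flag (and cast it to an integer if you want 0/1 instead of true/false). Fill nulls first, because a comparison against a null value yields null rather than false: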


df.with_columns(
    ((pl.col("nrs") == 1) | (pl.col("names") == "ham") | (pl.col("random") == 0.014575))
    .fill_null(False)  # null comparisons (e.g. nrs is null) count as "not flagged"
    .cast(int)
    .alias("my_new_column_flag")
)