Adding a column based on condition in Polars

Let’s say I have a Polars dataframe like so:

import polars as pl

df = pl.DataFrame({
    'a': [0.3, 0.7, 0.5, 0.1, 0.9]
})

And now I need to add a new column where 1 or 0 is assigned depending on whether a value in column 'a' is greater or less than some threshold. In Pandas I can do this:

import numpy as np

THRESHOLD = 0.5
df['new'] = np.where(df.a > THRESHOLD, 0, 1)

I can also do something very similar in Polars:

df = df.with_columns(
    pl.lit(np.where(df.select('a').to_numpy() > THRESHOLD, 0, 1).ravel())
    .alias('new')
)

This works fine but I’m sure that using NumPy here is not the best practice.

I’ve also tried something more like:

df = df.with_columns(
    pl.lit(df.filter(pl.col('a') > THRESHOLD).select([0, 1]))
    .alias('new')
)

But with this syntax I keep running into the following error:

DuplicateError                            Traceback (most recent call last)
Cell In[47], line 5
      1 THRESHOLD = 0.5
      2 DELAY_TOLERANCE = 10
      4 df = df.with_columns(
----> 5     pl.lit(df.filter(pl.col('a') > THRESHOLD).select([0, 1]))
      6     .alias('new')
      7 )
      8 df.head()

DuplicateError: column with name 'literal' has more than one occurrences

So my question is two-fold: what am I doing wrong here and what is the best practice in Polars for such conditional assignments?

I did look through the docs and previous questions but couldn’t find anything resembling my use case.

Solution:

The select([0, 1]) doesn’t really make sense Polars-wise: it selects the literals 0 and 1 rather than any column of the dataframe. I’m not quite sure why that throws a DuplicateError as is.

Conditionals in Polars are best expressed with pl.when:

df.with_columns(pl.when(pl.col("a") > 0.5).then(0).otherwise(1).alias("b"))
