when / then / otherwise with values from numpy array

Say I have

import numpy as np
import polars as pl

df = pl.DataFrame({'group': [1, 1, 1, 3, 3, 3, 4, 4]})

I have a numpy array of values with which I’d like to replace the rows where 'group' is 3:

values = np.array([9, 8, 7])

Here’s what I’ve tried:

(
    df
    .with_column(
        pl.when(pl.col('group') == 3)
        .then(values)
        .otherwise(pl.col('group'))
        .alias('group')
    )
)
In [4]: df.with_column(pl.when(pl.col('group')==3).then(values).otherwise(pl.col('group')).alias('group'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [4], line 1
----> 1 df.with_column(pl.when(pl.col('group')==3).then(values).otherwise(pl.col('group')).alias('group'))

File ~/tmp/.venv/lib/python3.8/site-packages/polars/internals/whenthen.py:132, in When.then(self, expr)
    111 def then(
    112     self,
    113     expr: (
   (...)
    121     ),
    122 ) -> WhenThen:
    123     """
    124     Values to return in case of the predicate being `True`.
    125
   (...)
    130
    131     """
--> 132     expr = pli.expr_to_lit_or_expr(expr)
    133     pywhenthen = self._pywhen.then(expr._pyexpr)
    134     return WhenThen(pywhenthen)

File ~/tmp/.venv/lib/python3.8/site-packages/polars/internals/expr/expr.py:118, in expr_to_lit_or_expr(expr, str_to_lit)
    116     return expr.otherwise(None)
    117 else:
--> 118     raise ValueError(
    119         f"did not expect value {expr} of type {type(expr)}, maybe disambiguate with"
    120         " pl.lit or pl.col"
    121     )

ValueError: did not expect value [9 8 7] of type <class 'numpy.ndarray'>, maybe disambiguate with pl.lit or pl.col

How can I do this correctly?

Solution:

A few things to consider.

  • First, you should always convert your numpy arrays to Polars Series, because we use the Arrow memory specification underneath, not numpy's.

  • Second, when -> then -> otherwise operates on columns of equal length. We nudge the API toward defining a logical statement based on the columns in your DataFrame, so you should not need to know the indices (or the length of a group) that you want to replace. This enables more optimizations: if you do not specify indices to replace, we can push a filter down before that expression.

That said, in your specific situation you do know the length of the group, so we need something different: first compute the indices where the condition holds, then modify the values at those indices.

import numpy as np
import polars as pl

df = pl.DataFrame({
    "group": [1, 1, 1, 3, 3, 3, 4, 4]
})

values = np.array([9, 8, 7])

# compute indices of the predicate
idx = df.select(
    pl.arg_where(pl.col("group") == 3)
).to_series()

# mutate on those locations
df.with_column(
    df["group"].set_at_idx(idx, pl.Series(values))
)
