Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Apply Python dict to Polars column (using replace_strict)

I have a dict and a polars DataFrame and want to map one column to the values of the dict:

import polars as pl

df = pl.DataFrame({
    'label': ['AA', 'BB', 'AA', 'CC'],
    'type':  ['CASH', 'ITEM', 'CHECK', 'CHECK'],
})

mapping = {
    'CASH':  {'qty':  1, 'origin': 'E'},
    'ITEM':  {'qty': -9, 'origin': 'A'},
    'CHECK': {'qty': 46, 'origin': 'A'},
}
df.with_columns(pl.col('type').replace_strict(mapping).alias('mapped'))

This outputs

shape: (4, 3)
┌───────┬───────┬───────────┐
│ label ┆ type  ┆ mapped    │
│ ---   ┆ ---   ┆ ---       │
│ str   ┆ str   ┆ struct[2] │
╞═══════╪═══════╪═══════════╡
│ AA    ┆ CASH  ┆ {1,"E"}   │
│ BB    ┆ ITEM  ┆ {-9,"A"}  │
│ AA    ┆ CHECK ┆ {46,"A"}  │
│ CC    ┆ CHECK ┆ {46,"A"}  │
└───────┴───────┴───────────┘

The problem is, it only takes the values of the dict and entirely drops the keys.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

So I tried using replace_strict(mapping, return_dtype=pl.Object), but this gives error

File site-packages\polars\lazyframe\frame.py:2026,
     in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown,
                          simplify_expression, slice_pushdown, comm_subplan_elim, 
                          comm_subexpr_elim, cluster_with_columns, no_optimization,
                          streaming, engine, background, _eager, **_kwargs)
   2024 # Only for testing purposes
   2025 callback = _kwargs.get("post_opt_callback", callback)
-> 2026 return wrap_df(ldf.collect(callback))

InvalidOperationError: casting from Int64 to Unknown not supported

Ultimately, the output I am after is the table below. How do I achieve this?

shape: (4, 3)
┌───────┬───────┬─────────────────────────┐
│ label ┆ type  ┆ mapped                  │
│ ---   ┆ ---   ┆ ---                     │
│ str   ┆ str   ┆ object                  │
╞═══════╪═══════╪═════════════════════════╡
│ AA    ┆ CASH  ┆ {"qty":1,"origin":"E"}  │
│ BB    ┆ ITEM  ┆ {"qty":-9,"origin":"A"} │
│ AA    ┆ CHECK ┆ {"qty":46,"origin":"A"} │
│ CC    ┆ CHECK ┆ {"qty":46,"origin":"A"} │
└───────┴───────┴─────────────────────────┘

I am using polars==1.3.0

>Solution :

Just to be clear, polars doesn’t drop the keys, it just implicitly converts python dictionaries to Struct

You can easily check it by using, for example, unnest():

res = df.with_columns(pl.col('type').replace_strict(mapping).alias('mapped'))

┌───────┬───────┬───────────┐
│ label ┆ type  ┆ mapped    │
│ ---   ┆ ---   ┆ ---       │
│ str   ┆ str   ┆ struct[2] │
╞═══════╪═══════╪═══════════╡
│ AA    ┆ CASH  ┆ {1,"E"}   │
│ BB    ┆ ITEM  ┆ {-9,"A"}  │
│ AA    ┆ CHECK ┆ {46,"A"}  │
│ CC    ┆ CHECK ┆ {46,"A"}  │
└───────┴───────┴───────────┘

res.unnest('mapped')

┌───────┬───────┬─────┬────────┐
│ label ┆ type  ┆ qty ┆ origin │
│ ---   ┆ ---   ┆ --- ┆ ---    │
│ str   ┆ str   ┆ i64 ┆ str    │
╞═══════╪═══════╪═════╪════════╡
│ AA    ┆ CASH  ┆ 1   ┆ E      │
│ BB    ┆ ITEM  ┆ -9  ┆ A      │
│ AA    ┆ CHECK ┆ 46  ┆ A      │
│ CC    ┆ CHECK ┆ 46  ┆ A      │
└───────┴───────┴─────┴────────┘

All the key names are there, you just don’t see it in the text representation. So question is – do you actually need data in the column to be python dictionaries? It’d be much easier to work with native polars struct.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading