I have this code:
import polars as pl
cols = ['Delta', 'Qty']
metrics = {'CHECK.US': {'Delta': {'ABC': 1, 'DEF': 2}, 'Qty': {'GHIJ': 3, 'TT': 4}},
'CHECK.NA': {},
'CHECK.FR': {'Delta': {'QQQ': 7, 'ABC': 6}, 'Qty': {'SS': 9, 'TT': 5}}
}
df = pl.DataFrame([{col: v.get(col) for col in cols} for v in metrics.values()])\
.insert_column(0, pl.Series('key', metrics.keys()))\
.with_columns([pl.col(col).name.map_fields(lambda x: f'{col} ({x})') for col in cols])
Now, df.unnest('Qty') correctly gives all columns formatted as Qty (xxx):
shape: (3, 5)
┌──────────┬────────────┬────────────┬──────────┬──────────┐
│ key      ┆ Delta      ┆ Qty (GHIJ) ┆ Qty (TT) ┆ Qty (SS) │
│ ---      ┆ ---        ┆ ---        ┆ ---      ┆ ---      │
│ str      ┆ struct[3]  ┆ i64        ┆ i64      ┆ i64      │
╞══════════╪════════════╪════════════╪══════════╪══════════╡
│ CHECK.US ┆ {1,2,null} ┆ 3          ┆ 4        ┆ null     │
│ CHECK.NA ┆ null       ┆ null       ┆ null     ┆ null     │
│ CHECK.FR ┆ {6,null,7} ┆ null       ┆ 5        ┆ 9        │
└──────────┴────────────┴────────────┴──────────┴──────────┘
However, when I do the same thing for df.unnest('Delta') it incorrectly returns columns with Qty (xxx):
shape: (3, 5)
┌──────────┬───────────┬───────────┬───────────┬────────────┐
│ key      ┆ Qty (ABC) ┆ Qty (DEF) ┆ Qty (QQQ) ┆ Qty        │
│ ---      ┆ ---       ┆ ---       ┆ ---       ┆ ---        │
│ str      ┆ i64       ┆ i64       ┆ i64       ┆ struct[3]  │
╞══════════╪═══════════╪═══════════╪═══════════╪════════════╡
│ CHECK.US ┆ 1         ┆ 2         ┆ null      ┆ {3,4,null} │
│ CHECK.NA ┆ null      ┆ null      ┆ null      ┆ null       │
│ CHECK.FR ┆ 6         ┆ null      ┆ 7         ┆ {null,5,9} │
└──────────┴───────────┴───────────┴───────────┴────────────┘
The values look correct, just the column names are wrong.
Am I using pl.col(col).name.map_fields(...) incorrectly? How can I fix my code so that the output becomes this:
shape: (3, 5)
┌──────────┬─────────────┬─────────────┬─────────────┬────────────┐
│ key      ┆ Delta (ABC) ┆ Delta (DEF) ┆ Delta (QQQ) ┆ Qty        │
│ ---      ┆ ---         ┆ ---         ┆ ---         ┆ ---        │
│ str      ┆ i64         ┆ i64         ┆ i64         ┆ struct[3]  │
╞══════════╪═════════════╪═════════════╪═════════════╪════════════╡
?
>Solution :
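It’s a general Python "gotcha" with lambdas defined inside loops: closures are late-binding.
The lambda does not store the value of col when it is created; it looks col up when it is called. By then the comprehension has finished, so every lambda sees the last value, 'Qty'.
The workaround is to bind the current value as a default argument, which is evaluated when the lambda is defined: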
pl.col(col).name.map_fields(lambda x, col=col: f'{col} ({x})') for col in cols
#                                     ^^^^^^^
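The same behaviour can be reproduced without Polars. The sketch below is only an illustration: the helper `label` and the variable names `late`, `bound`, `late_names`, `bound_names` and `partial_names` are made up here, and the lambdas stand in for the function passed to map_fields. It shows the late-binding result, the default-argument fix, and functools.partial as an alternative way to freeze the value.

```python
from functools import partial

cols = ['Delta', 'Qty']

# Late binding: each lambda looks up `col` only when it is called.
# By then the comprehension has finished and col is 'Qty'.
late = [lambda x: f'{col} ({x})' for col in cols]
late_names = [f('ABC') for f in late]

# Default argument: `col=col` is evaluated when the lambda is created,
# so each lambda keeps the value from its own iteration.
bound = [lambda x, col=col: f'{col} ({x})' for col in cols]
bound_names = [f('ABC') for f in bound]


def label(col, x):
    """Hypothetical helper: format a field name as 'col (x)'."""
    return f'{col} ({x})'


# functools.partial also fixes `col` at creation time, without a lambda.
partial_names = [partial(label, col)('ABC') for col in cols]

print(late_names)     # ['Qty (ABC)', 'Qty (ABC)']
print(bound_names)    # ['Delta (ABC)', 'Qty (ABC)']
print(partial_names)  # ['Delta (ABC)', 'Qty (ABC)']
```

Either form should be usable as the callable given to map_fields, since it only needs a function that takes the field name and returns the new name.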