How do I fill_null on a struct column?

I am trying to compare two dataframes via dfcompare = (df0 == df1), but nulls are never considered equal (unlike join, there is no option to let nulls match).

My approach with other fields is to fill them in with an "empty value" appropriate to their datatype. What should I use for structs?

import polars as pl

df = pl.DataFrame(
    {
        "int": [1, 2, None],
        "data": [dict(a=1, b="b"), dict(a=11, b="bb"), None],
    }
)

df.describe()
print(df)

df2 = df.with_columns(pl.col("int").fill_null(0))

df2.describe()
print(df2)

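# Both of these attempts raise an error:
try:
    # A plain Python dict is rejected as a literal fill value.
    df3 = df2.with_columns(pl.col("data").fill_null(dict(a=0, b="")))
except Exception as e:
    print("try#1", e)

try:
    # pl.struct() seems to read the dict's keys as column names, so it looks for a column "a".
    df3 = df2.with_columns(pl.col("data").fill_null(pl.struct(dict(a=0, b=""))))
except Exception as e:
    print("try#2", e)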

Output:


shape: (3, 2)
┌──────┬─────────────┐
│ int  ┆ data        │
│ ---  ┆ ---         │
│ i64  ┆ struct[2]   │
╞══════╪═════════════╡
│ 1    ┆ {1,"b"}     │
│ 2    ┆ {11,"bb"}   │
│ null ┆ {null,null} │
└──────┴─────────────┘
shape: (3, 2)
┌─────┬─────────────┐
│ int ┆ data        │
│ --- ┆ ---         │
│ i64 ┆ struct[2]   │
╞═════╪═════════════╡
│ 1   ┆ {1,"b"}     │
│ 2   ┆ {11,"bb"}   │
│ 0   ┆ {null,null} │
└─────┴─────────────┘
try#1 invalid literal value: "{'a': 0, 'b': ''}"
try#2 a

Error originated just after this operation:
DF ["int", "data"]; PROJECT */2 COLUMNS; SELECTION: "None"

My workaround, which I find satisfactory, has been to unnest the struct columns instead. This works fine (better, even, since it allows subfield-by-subfield fills). Still, I remain curious how to build a suitable "struct literal" that can be passed into these kinds of functions.

One can also imagine wanting to add a hardcoded column as in df4 = df.with_columns(pl.lit("0").alias("zerocol"))

Solution:

A struct literal for use with pl.Expr.fill_null can be built with pl.struct by passing each field as a named pl.lit expression:

df.with_columns(
    pl.col("data").fill_null(
        pl.struct(a=pl.lit(1), b=pl.lit("MISSING"))
    )
)

Output:

shape: (3, 2)
┌──────┬───────────────┐
│ int  ┆ data          │
│ ---  ┆ ---           │
│ i64  ┆ struct[2]     │
╞══════╪═══════════════╡
│ 1    ┆ {1,"b"}       │
│ 2    ┆ {11,"bb"}     │
│ null ┆ {1,"MISSING"} │
└──────┴───────────────┘