Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Compute the number of unique combinations while excluding those containing missing values

I’d like to count the number of unique values when combining several columns at once. My idea so far was to use pl.struct(...).n_unique(), which works fine when I consider missing values as a unique value:

import polars as pl

df = pl.DataFrame({
    "x": ["a", "a", "b", "b"],
    "y": [1, 1, 2, None],
})

df.with_columns(foo=pl.struct("x", "y").n_unique())
shape: (4, 3)
┌─────┬──────┬─────┐
│ x   ┆ y    ┆ foo │
│ --- ┆ ---  ┆ --- │
│ str ┆ i64  ┆ u32 │
╞═════╪══════╪═════╡
│ a   ┆ 1    ┆ 3   │
│ a   ┆ 1    ┆ 3   │
│ b   ┆ 2    ┆ 3   │
│ b   ┆ null ┆ 3   │
└─────┴──────┴─────┘

However, sometimes I want to exclude a combination from the count if it contains any number of missing values. In the example above, I’d like foo to be 2. However, using .drop_nulls() before counting doesn’t work and produces the same output as above.

df.with_columns(foo=pl.struct("x", "y").drop_nulls().n_unique())

Is there a way to do this using only Polars expressions?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

pl.Expr.drop_nulls does not drop the row as the entirety of the struct is indeed not null.

To still achieve the desired result, you can filter out all rows which contain a null values in any of the columns of interest using pl.Expr.filter.

(
    df
    .with_columns(
        foo=pl.struct("x", "y").filter(
            ~pl.any_horizontal(pl.col("x", "y").is_null())
        ).n_unique()
    )
)
shape: (4, 3)
┌─────┬──────┬─────┐
│ x   ┆ y    ┆ foo │
│ --- ┆ ---  ┆ --- │
│ str ┆ i64  ┆ u32 │
╞═════╪══════╪═════╡
│ a   ┆ 1    ┆ 2   │
│ a   ┆ 1    ┆ 2   │
│ b   ┆ 2    ┆ 2   │
│ b   ┆ null ┆ 2   │
└─────┴──────┴─────┘
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading