Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Return all rows that have at least one null in one of the columns using Polars

I need all the rows that have null in one of the predefined columns.
I basically need this but i have one more requirement that I cant seem to figure out.
Not every column needs to be checked.

I have a function that returns the names of the columns that need to be checked in a list.

Assume this is my dataframe:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

data = pl.from_repr("""
┌───────┬───────┬─────┬───────┐
│ a     ┆ b     ┆ c   ┆ d     │
│ ---   ┆ ---   ┆ --- ┆ ---   │
│ str   ┆ str   ┆ str ┆ bool  │
╞═══════╪═══════╪═════╪═══════╡
│ abc   ┆ null  ┆ u   ┆ true  │
│ def   ┆ abc   ┆ v   ┆ true  │
│ ghi   ┆ def   ┆ null┆ true  │
│ jkl   ┆ uvw   ┆ x   ┆ true  │
│ mno   ┆ xyz   ┆ y   ┆ null  │
│ qrs   ┆ null  ┆ z   ┆ null  │
└───────┴───────┴─────┴───────┘
""")

Doing
data.filter(polars.any_horizontal(polars.all().is_null()))
gives me all rows where any of the columns contain null.

Sometimes it’s fine for column c to contain a null so let’s not check it.

what I want is this:

┌───────┬───────┬─────┬───────┐
│ a     ┆ b     ┆ c   ┆ d     │
│ ---   ┆ ---   ┆ --- ┆ ---   │
│ str   ┆ str   ┆ str ┆ bool  │
╞═══════╪═══════╪═════╪═══════╡
│ abc   ┆ null  ┆ u   ┆ true  │
│ mno   ┆ xyz   ┆ y   ┆ null  │
│ qrs   ┆ null  ┆ z   ┆ null  │
└───────┴───────┴─────┴───────┘

Row 3 is not shown even though there is a null value in column c.

columns = ["a", "b", "d"]
data.filter(polars.any_horizontal(polars.all(*columns).is_null()))

This gives me
polars.exceptions.SchemaError: invalid series dtype: expected 'Boolean', got 'str'

I thought maybe the columns aren’t aligned or somethig because data has more columns than what the filter uses, so i did this.

columns = ["a", "b", "d"]
# notice `.select(columns)` here
data.select(columns).filter(polars.any_horizontal(polars.all(*columns).is_null()))

But is still get the same error.
How do I get the full rows of data that contain a null in one of ["a", "b", "d"] columns

>Solution :

If you want exclude some columns you can use .exlude():

import polars as pl

data.filter(pl.any_horizontal(pl.exclude("c").is_null()))
┌─────┬──────┬─────┬──────┐
│ a   ┆ b    ┆ c   ┆ d    │
│ --- ┆ ---  ┆ --- ┆ ---  │
│ str ┆ str  ┆ str ┆ bool │
╞═════╪══════╪═════╪══════╡
│ abc ┆ null ┆ u   ┆ true │
│ mno ┆ xyz  ┆ y   ┆ null │
│ qrs ┆ null ┆ z   ┆ null │
└─────┴──────┴─────┴──────┘

Or you can just use column names by using .col():

import polars as pl

cols = ["a","b","d"]

data.filter(pl.any_horizontal(pl.col(cols).is_null()))
shape: (3, 4)
┌─────┬──────┬─────┬──────┐
│ a   ┆ b    ┆ c   ┆ d    │
│ --- ┆ ---  ┆ --- ┆ ---  │
│ str ┆ str  ┆ str ┆ bool │
╞═════╪══════╪═════╪══════╡
│ abc ┆ null ┆ u   ┆ true │
│ mno ┆ xyz  ┆ y   ┆ null │
│ qrs ┆ null ┆ z   ┆ null │
└─────┴──────┴─────┴──────┘

If you want to be really flexible, you can use selectors, for example .selectors.exclude():

import polars.selectors as cs

data.filter(pl.any_horizontal(cs.exclude("c").is_null()))
shape: (3, 4)
┌─────┬──────┬─────┬──────┐
│ a   ┆ b    ┆ c   ┆ d    │
│ --- ┆ ---  ┆ --- ┆ ---  │
│ str ┆ str  ┆ str ┆ bool │
╞═════╪══════╪═════╪══════╡
│ abc ┆ null ┆ u   ┆ true │
│ mno ┆ xyz  ┆ y   ┆ null │
│ qrs ┆ null ┆ z   ┆ null │
└─────┴──────┴─────┴──────┘
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading