I have infinite values in a polars dataframe (np.inf and -np.inf) and would like to drop those rows.
I am aware of the drop_nans and drop_nulls methods, but I don’t see a drop_inf.
I can’t just replace inf with nan and then call drop_nans as I want to handle nan values separately.
What’s an idiomatic way of dropping rows with infinite values?
Solution:
You can use DataFrame.filter() together with Expr.is_infinite() to filter out the rows you don’t need:
import numpy as np
import polars as pl
df = pl.DataFrame({
    "a": [1, 2, 3, np.inf],
    "b": [1, 2, 3, 4],
})
df.filter(~pl.col("a").is_infinite())
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 1   │
│ 2.0 ┆ 2   │
│ 3.0 ┆ 3   │
└─────┴─────┘
Or, if you want to check all columns for infinite values, you can combine the per-column checks with pl.any_horizontal():
import numpy as np
import polars as pl
df = pl.DataFrame({
    "a": [1, 2, 3, np.inf],
    "b": [1, 2, -np.inf, 4],
})
df.filter(
    ~pl.any_horizontal(pl.all().is_infinite())
)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 1.0 │
│ 2.0 ┆ 2.0 │
└─────┴─────┘
If the frame also has non-numeric columns, you can use selectors to restrict the check to the numeric ones:
import numpy as np
import polars as pl
import polars.selectors as cs
df = pl.DataFrame({
    "a": [1, 2, 3, np.inf],
    "b": [1, 2, -np.inf, 4],
    "c": list("abcd"),
})
df.filter(
    ~pl.any_horizontal(cs.numeric().is_infinite())
)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1.0 ┆ 1.0 ┆ a   │
│ 2.0 ┆ 2.0 ┆ b   │
└─────┴─────┴─────┘