Fill NaN values in Polars using a custom-defined function for a specific column

I have this code in pandas:

df[col] = (
            df[col]
            .fillna(method="ffill", limit=1)
            .apply(lambda x: my_function(x))
        )

I want to re-write this in Polars.

I have tried this:

df = df.with_columns(
            pl.col(col)
            .fill_null(strategy="forward", limit=1)
            .apply(lambda x: my_function(x))
        )

This does not work as expected. The forward fill is applied, but my custom function is never used to fill the remaining missing values. What should I change in my code to get the result I want?

Here is a reproducible example:

import numpy as np
import pandas as pd
import polars as pl

df_polars = pl.DataFrame(
    {"A": [1, 2, None, None, None, None, 4, None], "B": [5, None, None, None, None, 7, None, 9]}
)

df_pandas = pd.DataFrame(
    {"A": [1, 2, None, None, None, None, 4, None], "B": [5, None, None, None, None, 7, None, 9]}
)

last_valid_data: int


def my_function(x):
    global last_valid_data
    if x is None or np.isnan(x):
        result = last_valid_data * 10
    else:
        last_valid_data = x
        result = x
    return result


col = "A"

last_valid_data = df_pandas[col][0]
df_pandas[col] = df_pandas[col].fillna(method="ffill", limit=1).apply(lambda x: my_function(x))

last_valid_data = df_polars[col][0]
df_polars = df_polars.with_columns(
    pl.col(col).fill_null(strategy="forward", limit=1).apply(lambda x: my_function(x))
)

Desired output in pandas is:

      A    B
0   1.0  5.0
1   2.0  NaN
2   2.0  NaN
3  20.0  NaN
4  20.0  NaN
5  20.0  7.0
6   4.0  NaN
7   4.0  9.0

What I get in Polars is:

┌──────┬──────┐
│ A    ┆ B    │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 1    ┆ 5    │
│ 2    ┆ null │
│ 2    ┆ null │
│ null ┆ null │
│ null ┆ null │
│ null ┆ 7    │
│ 4    ┆ null │
│ 4    ┆ 9    │
└──────┴──────┘

Solution:

The issue is that in Polars, .apply defaults to skip_nulls=True, so your function is never called on the null values:

df_polars.with_columns(
   pl.col('A').apply(lambda me: print(f'{me=}'))
)
me=1
me=2
me=4

Because your function specifically needs to handle the nulls, you need to set skip_nulls=False:

df_polars.with_columns(
   pl.col('A').apply(lambda me: print(f'{me=}'), skip_nulls=False)
)
me=1
me=2
me=None
me=None
me=None
me=None
me=4
me=None