Spark isNotNull() doesn't remove None values

I have a Spark dataframe with a None value in the first row.


I created the above dataframe initially in pandas and then converted it to a Spark dataframe:

import pandas as pd

df = pd.DataFrame({
    'rid': ['A', 'B', 'C'],
    'num': [None, 8, 9],
    'availability_percent': [56, 69, 70],
    'availability_spaces': [7, 6, 5],
})

df_spark = spark.createDataFrame(df)
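A detail worth checking here (my note, not from the original post): a pandas column that mixes None with integers is coerced to float64, and the None becomes NaN. Spark then imports that value as a NaN double, not as a SQL NULL. A quick check in plain pandas:

```python
import math

import pandas as pd

df = pd.DataFrame({
    'rid': ['A', 'B', 'C'],
    'num': [None, 8, 9],
})

# The mixed None/int column is coerced to float64...
print(df['num'].dtype)           # float64

# ...and the None is now NaN, which Spark keeps as NaN, not NULL
print(math.isnan(df['num'][0]))  # True
```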

When I run df_spark.filter(df_spark.num.isNotNull()).show()

I get the same dataframe as above, meaning the row with the NaN value was not removed. What did I do wrong?


>Solution:

isNotNull() only filters out SQL NULL values. Because pandas stores None in a numeric column as NaN, Spark imports it as an ordinary NaN double rather than a NULL. You can add an isnan check to cover the NaN case:

from pyspark.sql.functions import isnan

df_spark.filter(~isnan(df_spark.num) & df_spark.num.isNotNull()).show()
