I have a Spark DataFrame with a None value in the first row.
df_spark.show()
I created the above DataFrame initially in pandas and then converted it to a Spark DataFrame:
df = pd.DataFrame(
    {
        'rid': ['A', 'B', 'C'],
        'num': [None, 8, 9],
        'availability_percent': [56, 69, 70],
        'availability_spaces': [7, 6, 5]
    }
)
Then:
df_spark = spark.createDataFrame(df)
When I run df_spark.filter(df_spark.num.isNotNull()).show()
I get the same DataFrame as above, meaning the row with the NaN value was not removed. What did I do wrong?
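For reference, the None has already been converted before Spark ever sees it: pandas cannot hold None in a numeric column, so it promotes the column to float64 and stores NaN instead. A minimal check using only pandas:

```python
import math

import pandas as pd

df = pd.DataFrame({'num': [None, 8, 9]})

# pandas promotes the column to float64 and stores None as NaN
print(df['num'].dtype)                # float64
print(math.isnan(df['num'].iloc[0]))  # True
```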
>Solution :
You can add an isnan check to cover NaN values. pandas stores None in a numeric column as NaN, and Spark treats NaN as an ordinary (non-null) float value, so isNotNull() alone does not filter it out:
from pyspark.sql.functions import isnan
df_spark.filter(~isnan(df_spark.num) & df_spark.num.isNotNull()).show()
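Alternatively, you can convert the NaN values back into real nulls before handing the frame to Spark, so that isNotNull() behaves as expected on its own. A minimal sketch using only pandas; the astype(object) cast is needed because a float64 column would immediately coerce None back to NaN:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'rid': ['A', 'B', 'C'],
        'num': [None, 8, 9],
    }
)

# Cast to object so the column can hold a real None,
# then replace every NaN cell with None.
df_clean = df.astype(object).where(df.notna(), None)
```

Passing df_clean to spark.createDataFrame would then produce a true null in the num column, which df_spark.num.isNotNull() filters out without the extra isnan check.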