Check if columns exist and if not, create and fill with NaN using PySpark

I have a PySpark DataFrame and a separate list of column names. I want to check whether any of the listed column names are missing from the DataFrame, and if they are, create them and fill them with null values.

Is there a straightforward way to do this in PySpark? I can do it in Pandas, but that's not what I need here.

>Solution :

This should work:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    if 'col' not in df.schema.names:
        df = df.withColumn('col', F.lit(None).cast(StringType()))

Let me know if you face any issue.
