I have a PySpark dataframe and a separate list of column names. I want to check whether any of the columns in the list are missing from the dataframe, and if so, create them and fill them with null values.
Is there a straightforward way to do this in PySpark? I can do it in Pandas, but I need a PySpark solution.
>Solution :
This should work:
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

if 'col' not in df.schema.names:
    # add the missing column as a typed null column
    df = df.withColumn('col', F.lit(None).cast(StringType()))
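To cover your whole list of column names, you can apply the same check in a loop. Here `required_cols` is a placeholder for your list, and I'm assuming string columns; cast to whatever type each column should have:

required_cols = ['a', 'b', 'c']  # your list of expected column names

for c in required_cols:
    if c not in df.schema.names:
        # create any missing column, filled with nulls
        df = df.withColumn(c, F.lit(None).cast(StringType()))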
Let me know if you run into any issues.