How to select columns and cast column types in a PySpark DataFrame?


I have a very large PySpark DataFrame from which I need to select many columns, which is why I want to use a for loop instead of writing out each column name. Most of those columns need to be cast to DoubleType(); the one exception is the "ID" column, which I need to keep as a StringType().

To select all the columns that need to be cast to DoubleType(), I use this code, which works (df here stands for the source DataFrame):

df_num2 = df.select([col(c).cast(DoubleType()) for c in num_columns])

How can I also select the "ID" column, which is a StringType()?

>Solution :

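Use Python list concatenation to put the "ID" column in front of the cast columns (df is assumed to be the source DataFrame):

df_num2 = df.select(["ID"] + [col(c).cast(DoubleType()) for c in num_columns])

# OR, with iterable unpacking

df_num2 = df.select(["ID", *(col(c).cast(DoubleType()) for c in num_columns)])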
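As a further variation, the same "ID first, then a comprehension over the numeric columns" pattern can be written with SQL expression strings for DataFrame.selectExpr instead of col(...).cast(...). This is only a minimal sketch, not the approach from the answer above: the helper name build_select_exprs and the example column names are hypothetical, and it assumes that no column name contains a backtick.

```python
def build_select_exprs(id_column, num_columns):
    """Return SQL expression strings: the ID column as-is, the rest cast to DOUBLE.

    Hypothetical helper for illustration; assumes no column name contains a backtick.
    """
    # Keep the ID column untouched, so it stays a string.
    exprs = [f"`{id_column}`"]
    # Cast every numeric column to DOUBLE and alias it back to its original name.
    exprs += [f"CAST(`{c}` AS DOUBLE) AS `{c}`" for c in num_columns]
    return exprs


# Example column names (assumed, not taken from the original question).
exprs = build_select_exprs("ID", ["price", "qty"])
print(exprs)
```

With a real DataFrame, the list would then be passed along as df_num2 = df.selectExpr(*exprs), where df is again assumed to be the source DataFrame. The explicit alias keeps each output column under its original name.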
