How to select columns and cast column types in a pyspark dataframe?

I have a very large PySpark DataFrame from which I need to select many columns (which is why I want to use a for loop instead of writing out each column name). I need to cast most of those columns to DoubleType(), except for one column, "ID", which I need to keep as a StringType().

To select all the columns that need to be cast to DoubleType(), I use this code (it works):

df_num2 = df_num1.select([col(c).cast(DoubleType()) for c in num_columns])

How can I also select my "ID" column, which is a StringType()?

>Solution :

Use list concatenation (or iterable unpacking) in Python:

from pyspark.sql.functions import col
from pyspark.sql.types import DoubleType

# List concatenation with +
df_num2 = df_num1.select(["id"] + [col(c).cast(DoubleType()) for c in num_columns])

# OR: iterable unpacking with *
df_num2 = df_num1.select(["id", *(col(c).cast(DoubleType()) for c in num_columns)])
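Both variants build a single Python list that mixes the plain column name with the cast expressions before passing it to select(). A minimal plain-Python sketch of the two list-building patterns (no Spark needed; the column names and the CAST strings here are illustrative stand-ins for the real Column objects):

```python
# Hypothetical numeric column names standing in for num_columns
num_columns = ["price", "qty"]

# Pattern 1: list concatenation with +
selected = ["id"] + [f"CAST({c} AS DOUBLE)" for c in num_columns]

# Pattern 2: unpacking a generator with * inside a list literal
selected_alt = ["id", *(f"CAST({c} AS DOUBLE)" for c in num_columns)]

# Both produce the same list, with "id" first
assert selected == selected_alt
assert selected == ["id", "CAST(price AS DOUBLE)", "CAST(qty AS DOUBLE)"]
```

The second form (PEP 448 unpacking) avoids creating an intermediate list for the comprehension, but for a select() call the two are interchangeable.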