Create a Spark DataFrame with thousands of columns, then add an ArrayType column that holds them all

I’d like to create a DataFrame in Spark with Scala code, like this:

col_1 col_2 .. col_2048
0.123 0.234 .. 0.323
0.345 0.456 .. 0.534

Then add an extra column of ArrayType to it, holding the data from all 2048 columns in a single column:

col_1 col_2 .. col_2048 array_col
0.123 0.234 .. 0.323    [0.123, 0.234, …, 0.323]
0.345 0.456 .. 0.534    [0.345, 0.456, …, 0.534]
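Since writing out 2048 columns by hand isn’t practical, one way to build such a wide DataFrame programmatically is to generate the schema and rows in a loop. This is a minimal sketch, assuming a local SparkSession and made-up values; the column names `col_1` .. `col_2048` follow the question, everything else is illustrative:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("wide-df")
  .getOrCreate()

val numCols = 2048

// Schema with columns col_1 .. col_2048, all DoubleType
val schema = StructType(
  (1 to numCols).map(i => StructField(s"col_$i", DoubleType, nullable = false))
)

// Two rows of dummy values; real data would come from your source
val rows = Seq(
  Row.fromSeq((1 to numCols).map(i => i / 1000.0)),
  Row.fromSeq((1 to numCols).map(i => i / 500.0))
)

val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
```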


Solution:

Try this:

import org.apache.spark.sql.functions.{array, col}

df.withColumn("array_col", array(df.columns.map(col): _*)).show()
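A note on how this works: `df.columns` returns the column names as `Array[String]`, `map(col)` turns each name into a `Column`, and `: _*` splats the array into the varargs of `array(...)`. Because it uses *all* current columns, the array would also pick up any extra columns added later. A sketch of a more defensive variant, assuming the same `col_1` .. `col_2048` naming (the `wanted` and `withArray` names are illustrative):

```scala
import org.apache.spark.sql.functions.{array, col}

// List the wanted columns explicitly instead of relying on df.columns,
// so unrelated columns never leak into the array
val wanted = (1 to 2048).map(i => s"col_$i")

val withArray = df.withColumn("array_col", array(wanted.map(col): _*))
```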