Convert a PySpark dataframe column in Databricks to a list without using RDD

I am trying to collect the values of a PySpark dataframe column in Databricks as a list.

When I use the collect function

df.select('col_name').collect()

I get a list of Row objects instead of plain values.


Based on some searches, using .rdd.flatMap() should do the trick.

However, for security reasons (it says rdd is not whitelisted), I cannot use rdd. Is there another way to collect a column's values as a list?

>Solution :

If you have a small dataframe, say with only one column, I would suggest converting it to a pandas dataframe and using the tolist() function.

# bring the Spark dataframe into driver memory as a pandas dataframe
pdf = df.toPandas()
# convert the column (a pandas Series) to a plain Python list
pdf_list = pdf['col_name'].tolist()

Your output should look something like this:

['value1','value2','value3']
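As a minimal sketch of the tolist() step (the Spark dataframe and the toPandas() call are assumed, so here the resulting pandas dataframe is built directly):

```python
import pandas as pd

# In Databricks this would come from: pdf = df.select('col_name').toPandas()
# We simulate that result with a plain pandas DataFrame:
pdf = pd.DataFrame({'col_name': ['value1', 'value2', 'value3']})

# tolist() turns the column (a pandas Series) into a plain Python list
pdf_list = pdf['col_name'].tolist()
print(pdf_list)  # ['value1', 'value2', 'value3']
```

Note that toPandas() collects the whole dataframe onto the driver, so this is only advisable for small dataframes; selecting just the needed column first (df.select('col_name').toPandas()) keeps the transfer small.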

Hope that helps.
