I am trying to collect the values of a PySpark DataFrame column in Databricks as a list.
When I use the collect function, I get a list with extra values.
Based on some searches, using .rdd.flatMap() should do the trick.
However, for security reasons (it says rdd is not whitelisted), I cannot use rdd. Is there another way to collect a column's values as a list?
If you have a small DataFrame, say with only one column, I would suggest converting it to a pandas DataFrame and calling tolist() on the column:
pdf = df.toPandas()
pdf_list = pdf['col_name'].tolist()
pdf_list should then be a plain Python list containing the values of that column.
Hope that helps.
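If converting to pandas is not convenient, another option that avoids rdd is to select the column, collect it, and unwrap each Row yourself. This is only a minimal sketch: it assumes select() and collect() are allowed on your cluster, and column_to_list is just an illustrative helper name, not part of PySpark.

```python
def column_to_list(df, col_name):
    """Return the values of one DataFrame column as a plain Python list.

    Uses only DataFrame.select() and DataFrame.collect(), so it never
    touches df.rdd. Like collect() itself, it brings every value to the
    driver, so it only suits results that fit in driver memory.
    """
    rows = df.select(col_name).collect()  # list of Row objects, one field each
    return [row[0] for row in rows]       # unwrap the single field of each Row
```

You would call it as pdf_list = column_to_list(df, 'col_name'). Indexing each Row by position is what strips the Row wrapper that a bare collect() leaves around every value.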