PySpark table to pandas DataFrame

I have an object of type <class 'pyspark.sql.dataframe.DataFrame'> and I want to convert it to a pandas DataFrame. But the dataset is too big and I only need some columns, so I selected the ones I want with the following:

df = spark.table("sandbox.zitrhr023")
columns= ['X', 'Y', 'Z', 'etc']

and then:

df_new = df.select(*columns).show()

but it returns a NoneType object. When I try the following:


df_new = df_new.toPandas()

It gives the following error:

AttributeError: 'NoneType' object has no attribute 'toPandas'

Do I need to put df_new into a Spark DataFrame before converting it with toPandas()? How do I do that?

Solution:

You are trying to convert it to a pandas DataFrame after calling show(), which prints the DataFrame and returns None. Try the following instead:

df_new = df.select(*columns).toPandas()
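
The pitfall generalizes beyond Spark: any method called for its printing side effect returns None, so nothing can be chained after it. A minimal pure-Python sketch (no Spark required, using a hypothetical FakeDataFrame class for illustration) shows why the original code failed:

```python
class FakeDataFrame:
    """Stand-in for a Spark DataFrame (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.rows = rows

    def show(self):
        # Prints the rows and implicitly returns None,
        # just like pyspark.sql.DataFrame.show().
        for row in self.rows:
            print(row)

    def select(self, *columns):
        # Returns a new FakeDataFrame, so calls can be chained.
        return FakeDataFrame([{c: r[c] for c in columns} for r in self.rows])


df = FakeDataFrame([{"X": 1, "Y": 2}, {"X": 3, "Y": 4}])

result = df.select("X").show()   # show() prints, then returns None
print(result is None)            # True: there is nothing to call .toPandas() on

kept = df.select("X")            # drop .show() to keep the DataFrame object
print(kept.rows)                 # [{'X': 1}, {'X': 3}]
```

The same rule applies in PySpark: keep the result of select() if you intend to convert it, and call show() separately (and only) when you want to inspect it.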