Getting an unusual error when creating a DataFrame with a string schema

I am creating a simple data frame.

df=spark.createDataFrame(data=[('11s1 ab')],schema=['str'])

I get the error:

TypeError: Can not infer schema for type: <class 'str'>

However, if I change the statement to:

df=spark.createDataFrame(data=[('11s1 ab',)],schema=['str'])

my dataframe is successfully created.

I want to understand why that comma matters in the tuple passed as data to spark.createDataFrame.

Solution:

In the documentation of createDataFrame, you can see that the data parameter must be:

data: Union[pyspark.rdd.RDD[Any], Iterable[Any], ForwardRef('PandasDataFrameLike')]

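(1,) is a tuple and [1] is a list, but (1) is just the integer 1, because parentheses without a trailing comma only group an expression. In the same way, ('11s1 ab') is the plain string '11s1 ab', so data becomes a list of strings rather than a list of rows, and Spark cannot infer a schema from a bare str. With the trailing comma, ('11s1 ab',) is a one-element tuple, which Spark treats as a row with a single column.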
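
The difference can be checked in plain Python, without Spark. The sketch below only inspects the two values of data from the question; the helper as_single_column_rows is a hypothetical name introduced here for illustration and is not part of PySpark.

```python
# Parentheses alone only group an expression; the trailing comma makes the tuple.
not_a_tuple = ('11s1 ab')
one_tuple = ('11s1 ab',)

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__)

# What each version of `data` from the question actually contains.
bad_data = [('11s1 ab')]    # same as ['11s1 ab']: a list holding one bare string
good_data = [('11s1 ab',)]  # a list holding one row with one column


# Hypothetical helper (not a PySpark function): wrap plain values
# as one-element tuples so each value becomes a single-column row.
def as_single_column_rows(values):
    return [(v,) for v in values]


rows = as_single_column_rows(['11s1 ab', '22s2 cd'])
print(rows)
```

A list shaped like rows is the same shape as the working call in the question, so it should be usable as data together with schema=['str'].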
