I am creating a simple data frame.
df=spark.createDataFrame(data=[('11s1 ab')],schema=['str'])
I get this error:
TypeError: Can not infer schema for type: <class 'str'>
However, if I change the statement to:
df=spark.createDataFrame(data=[('11s1 ab',)],schema=['str'])
the DataFrame is created successfully.
I want to understand why that trailing comma matters in the data tuple passed to spark.createDataFrame.
>Solution :
The documentation for createDataFrame gives the type of the data parameter as:
data: Union[pyspark.rdd.RDD[Any], Iterable[Any], ForwardRef('PandasDataFrameLike')]
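(1,) and [1] are a tuple and a list, but (1) is just the integer 1: in Python, parentheses alone only group an expression, and it is the trailing comma that creates a one-element tuple. In the same way, ('11s1 ab') is the plain string '11s1 ab', so the first call passes a list of strings rather than a list of rows, and Spark cannot infer a schema from a bare str. With ('11s1 ab',) each element of data is a one-element tuple, which Spark treats as a row with one column.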
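To see the difference without starting Spark, here is a minimal plain-Python sketch. The variable names are made up for illustration; only the two literals come from the question.

```python
# Parentheses alone only group an expression; the trailing comma makes the tuple.
not_a_tuple = ('11s1 ab')    # same as '11s1 ab'
one_tuple = ('11s1 ab',)     # a tuple with one element

print(type(not_a_tuple).__name__)   # str
print(type(one_tuple).__name__)     # tuple

# The two "data" arguments from the question, written out:
rows_bad = [('11s1 ab')]     # a list of strings
rows_good = [('11s1 ab',)]   # a list of one-element tuples (one row, one column)

print(rows_bad == ['11s1 ab'])   # True
print(len(rows_good[0]))         # 1
```

rows_good has the same shape as the data argument that worked in the question, while rows_bad is the one that raised the TypeError.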