Pyspark – Transpose

In PySpark I have a dataset as below:

+-----------+-----------+                                                       
|weekend_day|totals     |
+-----------+-----------+
| 2023-02-25|  401943676|
| 2023-03-11|  410220150|
+-----------+-----------+

and the expected output is

+--------+----------+----------+
|        |2023-02-25|2023-03-11|
|totals  |401943676 |410220150 |
+--------+----------+----------+

pivot is not providing the result. Please advise how this can be achieved.


Please note that I don’t want to use Pandas.

Thank you

Solution:

I'm not sure what you mean by "pivot is not providing the result" — pivot works here. First, recreate the sample dataset:

# Recreate the sample dataset
df = spark.createDataFrame(
    [('2023-02-25', 401943676), ('2023-03-11', 410220150)],
    schema=['weekend_day', 'totals']
)
df.printSchema()
df.show(3, False)
+-----------+---------+
|weekend_day|totals   |
+-----------+---------+
|2023-02-25 |401943676|
|2023-03-11 |410220150|
+-----------+---------+

You can use groupBy and pivot to achieve the expected output:
from pyspark.sql import functions as func

# Group on a constant literal so all rows collapse into a single
# output row, then pivot the weekend_day values into columns
df.groupBy(
    func.lit('total').alias('col_name')
).pivot(
    'weekend_day'
).agg(
    func.first('totals')
).show(
    10, False
)
+--------+----------+----------+
|col_name|2023-02-25|2023-03-11|
+--------+----------+----------+
|total   |401943676 |410220150 |
+--------+----------+----------+