PySpark pivot rows without aggregation


I have a PySpark DataFrame named df, as below:

I need to pivot the data based on the ProducingMonth and CLASSIFICATION columns and produce the following output:

I am using the following PySpark code:

pivotDF = df.groupBy("WELL_ID","CLASSIFICATION").pivot("CLASSIFICATION")

When I try to display the data, I get the error "'GroupedData' object has no attribute 'display'".

Solution:

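groupBy(...).pivot(...) returns a GroupedData object, not a DataFrame, which is why display is missing. You need to apply an aggregation after the pivot to get a DataFrame back. Also group by the row keys only; the pivot column itself should not be in groupBy.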

from pyspark.sql import functions as F

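# .agg() turns the pivoted GroupedData back into a DataFrame;
# F.first picks one value per (group, CLASSIFICATION) cell.
pivotDF = df.groupBy("WELL_ID", "producing_month").pivot("CLASSIFICATION").agg(
    F.first("OIL"),
    F.first("GAS"),
)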

pivotDF is then a regular DataFrame, so you can display it with pivotDF.display() in a Databricks notebook, or with pivotDF.show() in plain PySpark.
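To see why an aggregation is needed even when you do not want to aggregate anything, it may help to look at what groupBy + pivot + first computes. The following is a plain-Python sketch, not Spark code: the sample rows, the class values ACTUAL and FORECAST, and the helper name pivot_first are all made up for illustration, since the question's data is not shown. Output column names are simplified too; Spark's own naming differs unless you alias the aggregates.

```python
# Hypothetical sample rows; the real contents of df are not shown in the question.
rows = [
    {"WELL_ID": 1, "producing_month": "2022-01", "CLASSIFICATION": "ACTUAL", "OIL": 10.0, "GAS": 5.0},
    {"WELL_ID": 1, "producing_month": "2022-01", "CLASSIFICATION": "FORECAST", "OIL": 12.0, "GAS": 6.0},
    {"WELL_ID": 2, "producing_month": "2022-01", "CLASSIFICATION": "ACTUAL", "OIL": 7.0, "GAS": 3.0},
]


def pivot_first(rows, keys, pivot_col, value_cols):
    """Group by `keys`, spread `pivot_col` into columns, keep the first value per cell."""
    out = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        # One output record per distinct combination of grouping keys.
        rec = out.setdefault(group, dict(zip(keys, group)))
        for v in value_cols:
            # setdefault keeps the first value seen for this cell and
            # ignores later ones, which is the role F.first plays above.
            rec.setdefault(f"{row[pivot_col]}_{v}", row[v])
    return list(out.values())


result = pivot_first(rows, ["WELL_ID", "producing_month"], "CLASSIFICATION", ["OIL", "GAS"])
for rec in result:
    print(rec)
```

Each output cell could in principle receive several input rows, so something has to decide which value to keep; first is simply the choice that leaves the data unchanged when there is exactly one row per cell. Unlike Spark, which fills missing cells with null, this sketch just omits them.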
