Unpivot a DataFrame in PySpark with a new column

I would like to unpivot a dataframe that looks like this:

Col1 Col2 Val1 Val2
abc  def  12   75
ghi  jkl  67   86
...  ...  ..   ..

into something that will look like this:

Col1 Col2 NewCol Val
abc  def  KEY1   12
abc  def  KEY2   75
ghi  jkl  KEY1   67
ghi  jkl  KEY2   86
...  ...  ....   ..

I am quite new to Python, but as far as I know there is no unpivot function in PySpark. Any idea how I can achieve this? Thanks a lot!

>Solution :

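Given the DataFrame you provided, one could use:

from pyspark.sql import functions as F

# Build a map {KEY1: Val1, KEY2: Val2} for each row, then explode it
# into one output row per map entry.
unpivoted = df.select(
  F.col("Col1"),
  F.col("Col2"),
  F.explode(
    F.map_from_arrays(
      F.array(F.lit("KEY1"), F.lit("KEY2")),
      F.array(F.col("Val1"), F.col("Val2"))
    )
  ).alias("NewCol", "Val")  # explode on a map yields two columns (key, value); rename them
)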

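map_from_arrays pairs keys with values by position, so the i-th key literal must line up with the i-th value column. As long as you keep the two arrays in the same order, you should be fine.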
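An alternative that is often used for the same reshaping is Spark SQL's stack generator, passed to selectExpr. The sketch below only builds the expression string, so it runs without a Spark session; the helper name stack_expr and its parameters are made up for illustration, and it assumes the labels contain no single quotes and that the value columns share a compatible type.

```python
def stack_expr(labels_to_columns, key_name="NewCol", value_name="Val"):
    """Build a Spark SQL stack() expression that unpivots the given columns.

    labels_to_columns maps the label to emit (e.g. "KEY1") to the source
    column holding its value (e.g. "Val1"). Hypothetical helper, not part
    of PySpark.
    """
    # stack(n, label1, col1, label2, col2, ...) emits n rows of (label, value).
    pairs = ", ".join(
        f"'{label}', `{column}`" for label, column in labels_to_columns.items()
    )
    return f"stack({len(labels_to_columns)}, {pairs}) as ({key_name}, {value_name})"


expr = stack_expr({"KEY1": "Val1", "KEY2": "Val2"})
print(expr)

# With a DataFrame `df` like the one in the question (assumed to exist):
# unpivoted = df.selectExpr("Col1", "Col2", expr)
```

On Spark 3.4 and later, DataFrame.unpivot (alias melt) is also available, though it fills the key column with the source column names (Val1, Val2) rather than custom labels such as KEY1 and KEY2.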
