PySpark array column to DataFrame

I have an urgent problem: I need to turn an array column of a PySpark DataFrame into a DataFrame of its own.

Example:

Input:

number  values        combination
a       [e, f, g]     [[e, f], [e, g], [f, g]...]
b       [e, f, g, h]  [[e, f], [e, g], [f, g], [f, h]...]
c       [b, c]        [[b, c]]

In the output I want only the combination column, with each pair split into two columns:

value1  value2
e       f
e       g
f       g
e       f
e       g
f       g
f       h
b       c

I want each pair extracted as its own row of a single DataFrame, without using loops.

Solution:

Let's say the input DataFrame is df.

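from pyspark.sql import functions as F
# explode: every [x, y] pair in "combination" becomes its own row
df = df.select(F.explode("combination").alias("pair"))
# Index into each two-element pair to split it into two columns
df = df.select(F.col("pair")[0].alias("value1"), F.col("pair")[1].alias("value2"))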
