I have an urgent problem: I need to turn an array column of a PySpark DataFrame into a DataFrame of its own.
Example:
Input:
| number | values | combination |
|---|---|---|
| a | [e, f, g] | [[e, f],[e,g],[f,g]...] |
| b | [e, f, g ,h] | [[e, f],[e,g],[f,g],[f,h]...] |
| c | [b, c] | [[b, c]] |
I want the output to contain only the combination column, with one pair per row:
| value1 | value2 |
|---|---|
| e | f |
| e | g |
| f | g |
| e | f |
| e | g |
| f | g |
| f | h |
| b | c |
I want to extract the pairs row by row within the same DataFrame, without using loops.
>Solution :
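Let's say the input DataFrame is df. explode turns each element of the combination array into its own row, so no loop is needed; indexing the exploded array column then splits each pair into two columns.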
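from pyspark.sql import functions as F

# explode() emits one output row per inner [value1, value2] pair
pairs = df.select(F.explode("combination").alias("pair"))
# index into each pair to split it into two columns
result = pairs.select(F.col("pair")[0].alias("value1"), F.col("pair")[1].alias("value2"))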