Add a column to a struct nested in an array

I have a PySpark DataFrame with an array of structs, each containing two fields (Dep and ABC). I want to add a new field, newcol, to the struct.

An existing question answered "how to add a column to a nested struct", but I'm failing to transfer the solution to my case, where the struct is further nested inside an array. I can't seem to reference or recreate the array-struct schema.

My schema:

 |-- Id: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Dep: long (nullable = true)
 |    |    |-- ABC: string (nullable = true)

What it should become:

 |-- Id: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Dep: long (nullable = true)
 |    |    |-- ABC: string (nullable = true)
 |    |    |-- newcol: string (nullable = true)

How do I transfer the solution to my nested struct?

Reproducible code to get a df of the above schema:

from pyspark.sql.types import (
    ArrayType, LongType, StringType, StructField, StructType
)

data = [
    ("10", [{"Dep": 10, "ABC": "1"}, {"Dep": 10, "ABC": "1"}]),
    ("20", [{"Dep": 20, "ABC": "1"}, {"Dep": 20, "ABC": "1"}]),
    ("30", [{"Dep": 30, "ABC": "1"}, {"Dep": 30, "ABC": "1"}]),
    ("40", [{"Dep": 40, "ABC": "1"}, {"Dep": 40, "ABC": "1"}])
]
myschema = StructType([
    StructField("Id", StringType(), True),
    StructField("values",
                ArrayType(
                    StructType([
                        StructField("Dep", LongType(), True),
                        StructField("ABC", StringType(), True)
                    ])
                ), True)
])
df = spark.createDataFrame(data=data, schema=myschema)
df.printSchema()
df.show(10, False)

Solution:

For Spark version >= 3.1, you can use the transform function together with the Column.withField method to achieve this:

from pyspark.sql import functions as F

df = df.withColumn('values', F.transform('values', lambda x: x.withField('newcol', F.lit(1))))