I have a Spark dataframe:
> numbers_df
+----+-----------+-----------+-----------+-------------------------------------+
| id | num_1| num_2| num_3| all_num|
+----+-----------+-----------+-----------+-------------------------------------+
| 1| [1, 2, 5]| [4, 7]| [8, 3]| [1, 2, 3, 4, 5, 6, 7, 8, 9]|
| 2| [12, 13]| [10, 16]| [15, 17]| [10, 11, 12, 13, 14, 15, 16, 17, 18]|
+----+-----------+-----------+-----------+-------------------------------------+
I need to remove the values of num_1, num_2 and num_3 from the all_num column.
Expected result:
| id | num_1 | num_2 | num_3 | all_num | except_num |
|---|---|---|---|---|---|
| 1 | [1, 2, 5] | [4, 7] | [8, 3] | [1, 2, 3, 4, 5, 6, 7, 8, 9] | [6, 9] |
| 2 | [12, 13] | [10, 16] | [15, 17] | [10, 11, 12, 13, 14, 15, 16, 17, 18] | [11, 14, 18] |
How can this be done in PySpark, given that the array_except function only takes two columns as input?
>Solution :
You can combine the array_except and concat functions: concat merges the three arrays into a single array, and array_except then keeps only the values of all_num that do not appear in that combined array.

import pyspark.sql.functions as F

df = df.withColumn('except_num', F.array_except('all_num', F.concat('num_1', 'num_2', 'num_3')))
df.show(truncate=False)
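Conceptually, array_except is an order-preserving set difference that also drops duplicates from the result. A minimal plain-Python sketch of what the expression computes per row (the helper name array_except here is just illustrative, not the Spark function itself):

```python
def array_except(left, right):
    # Keep the elements of `left` that are absent from `right`,
    # preserving order and dropping duplicates, which mirrors
    # Spark's array_except semantics.
    seen = set(right)
    out = []
    for x in left:
        if x not in seen:
            seen.add(x)  # also deduplicates the result
            out.append(x)
    return out

# Row 1 from the example dataframe:
all_num = [1, 2, 3, 4, 5, 6, 7, 8, 9]
combined = [1, 2, 5] + [4, 7] + [8, 3]  # what F.concat produces
print(array_except(all_num, combined))  # [6, 9]
```

This is per-row logic only; in Spark the same computation runs on the executors as a native SQL expression, so no Python UDF is needed.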