PySpark remove double brackets after collect_set of list

October 5, 2022

I want to remove the double brackets that appear after applying collect_set to a column of list-like strings.

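Input data:

# Assumes an active SparkSession named `spark`
from pyspark.sql import functions as F

data = [('1', '[132]'),
        ('1', '[184, 88]'),
        ('2', '[55]'),
        ('2', '[123,33]')]

# Both columns are plain strings; `codes` only looks like a list
DF = spark.createDataFrame(data, ['id', 'codes'])

DF.groupBy("id").agg(F.collect_set("codes").alias("codes_concat")).show(4)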

+---+------------------+
| id|      codes_concat|
+---+------------------+
|  1|[[184, 88], [132]]|
|  2|  [[123,33], [55]]|
+---+------------------+

How do I get a simple, flat list instead?

+---+------------------+
| id|      codes_concat|
+---+------------------+
|  1|    [184, 88, 132]|
|  2|      [123,33, 55]|
+---+------------------+

>Solution :

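Use the translate function to strip the [ and ] characters from each string first, and then aggregate with collect_set. Because the replacement string is empty, translate deletes both characters instead of substituting them. Note that codes_concat is still an array of strings (for example, "184, 88" remains a single element), so the result only looks like a flat list when displayed.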

DF.groupBy("id").agg(F.collect_set(F.translate("codes", "[]", "")).alias("codes_concat")).show(4)

by MR
