DataFrame – Find the sum of all values in a dictionary column (row-wise) and create a new column for that sum

My PySpark DataFrame has two columns, ID and count; the count column is a dict/Map<str,int>. I want to create another column holding the total of all values in count.

I have

ID                        count
3004000304    {'A' -> 2, 'B' -> 4, 'C' -> 5, 'D' -> 1, 'E' -> 9}
3004002756    {'B' -> 3, 'A' -> 8, 'D' -> 3, 'C' -> 8, 'E' -> 1}

I want something like this, with a new column containing the sum of all the values in count:


ID                        count                                                total_value
3004000304    {'A' -> 2, 'B' -> 4, 'C' -> 5, 'D' -> 1, 'E' -> 9}    21
3004002756    {'B' -> 3, 'A' -> 8, 'D' -> 3, 'C' -> 8, 'E' -> 1}    23

My approach

from pyspark.sql import functions as F
df.select(F.explode("count")).groupBy("key").sum("value").rdd.collectAsMap()

But this groups by the individual keys and aggregates across rows, which is not what I want.

If it is not possible in PySpark, is it possible to convert to a pandas DataFrame and do it there? Any help is much appreciated.
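(As an aside: the pandas fallback mentioned here does work. After `df.toPandas()`, a Spark map column arrives as a column of plain Python dicts, so the row-wise total is a one-liner. A minimal sketch, using the example data from above rather than a real Spark conversion:)

```python
import pandas as pd

# Stand-in for the result of df.toPandas(): the map column becomes
# a column of ordinary Python dicts.
pdf = pd.DataFrame({
    "ID": [3004000304, 3004002756],
    "count": [
        {"A": 2, "B": 4, "C": 5, "D": 1, "E": 9},
        {"B": 3, "A": 8, "D": 3, "C": 8, "E": 1},
    ],
})

# Row-wise sum of each dict's values.
pdf["total_value"] = pdf["count"].apply(lambda d: sum(d.values()))
```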

>Solution :

Use the `aggregate` higher-order function to fold over the result of `map_values` and accumulate the total.

# Sum the map's values per row: fold over map_values(count), starting at 0
df = df.withColumn('total_value', F.expr('aggregate(map_values(count), 0, (acc, x) -> acc + int(x))'))
df.show(truncate=False)
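The expression `aggregate(map_values(count), 0, (acc, x) -> acc + int(x))` is a left fold over the map's values, starting from 0. Its per-row behavior can be sketched in plain Python (example data from above, not Spark itself):

```python
from functools import reduce

# One row's map column, taken from the example above.
row_count = {"A": 2, "B": 4, "C": 5, "D": 1, "E": 9}

# Left fold over the map's values with initial accumulator 0 —
# the same shape as Spark's aggregate(map_values(count), 0, (acc, x) -> acc + int(x)).
total_value = reduce(lambda acc, x: acc + int(x), row_count.values(), 0)
```

On Spark 3.1+ the same fold can also be written with the DataFrame API's `F.aggregate` instead of `F.expr`, if you prefer to avoid SQL strings.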