Passing varying length variables to a PySpark groupby().agg function

I am passing lists of column names of varying lengths to PySpark's groupby().agg function. The code I have written checks the length of the list: for example, if the list has length 1, it does a single .agg(count) on the one element; if it has length 2, it does two separate .agg(count) calls, producing two new aggregate columns.

Is there a more succinct way to write this than with an if statement? As the lists of column names grow longer, I'll have to keep adding elif branches.

For example:

from pyspark.sql.functions import count

# agg_fields: list of column names to aggregate

if len(agg_fields) == 1:
    df = df.groupBy("col1", "col2").agg(count(agg_fields[0]))

elif len(agg_fields) == 2:
    df = df.groupBy("col1", "col2").agg(count(agg_fields[0]),
                                        count(agg_fields[1]))

Solution:

Yes, you can simply build the aggregate expressions in a list comprehension and unpack them into .agg with the * operator:

agg_df = df.groupBy("col1", "col2").agg(*[count(i).alias(i) for i in agg_fields])

This works for any length of agg_fields, and .alias(i) names each output column after its source column instead of the default count(...) name.
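The trick here is plain Python argument unpacking, not anything Spark-specific: *[...] expands a list of expressions into separate positional arguments, so .agg receives one count per field no matter how long the list is. A minimal sketch of the same pattern without a Spark session (the agg function and string labels below are hypothetical stand-ins for .agg and count(...).alias(...)):

```python
# Stand-in for DataFrame.agg: a variadic function that accepts
# any number of "column expressions" (here, just strings).
def agg(*exprs):
    return list(exprs)

agg_fields = ["colA", "colB", "colC"]

# Build one expression per field, then unpack the list with *
# so each element becomes a separate positional argument.
result = agg(*[f"count({c}) AS {c}" for c in agg_fields])

print(result)
# One entry per field, with no if/elif branching on the list length.
```

Because the comprehension produces exactly len(agg_fields) expressions, the same one-liner covers lists of length 1, 2, or more.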