Properly groupby and filter with Polars

December 20, 2022

I have a DataFrame df with three main columns, cid1, cid2 and cid3, plus seven more columns (cid4, cid5, etc.).

cid1 and cid2 are integers; the other columns are floats.

Each combination of cid1 and cid2 is a working set of several rows in which the values of all the other columns differ. I want to filter df so that, for each combination of cid1 and cid2, only the rows with the maximum value of cid3 remain. cid4 and the following columns must be left unchanged.

This code helps me with one part of my task:

df = (df
    .groupby(["cid1", "cid2"])
    .agg([pl.max("cid3").alias("max_cid3")])
)

It returns only three columns (cid1, cid2, max_cid3), dropping every row where cid3 is not the maximum.
But I can’t find how to keep all the other columns (cid4, etc.) unchanged for those rows.

df = (df
    .groupby(["cid1", "cid2"])
    .agg([pl.max("cid3").alias("max_cid3"), pl.col("cid4")])
)

I tried adding pl.col("cid4") to the list of aggregations, but then the cid4 column contains a list of the group's cid4 values in each row.

How can I do this properly? Maybe Polars has a better way to do it than groupby?

In Pandas I can do this:

import pandas as pd
import numpy as np

df["max_cid3"] = df.groupby(['cid1', 'cid2'])['cid3'].transform(np.max)

And then filter df to the rows where cid3 == max_cid3.
But I can’t find a way to do this in Polars.

Thank you!

>Solution :

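In Polars you can use a window function. Adding .over(["cid1", "cid2"]) to an expression computes it per group and returns one value for every original row, so no rows or columns are lost: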

df.with_columns(
   pl.col("cid3").max().over(["cid1", "cid2"])
     .alias("max_cid3")
)
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────┬──────────┐
│ cid1 ┆ cid2 ┆ cid3 ┆ cid4 ┆ cid5 ┆ max_cid3 │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ i64  ┆ i64  ┆ i64  ┆ i64  ┆ i64      │
╞══════╪══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ 4    ┆ 4    ┆ 1        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2    ┆ 2    ┆ 2    ┆ 5    ┆ 5    ┆ 9        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2    ┆ 2    ┆ 9    ┆ 6    ┆ 4    ┆ 9        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1    ┆ 1    ┆ 1    ┆ 7    ┆ 9    ┆ 1        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3    ┆ 3    ┆ 1    ┆ 8    ┆ 3    ┆ 1        │
└──────┴──────┴──────┴──────┴──────┴──────────┘

You could also put the window expression directly inside .filter():

df.filter(
    pl.col("cid3") == pl.col("cid3").max().over(["cid1", "cid2"])
)

Data used:

df = pl.DataFrame({
   "cid1": [1, 2, 2, 1, 3],
   "cid2": [1, 2, 2, 1, 3],
   "cid3": [1, 2, 9, 1, 1],
   "cid4": [4, 5, 6, 7, 8],
   "cid5": [4, 5, 4, 9, 3],
})

>>> df.to_pandas().groupby(["cid1", "cid2"])["cid3"].transform("max")
0    1
1    9
2    9
3    1
4    1
Name: cid3, dtype: int64
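For comparison, the pandas approach described in the question (add the per-group maximum with transform, then filter) could look like the sketch below. It builds the same sample data directly in pandas rather than going through to_pandas(); the names pdf and filtered are illustrative only.

```python
import pandas as pd

# Same sample data as above, built directly as a pandas DataFrame.
pdf = pd.DataFrame({
    "cid1": [1, 2, 2, 1, 3],
    "cid2": [1, 2, 2, 1, 3],
    "cid3": [1, 2, 9, 1, 1],
    "cid4": [4, 5, 6, 7, 8],
    "cid5": [4, 5, 4, 9, 3],
})

# transform("max") returns one value per original row, much like .over() in Polars.
pdf["max_cid3"] = pdf.groupby(["cid1", "cid2"])["cid3"].transform("max")

# Keep only the rows that hold their group's maximum, then drop the helper column.
filtered = pdf[pdf["cid3"] == pdf["max_cid3"]].drop(columns="max_cid3")

print(filtered)
```

The rows that survive here are the same ones the Polars filter keeps, which is a quick way to check that the two approaches agree on this data.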

python-polars

byMR

Published December 20, 2022
