Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Polars map users with more than one event

df = pl.LazyFrame({'col1':['a','a','b','b','c','c'],'col2':['undefined','defined','defined','defined','undefined','undefined']})

I have the following DF

Imagine I want to select the rows where col2 has more than two different values when it is grouped by col1.
And I want to select those users and map them in a way I know they have more than two different values in col 2.

Wanted result

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = pl.LazyFrame({'col1':['a','a','b','b','c','c'],'col2':['undefined','defined','defined','defined','undefined','undefined'],'col3':[2,2,1,1,1,1]})

Any ideas on how can I achieve that? In a efficient manner avoiding lefts joins separate filters and so on

>Solution :

It looks like .n_unique()

df.with_columns(col3 = pl.col("col2").n_unique().over("col1"))
shape: (6, 3)
┌──────┬───────────┬──────┐
│ col1 ┆ col2      ┆ col3 │
│ ---  ┆ ---       ┆ ---  │
│ str  ┆ str       ┆ u32  │
╞══════╪═══════════╪══════╡
│ a    ┆ undefined ┆ 2    │
│ a    ┆ defined   ┆ 2    │
│ b    ┆ defined   ┆ 1    │
│ b    ┆ defined   ┆ 1    │
│ c    ┆ undefined ┆ 1    │
│ c    ┆ undefined ┆ 1    │
└──────┴───────────┴──────┘

You may want to also try .unique().len() as there was a previous report n_unique about performance: https://github.com/pola-rs/polars/issues/11249

Not sure if those benchmarks are still relevant.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading