Polars Modify Many Columns Based On Value In Another Column

Say I have a DataFrame that looks like this:

import numpy as np
import polars as pl

df = pl.DataFrame({
  "id": [1, 2, 3, 4, 5],
  "feature_a": np.random.randint(0, 3, 5),
  "feature_b": np.random.randint(0, 3, 5),
  "label": [1, 0, 0, 1, 1],
})

┌─────┬───────────┬───────────┬───────┐
│ id  ┆ feature_a ┆ feature_b ┆ label │
│ --- ┆ ---       ┆ ---       ┆ ---   │
│ i64 ┆ i64       ┆ i64       ┆ i64   │
╞═════╪═══════════╪═══════════╪═══════╡
│ 1   ┆ 2         ┆ 0         ┆ 1     │
│ 2   ┆ 1         ┆ 1         ┆ 0     │
│ 3   ┆ 2         ┆ 2         ┆ 0     │
│ 4   ┆ 1         ┆ 0         ┆ 1     │
│ 5   ┆ 0         ┆ 0         ┆ 1     │
└─────┴───────────┴───────────┴───────┘

I want to overwrite every feature column with a value derived from the label column (here, 1 where label is 1 and 0 otherwise), producing a new DataFrame:

┌─────┬───────────┬───────────┐
│ id  ┆ feature_a ┆ feature_b │
│ --- ┆ ---       ┆ ---       │
│ i64 ┆ i64       ┆ i64       │
╞═════╪═══════════╪═══════════╡
│ 1   ┆ 1         ┆ 1         │
│ 2   ┆ 0         ┆ 0         │
│ 3   ┆ 0         ┆ 0         │
│ 4   ┆ 1         ┆ 1         │
│ 5   ┆ 1         ┆ 1         │
└─────┴───────────┴───────────┘

I know I can select all the feature columns by using a regex in the column selector:

pl.col(r"^feature_.*$")

And I can use a when/then expression to evaluate the label column:

pl.when(pl.col("label") == 1).then(1).otherwise(0)

But I can’t seem to put the two together to modify all the selected columns in one fell swoop. It seems so simple; what am I missing?

Solution:

Here’s one way:

Polars recently added support for more ergonomic arguments in many methods, including with_columns and select. They now accept any number of keyword arguments, where each keyword sets the output column name as if .alias(name) were appended to the expression. This lets us build a dict that maps each column to overwrite onto its new expression and unpack it into the call, like so:

df.select("id", **{col: pl.col("label") for col in df.columns if col.startswith("feature")})

In this simple case no when/then is needed, because the label column already holds the 0/1 values we want. In general, any expression that evaluates to a column of the same height as id can go into this dict comprehension.
