Create duplicates of rows based on column values

February 14, 2023

I’m trying to build a histogram of some data in Polars. As part of my histogram code, I need to duplicate some rows. I have a column of values, and each row also has a weight that says how many times that row should be added to the histogram.

How can I duplicate my value rows according to the weight column?

Here is some example data, with a target series:

import polars as pl

df = pl.DataFrame({"value":[1,2,3], "weight":[2, 2, 1]})

print(df)
# shape: (3, 2)
# ┌───────┬────────┐
# │ value ┆ weight │
# │ ---   ┆ ---    │
# │ i64   ┆ i64    │
# ╞═══════╪════════╡
# │ 1     ┆ 2      │
# │ 2     ┆ 2      │
# │ 3     ┆ 1      │
# └───────┴────────┘

s_target = pl.Series(name="value", values=[1,1,2,2,3])
print(s_target)
# shape: (5,)
# Series: 'value' [i64]
# [
#   1
#   1
#   2
#   2
#   3
# ]

>Solution :

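How about using repeat_by to turn each value into a list of repeated values, and then exploding that list column: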

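(
    df.with_columns(
        # repeat each value `weight` times, giving a list column
        pl.col("value").repeat_by(pl.col("weight"))
    )
    # flatten the lists into rows; Expr.explode() is used here because
    # newer Polars releases renamed the `.arr` list namespace to `.list`
    .select(pl.col("value").explode())
)

In [11]: df.with_columns(pl.col('value').repeat_by(pl.col('weight'))).select(pl.col('value').explode())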
Out[11]:
shape: (5, 1)
┌───────┐
│ value │
│ ---   │
│ i64   │
╞═══════╡
│ 1     │
│ 1     │
│ 2     │
│ 2     │
│ 3     │
└───────┘

I didn’t know you could do this so easily; I only learned about it while writing the answer. Polars is so nice 🙂