I have a 28 GB CSV file that I want to plot. That is obviously far too many data points, so how can I reduce the data? I would like to merge about 1000 data points into one by calculating their mean. This is the structure of my DataFrame:
| Time in seconds | Force in N |
|---|---|
| f64 | f64 |
| 0.0 | 2310.18 |
| 0.0005 | 2313.23 |
| 0.001 | 2314.14 |
I thought about using `groupby_dynamic` and then calculating the mean of each group, but that only seems to work with datetimes? The time is given as a float in seconds, however.
> Solution:
You can also group by an integer column to create groups of size N:
> In case of a `groupby_dynamic` on an integer column, the windows are defined by:
>
> - `"1i"` # length 1
> - `"10i"` # length 10
We can use `pl.int_range()` to add an integer row count to group on:
```python
import polars as pl

df = pl.DataFrame({"force": ["A", "B", "C", "D", "E", "F", "G"]})

(
    df.with_columns(row_nr = pl.int_range(0, pl.count()))
      .groupby_dynamic(
          index_column = "row_nr",
          every = "2i",  # windows of 2 rows each
      )
      .agg("force")
)
```
```
shape: (4, 2)
┌────────┬────────────┐
│ row_nr ┆ force      │
│ ---    ┆ ---        │
│ i64    ┆ list[str]  │
╞════════╪════════════╡
│ 0      ┆ ["A", "B"] │
│ 2      ┆ ["C", "D"] │
│ 4      ┆ ["E", "F"] │
│ 6      ┆ ["G"]      │
└────────┴────────────┘
```
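
Applied to the question's data, the same pattern with `every = "1000i"` and a `.mean()` aggregation collapses every 1000 rows into one averaged point. A minimal sketch, assuming the column names from the DataFrame above and a hypothetical file name:

```python
import polars as pl

df = pl.read_csv("force_data.csv")  # hypothetical file name

downsampled = (
    df.with_columns(row_nr = pl.int_range(0, pl.count()))
      .groupby_dynamic(index_column = "row_nr", every = "1000i")
      .agg(
          pl.col("Time in seconds").mean(),  # mean time of each 1000-row window
          pl.col("Force in N").mean(),       # mean force of each 1000-row window
      )
)
```

Note that `pl.read_csv` loads the full 28 GB into memory; if that is not feasible, the same query can be run lazily by starting from `pl.scan_csv` and finishing with `.collect()`.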