How can I reduce the amount of data in a polars DataFrame?

I have a CSV file with a size of 28 GB which I want to plot. That is obviously far too many data points to plot directly, so how can I reduce the data? I would like to merge about 1000 data points into one by calculating the mean. This is the structure of my DataFrame:

Time in seconds  Force in N
f64              f64
0.0              2310.18
0.0005           2313.23
0.001            2314.14

I thought about using groupby_dynamic and then calculating the mean of each group, but that only seems to work with datetimes? The time in seconds is given as a float, however.


Solution:

You can also group by an integer column to create groups of size N:

In case of a groupby_dynamic on an integer column, the windows are defined by:

"1i"  # length 1
"10i" # length 10
We can use .int_range() to add an integer row count to group on:

import polars as pl

df = pl.DataFrame({"force": ["A", "B", "C", "D", "E", "F", "G"]})

# Add an integer row count, then group into windows of 2 rows each
(df.with_columns(row_nr = pl.int_range(0, pl.count()))
   .groupby_dynamic(
      index_column = "row_nr",
      every = "2i"
   )
   .agg("force")
)
shape: (4, 2)
┌────────┬────────────┐
│ row_nr ┆ force      │
│ ---    ┆ ---        │
│ i64    ┆ list[str]  │
╞════════╪════════════╡
│ 0      ┆ ["A", "B"] │
│ 2      ┆ ["C", "D"] │
│ 4      ┆ ["E", "F"] │
│ 6      ┆ ["G"]      │
└────────┴────────────┘