Avoid clone when re-using a Rust Polars DataFrame?

I am trying to implement a bias-corrected and accelerated (BCa) confidence interval in Rust. My metric function takes a Polars DataFrame, does some operations on it, and returns an f64. In the example below .lazy() is obviously not needed, but the real function does require it (it does group_bys, etc.). One step of a BCa confidence interval is to calculate the metric on the original sample. The second step is a jackknife: calculating the metric on the sample with the ith row deleted. The issue is that without the .clone() in the first step, Rust complains about "borrow of moved value". If I change metric to take a reference, then I either have to clone or dereference within the function, or I get "cannot move out of a shared reference". Is it possible to avoid this clone, or is the clone cheap enough that I shouldn't worry about it?

use polars::prelude::*;
use rayon::iter::{IntoParallelIterator, ParallelIterator};

fn metric(df: DataFrame) -> f64 {
    df.lazy().collect().unwrap()["x"].sum().unwrap()
}

pub fn bca_confidence_interval(df: DataFrame) -> (f64, f64, f64) {
    let df_height = df.height();
    let stat_original = metric(df.clone());

    let index = ChunkedArray::new("index", 0..df_height as u64);
    let jacknife_stats: Vec<f64> = (0..df_height)
        .into_par_iter()
        .map(|i| metric(df.filter(&index.not_equal(i)).unwrap()))
        .filter(|x| !x.is_nan())
        .collect();

    (0.0, 1.0, 2.0)
}

Solution:

The .clone() is pretty cheap. I wouldn’t worry about it.

A DataFrame just holds a Vec<Series> for its columns, and each Series just holds an Arc<_>. An Arc provides shared ownership, so cloning one only increments a reference count. Cloning the DataFrame therefore does not deep-copy anything; the underlying data is shared, and the cost scales with the number of columns rather than the number of rows.
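
To make the shared-ownership point concrete, here is a minimal sketch using only the standard library. ToySeries, ToyFrame, sum_of and shares_data are hypothetical names invented for this illustration; they are not Polars types or functions, and they only mimic the "Vec of Arc-backed columns" layout described above.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a column: the values live behind an Arc,
// so cloning the column shares the buffer instead of copying it.
#[derive(Clone)]
struct ToySeries {
    name: String,
    values: Arc<Vec<f64>>,
}

// Hypothetical stand-in for a DataFrame: just a Vec of columns.
#[derive(Clone)]
struct ToyFrame {
    columns: Vec<ToySeries>,
}

impl ToyFrame {
    fn new(columns: Vec<(&str, Vec<f64>)>) -> Self {
        let columns = columns
            .into_iter()
            .map(|(name, values)| ToySeries {
                name: name.to_string(),
                values: Arc::new(values),
            })
            .collect();
        ToyFrame { columns }
    }

    // Takes the frame by value, like `metric(df: DataFrame)` in the question.
    // Returns NaN if no column has the requested name.
    fn sum_of(self, name: &str) -> f64 {
        self.columns
            .iter()
            .find(|c| c.name == name)
            .map(|c| c.values.iter().sum::<f64>())
            .unwrap_or(f64::NAN)
    }
}

// True when every column of `a` points at the same buffer as the
// corresponding column of `b`, i.e. no row data was copied.
fn shares_data(a: &ToyFrame, b: &ToyFrame) -> bool {
    a.columns.len() == b.columns.len()
        && a.columns
            .iter()
            .zip(&b.columns)
            .all(|(x, y)| Arc::ptr_eq(&x.values, &y.values))
}
```

In this toy model, calling frame.clone() allocates a new Vec of columns and copies each short column name, but the row data itself is never duplicated: shares_data(&frame, &frame.clone()) holds, and the Arc reference count drops back once the clone has been consumed by sum_of.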
