Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I replace `map_groups` with Vectorized solution in Polars?

I have a function vol_buckets() that has an internal function _vol_buckets_engine(). I want to find a better way to execute the same logic.

The data that the function uses has multiple symbols in the dataframe, and I need low_vol and mid_vol to be calc’ed independently forces each symbol. I do not want to use the map_groups() function as it is known to be slow. The purpose of _vol_buckets_engine() is to calculate the low_vol and mid_vol of each symbol from the realized_volatility column, by using the .quantile() method. And then to create a Polars expression vol_bucket, that creates a column and labels the row with either high, med, or low str to represent the volatility bucket that the row is in comparison to other rows for the same symbol.

I would like to achieve this functionality using expressions and .over("symbol") method. How would you suggest that I achieve this per symbol?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Data / Reprex / Functions

Data

data = pl.read_csv(
b"""
symbol,date,realized_volatility
dummy,2023-01-04T00:00:00.000000000,1.050947170034001
dummy,2023-01-05T00:00:00.000000000,1.0930296529563293
dummy,2023-01-06T00:00:00.000000000,5.0831401853504135
dummy,2023-01-09T00:00:00.000000000,3.8529778077664285
dummy,2023-01-10T00:00:00.000000000,3.1224700809152663
dummy,2023-01-11T00:00:00.000000000,3.328913046103626
dummy,2023-01-12T00:00:00.000000000,2.8538745003208548
dummy,2023-01-13T00:00:00.000000000,2.6237053380435693
dummy,2023-01-17T00:00:00.000000000,2.416355871776205
dummy,2023-01-18T00:00:00.000000000,2.2033129305712387
dummy,2023-01-19T00:00:00.000000000,2.0031931513812293
dummy,2023-01-20T00:00:00.000000000,2.1377752983292035
dummy,2023-01-23T00:00:00.000000000,2.3889134269559436
dummy,2023-01-24T00:00:00.000000000,2.2898536571943926
dummy,2023-01-25T00:00:00.000000000,2.1522017746630864
dummy,2023-01-26T00:00:00.000000000,2.153211000843634
dummy,2023-01-27T00:00:00.000000000,2.1344909490471045
dummy,2023-01-30T00:00:00.000000000,2.2444631484198787
dummy,2023-01-31T00:00:00.000000000,2.1693660366593583
dummy,2023-02-01T00:00:00.000000000,2.091741023814303
dummy,2023-02-02T00:00:00.000000000,2.622032115043669
dummy,2023-02-03T00:00:00.000000000,2.767740741501858
dummy,2023-02-06T00:00:00.000000000,2.790631908078487
dummy,2023-02-07T00:00:00.000000000,2.825755005215503
dummy,2023-02-08T00:00:00.000000000,2.839487798967736
dummy,2023-02-09T00:00:00.000000000,2.748677337185615
dummy,2023-02-10T00:00:00.000000000,2.6491272220932562
dummy,2023-02-13T00:00:00.000000000,2.678068235650906
dummy,2023-02-14T00:00:00.000000000,2.591777646393414
dummy,2023-02-15T00:00:00.000000000,2.621357463850524
dummy,2023-02-16T00:00:00.000000000,2.6205239182151603
dummy,2023-02-17T00:00:00.000000000,2.1897201502231676
dummy,2023-02-21T00:00:00.000000000,2.4364850847077117
dummy,2023-02-22T00:00:00.000000000,2.4324879545270295
dummy,2023-02-23T00:00:00.000000000,2.2858654833211633
dummy,2023-02-24T00:00:00.000000000,2.399940087252254
dummy,2023-02-27T00:00:00.000000000,2.388152400576198
dummy,2023-02-28T00:00:00.000000000,2.3660255638521663
dummy,2023-03-01T00:00:00.000000000,2.427234380419688
dummy,2023-03-02T00:00:00.000000000,2.4330829447444504
dummy,2023-03-03T00:00:00.000000000,2.718003833450704
dummy,2023-03-06T00:00:00.000000000,2.6476401801968548
dummy,2023-03-07T00:00:00.000000000,2.6860748665098315
dummy,2023-03-08T00:00:00.000000000,2.7021875940028997
dummy,2023-03-09T00:00:00.000000000,2.7054264661805933
dummy,2023-03-10T00:00:00.000000000,2.709577864255094
dummy,2023-03-13T00:00:00.000000000,2.6276349342884733
dummy,2023-03-14T00:00:00.000000000,2.6670200150686627
dummy,2023-03-15T00:00:00.000000000,2.6481374152619575
dummy,2023-03-16T00:00:00.000000000,2.3098956144797222
dummy,2023-03-17T00:00:00.000000000,2.1191230404273558
dummy,2023-03-20T00:00:00.000000000,2.0864454171790796
dummy,2023-03-21T00:00:00.000000000,2.0097633983245315
dummy,2023-03-22T00:00:00.000000000,1.929233413402791
dummy,2023-03-23T00:00:00.000000000,1.9295479237168838
dummy,2023-03-24T00:00:00.000000000,1.9509333112340799
dummy,2023-03-27T00:00:00.000000000,1.8844254026075675
dummy,2023-03-28T00:00:00.000000000,1.8839124342657283
dummy,2023-03-29T00:00:00.000000000,1.951087131997698
dummy,2023-03-30T00:00:00.000000000,1.9461090376651762
dummy,2023-03-31T00:00:00.000000000,2.009117720680126
dummy,2023-04-03T00:00:00.000000000,1.7772642996364656
dummy,2023-04-04T00:00:00.000000000,1.7780422055031615
dummy,2023-04-05T00:00:00.000000000,1.8187341926990974
dummy,2023-04-06T00:00:00.000000000,1.7150029341723647
dummy,2023-04-10T00:00:00.000000000,1.781541227195909
dummy,2023-04-11T00:00:00.000000000,1.7974543772914964
dummy,2023-04-12T00:00:00.000000000,1.7330193896599626
dummy,2023-04-13T00:00:00.000000000,2.1147528127542894
dummy,2023-04-14T00:00:00.000000000,1.706628052331882
dummy,2023-04-17T00:00:00.000000000,1.5907012619921879
dummy,2023-04-18T00:00:00.000000000,1.5371944764215826
dummy,2023-04-19T00:00:00.000000000,1.5298371806238649
dummy,2023-04-20T00:00:00.000000000,1.463565758594709
dummy,2023-04-21T00:00:00.000000000,1.42981031912061
dummy,2023-04-24T00:00:00.000000000,1.37111916315023
dummy,2023-04-25T00:00:00.000000000,1.3344651780937993
dummy,2023-04-26T00:00:00.000000000,1.3320778436462895
dummy,2023-04-27T00:00:00.000000000,1.4850652068750114
dummy,2023-04-28T00:00:00.000000000,1.494247444270745
dummy,2023-05-01T00:00:00.000000000,1.4134287701935544
dummy,2023-05-02T00:00:00.000000000,1.3780025740294264
dummy,2023-05-03T00:00:00.000000000,1.3639257736538914
dummy,2023-05-04T00:00:00.000000000,1.3812923208247097
dummy,2023-05-05T00:00:00.000000000,2.0832373775225053
dummy,2023-05-08T00:00:00.000000000,2.0305097315900373
dummy,2023-05-09T00:00:00.000000000,2.059561989940993
dummy,2023-05-10T00:00:00.000000000,1.963606664476099
dummy,2023-05-11T00:00:00.000000000,1.9306937107888855
dummy,2023-05-12T00:00:00.000000000,1.8582842726888125
dummy,2023-05-15T00:00:00.000000000,1.8406958433531144
dummy,2023-05-16T00:00:00.000000000,1.8370033720288037
dummy,2023-05-17T00:00:00.000000000,1.7970463374858805
dummy,2023-05-18T00:00:00.000000000,1.8500710447283415
dummy,2023-05-19T00:00:00.000000000,1.7605430717260266
dummy,2023-05-22T00:00:00.000000000,1.7511317900402594
dummy,2023-05-23T00:00:00.000000000,1.824326390923716
dummy,2023-05-24T00:00:00.000000000,1.4375350280115315
dummy,2023-05-25T00:00:00.000000000,1.4514367372766908
dummy,2023-05-26T00:00:00.000000000,1.519056219066445
dummy,2023-05-30T00:00:00.000000000,1.5382320775472667
dummy,2023-05-31T00:00:00.000000000,1.5217430737640685
dummy,2023-06-01T00:00:00.000000000,1.5972064596652586
dummy,2023-06-02T00:00:00.000000000,1.5714237778785558
dummy,2023-06-05T00:00:00.000000000,1.5902709727458448
dummy,2023-06-06T00:00:00.000000000,1.560745761259242
dummy,2023-06-07T00:00:00.000000000,1.5818161453753425
dummy,2023-06-08T00:00:00.000000000,1.3922006051979485
dummy,2023-06-09T00:00:00.000000000,1.3744562490572472
dummy,2023-06-12T00:00:00.000000000,1.457457768146953
dummy,2023-06-13T00:00:00.000000000,1.4466466918178154
dummy,2023-06-14T00:00:00.000000000,1.4361967395694815
dummy,2023-06-15T00:00:00.000000000,1.4447040609797785
dummy,2023-06-16T00:00:00.000000000,0.7312811461557279
dummy,2023-06-20T00:00:00.000000000,0.7313067277718319
dummy,2023-06-21T00:00:00.000000000,0.7077559307772997
dummy,2023-06-22T00:00:00.000000000,0.7628610294768818
dummy,2023-06-23T00:00:00.000000000,0.7634005996581734
dummy,2023-06-26T00:00:00.000000000,0.7731223634233964
dummy,2023-06-27T00:00:00.000000000,0.8472880682212705
dummy,2023-06-28T00:00:00.000000000,0.8611165955077903
dummy,2023-06-29T00:00:00.000000000,0.8576752702726677
dummy,2023-06-30T00:00:00.000000000,0.9746587880908129
dummy,2023-07-03T00:00:00.000000000,0.9957492823420077
dummy,2023-07-05T00:00:00.000000000,0.997091096401469
dummy,2023-07-06T00:00:00.000000000,0.9194450450911804
dummy,2023-07-07T00:00:00.000000000,0.930719343608125
dummy,2023-07-10T00:00:00.000000000,0.9562659005884814
dummy,2023-07-11T00:00:00.000000000,0.8913934198663264
dummy,2023-07-12T00:00:00.000000000,0.8802192596228992
dummy,2023-07-13T00:00:00.000000000,0.8858906512045663
dummy,2023-07-14T00:00:00.000000000,0.7989043724135594
dummy,2023-07-17T00:00:00.000000000,0.892709516255626
dummy,2023-07-18T00:00:00.000000000,0.8732598147105561
dummy,2023-07-19T00:00:00.000000000,0.8888680151111504
dummy,2023-07-20T00:00:00.000000000,0.9032313961664961
dummy,2023-07-21T00:00:00.000000000,0.8350230361133806
dummy,2023-07-24T00:00:00.000000000,0.8394175070677063
dummy,2023-07-25T00:00:00.000000000,0.7633667480526932
dummy,2023-07-26T00:00:00.000000000,0.7679755142359856
dummy,2023-07-27T00:00:00.000000000,0.77867043188138
dummy,2023-07-28T00:00:00.000000000,0.7980766465969299
dummy,2023-07-31T00:00:00.000000000,0.789544454683515
dummy,2023-08-01T00:00:00.000000000,0.7958438502081496
dummy,2023-08-02T00:00:00.000000000,0.8689450706760995
dummy,2023-08-03T00:00:00.000000000,0.7953045457007718
dummy,2023-08-04T00:00:00.000000000,1.6305627295436527
dummy,2023-08-07T00:00:00.000000000,1.7145261459249155
dummy,2023-08-08T00:00:00.000000000,1.6469841461269166
dummy,2023-08-09T00:00:00.000000000,1.6609267664512968
dummy,2023-08-10T00:00:00.000000000,1.6604023328434614
dummy,2023-08-11T00:00:00.000000000,1.4799646239001687
dummy,2023-08-14T00:00:00.000000000,1.4890337553207493
dummy,2023-08-15T00:00:00.000000000,1.5211673127036998
dummy,2023-08-16T00:00:00.000000000,1.5275524984305846
dummy,2023-08-17T00:00:00.000000000,1.5893564019504338
dummy,2023-08-18T00:00:00.000000000,1.5510882876065588
dummy,2023-08-21T00:00:00.000000000,1.5686214277927917
dummy,2023-08-22T00:00:00.000000000,1.562627674533575
dummy,2023-08-23T00:00:00.000000000,1.719484207173839
dummy,2023-08-24T00:00:00.000000000,1.9620113717020276
dummy,2023-08-25T00:00:00.000000000,1.9150329352516895
dummy,2023-08-28T00:00:00.000000000,1.9409548361807556
dummy,2023-08-29T00:00:00.000000000,2.084490846534584
dummy,2023-08-30T00:00:00.000000000,2.1733623171564806
dummy,2023-08-31T00:00:00.000000000,2.160783655361729
dummy,2023-09-01T00:00:00.000000000,2.1793908039242895
dummy,2023-09-05T00:00:00.000000000,2.1728974464825286
dummy,2023-09-06T00:00:00.000000000,2.623702781798936
dummy,2023-09-07T00:00:00.000000000,2.913094567383203
dummy,2023-09-08T00:00:00.000000000,2.8552143121957325
dummy,2023-09-11T00:00:00.000000000,2.8668430074980438
dummy,2023-09-12T00:00:00.000000000,2.963098823270275
dummy,2023-09-13T00:00:00.000000000,2.927779071384871
dummy,2023-09-14T00:00:00.000000000,2.935777535772875
dummy,2023-09-15T00:00:00.000000000,2.105568022944933
dummy,2023-09-18T00:00:00.000000000,2.098469650547303
dummy,2023-09-19T00:00:00.000000000,2.102079780396924
dummy,2023-09-20T00:00:00.000000000,2.2148747362445484
dummy,2023-09-21T00:00:00.000000000,2.2419794282631558
dummy,2023-09-22T00:00:00.000000000,2.2503723453558866
dummy,2023-09-25T00:00:00.000000000,2.2385277976944358
dummy,2023-09-26T00:00:00.000000000,2.3879303451915144
dummy,2023-09-27T00:00:00.000000000,2.4066890858542047
dummy,2023-09-28T00:00:00.000000000,2.3335569029453844
dummy,2023-09-29T00:00:00.000000000,2.3340229115643214
dummy,2023-10-02T00:00:00.000000000,2.3883181263038784
dummy,2023-10-03T00:00:00.000000000,2.387917427869889
dummy,2023-10-04T00:00:00.000000000,2.243742722599013
dummy,2023-10-05T00:00:00.000000000,2.018830331402512
dummy,2023-10-06T00:00:00.000000000,2.0378580009972933
dummy,2023-10-09T00:00:00.000000000,2.0358593883362905
dummy,2023-10-10T00:00:00.000000000,1.8792234707485946
dummy,2023-10-11T00:00:00.000000000,1.7760982896480833
dummy,2023-10-12T00:00:00.000000000,1.7844568491345048
dummy,2023-10-13T00:00:00.000000000,1.7965419947611168
dummy,2023-10-16T00:00:00.000000000,1.796169748957089
dummy,2023-10-17T00:00:00.000000000,1.3653644940881955
dummy,2023-10-18T00:00:00.000000000,1.0801524435487886
dummy,2023-10-19T00:00:00.000000000,1.0774790301569788
dummy,2023-10-20T00:00:00.000000000,1.1384378583294845
dummy,2023-10-23T00:00:00.000000000,1.0359672502955974
dummy,2023-10-24T00:00:00.000000000,0.9893772090316733
dummy,2023-10-25T00:00:00.000000000,1.026284941186689
dummy,2023-10-26T00:00:00.000000000,1.2343892632363178
dummy,2023-10-27T00:00:00.000000000,1.1590590748581153
dummy,2023-10-30T00:00:00.000000000,1.197746969681852
dummy,2023-10-31T00:00:00.000000000,1.059927036681978
dummy,2023-11-01T00:00:00.000000000,1.1510136339666301
dummy,2023-11-02T00:00:00.000000000,1.287186157131098
dummy,2023-11-03T00:00:00.000000000,1.2780869517628282
dummy,2023-11-06T00:00:00.000000000,1.157582764635238
dummy,2023-11-07T00:00:00.000000000,1.2010550560932904
dummy,2023-11-08T00:00:00.000000000,1.2121833942772782
dummy,2023-11-09T00:00:00.000000000,1.211370091724534
dummy,2023-11-10T00:00:00.000000000,1.3180877547834722
dummy,2023-11-13T00:00:00.000000000,1.3226912699516258
dummy,2023-11-14T00:00:00.000000000,1.3738703306181093
dummy,2023-11-15T00:00:00.000000000,1.3592276047578322
dummy,2023-11-16T00:00:00.000000000,1.3134962243578143
dummy,2023-11-17T00:00:00.000000000,1.2889516286387386
dummy,2023-11-20T00:00:00.000000000,1.3145299415093774
dummy,2023-11-21T00:00:00.000000000,1.299562910563539
dummy,2023-11-22T00:00:00.000000000,1.2949833792649645
dummy,2023-11-24T00:00:00.000000000,1.275358569306392
dummy,2023-11-27T00:00:00.000000000,1.2754859720628269
dummy,2023-11-28T00:00:00.000000000,1.252060080071354
dummy,2023-11-29T00:00:00.000000000,1.243135474551939
dummy,2023-11-30T00:00:00.000000000,1.2448304628906661
dummy,2023-12-01T00:00:00.000000000,1.1846066040077565
dummy,2023-12-04T00:00:00.000000000,1.2157022562986464
dummy,2023-12-05T00:00:00.000000000,1.3629221743116913
dummy,2023-12-06T00:00:00.000000000,1.310758369219744
dummy,2023-12-07T00:00:00.000000000,1.1318339210448705
dummy,2023-12-08T00:00:00.000000000,1.1291758347967926
dummy,2023-12-11T00:00:00.000000000,1.1358579911415723
dummy,2023-12-12T00:00:00.000000000,1.1546273760073806
dummy,2023-12-13T00:00:00.000000000,1.130638333145214
dummy,2023-12-14T00:00:00.000000000,0.9862317372106153
dummy,2023-12-15T00:00:00.000000000,0.9794541986256703
dummy,2023-12-18T00:00:00.000000000,0.9317936185771348
dummy,2023-12-19T00:00:00.000000000,0.8708750952091696
dummy,2023-12-20T00:00:00.000000000,0.8990540381695469
dummy,2023-12-21T00:00:00.000000000,0.8968653416191089
dummy,2023-12-22T00:00:00.000000000,0.7262661600265888
dummy,2023-12-26T00:00:00.000000000,0.7033713118234325
dummy,2023-12-27T00:00:00.000000000,0.6339041006593921
dummy,2023-12-28T00:00:00.000000000,0.632426477361776
dummy,2023-12-29T00:00:00.000000000,0.6146423229289452
"""
)

Functions

def vol_buckets(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
) -> pl.DataFrame:
  
    # Group by 'symbol' and apply 'calculate_vol_buckets' to each group
    if isinstance(data, pl.LazyFrame):
        data = data.collect()

    result = data.group_by("symbol").map_groups(
        lambda group_df: _vol_buckets_engine(
            group_df,
            lo_quantile,
            hi_quantile,
            _column_name_volatility,
        )
    )

    return result.lazy()

def _vol_buckets_engine(
    grouped_data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float,
    hi_quantile: float,
    _column_name_volatility: str,
) -> pl.LazyFrame:

    # Calculate low and high quantiles for the group
    low_vol = (
        grouped_data.lazy()
        .select(pl.col(_column_name_volatility).quantile(lo_quantile))
        .collect()
        .to_series()[0]
    )
    mid_vol = (
        grouped_data.lazy()
        .select(pl.col(_column_name_volatility).quantile(hi_quantile))
        .collect()
        .to_series()[0]
    )

    # Determine the volatility bucket for each row
    vol_bucket = (
        pl.when(pl.col(_column_name_volatility) <= low_vol)
        .then(pl.lit("low"))
        .when(pl.col(_column_name_volatility) <= mid_vol)
        .then(pl.lit("mid"))
        .otherwise(pl.lit("high"))
        .alias("vol_bucket")
    )

    return grouped_data.lazy().with_columns(vol_bucket).collect()

Reprex

result = vol_buckets(data=data, lo_quantile=0.3, hi_quantile=0.65)

Attempt 1

This seems a bit redundant in making a DF and then joining. Should I just add a column of low/mid_vol .over("symbol) and then filter? I would greatly appreciate any ideas! Thanks

import polars as pl

def vol_buckets2(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
) -> pl.DataFrame | pl.LazyFrame:
    """
    Calculate volatility buckets for each symbol without using `map_groups`.
    This function segments the input data into three volatility categories based on the specified quantiles of the volatility column.
    """
    # Ensure data is a LazyFrame for efficient computation
    if isinstance(data, pl.DataFrame):
        data = data.lazy()

    # Calculate low and high quantiles for each symbol
    low_vol = data.select(
        [(pl.col(_column_name_volatility)
        .quantile(lo_quantile)
        .over("symbol")
        .alias("low_vol")), pl.col("symbol"), pl.col("date")
    ])
    high_vol = data.select(
        pl.col(_column_name_volatility)
        .quantile(hi_quantile)
        .over("symbol")
        .alias("high_vol"), pl.col("symbol"), pl.col("date")
    )

    # Join quantile calculations back to the original data
    data = data.join(low_vol, on=["symbol", "date"], how="left")
    data = data.join(high_vol, on=["symbol", "date"], how="left")

    # Determine the volatility bucket for each row using expressions
    vol_bucket = (
        pl.when(pl.col(_column_name_volatility) <= pl.col("low_vol"))
        .then(pl.lit("low"))
        .when(pl.col(_column_name_volatility) <= pl.col("high_vol"))
        .then(pl.lit("mid"))
        .otherwise(pl.lit("high"))
        .alias("vol_bucket")
    )

    # Add the volatility bucket column to the data
    data = data.with_columns(vol_bucket)

    # Drop the intermediate quantile columns
    data = data.drop(["low_vol", "high_vol"])

    return data.collect() if isinstance(data, pl.LazyFrame) else data

>Solution :

You can achieve the same functionality using polars’ expression API with pl.Expr.qcut.

def vol_buckets_new(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
) -> pl.DataFrame:

    if isinstance(data, pl.LazyFrame):
        data = data.collect()

    result = data.with_columns(
        pl.col(_column_name_volatility)
        .qcut([lo_quantile, hi_quantile], labels=["low", "mid", "high"])
        .over("symbol")
        .alias("vol_bucket")
    )

    return result.lazy()
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading