python polars: new column based on condition/comparison of two existing columns

I am trying to create a new column in a Polars DataFrame based on a comparison of two existing columns:

    import polars as pl
    data = {"a": [2, 30], "b": [20, 3]}
    df = pl.DataFrame(data)
    df

    Out[4]:
    shape: (2, 2)
    ┌─────┬─────┐
    │ a   ┆ b   │
    │ --- ┆ --- │
    │ i64 ┆ i64 │…

Polars equivalent of pandas factorize

Does Polars have a function to encode a string column into integers (1, 2, 3), like pandas.factorize? I didn't find one in the Polars documentation.

>Solution : Perhaps you're looking for a dense rank or the categorical type.

    df = pl.DataFrame({"column": ["foo", "bar", "baz", "foo", "foo"]})
    df.with_columns(rank = pl.col("column").rank("dense"))

    shape: (5, 2)
    ┌────────┬──────┐
    │ column | rank…

Polars: Replace the Year in a DataFrame's Datetime Column

I have a Polars DataFrame with a bunch of columns. One of them has datetime values (for example, hourly data from 2017–2019). How can I replace the year of all the datetime values with a year I specify?

Original Datetime Column:

    shape: (26280, 1)
    Index
    datetime[ns]
    2017-01-01 00:00:00
    2017-01-01 01:00:00
    2017-01-01 02:00:00
    2017-01-01…

How to change this code to polars? "TypeError: 'GroupBy' object is not subscriptable"

This code is pandas:

    pandas_reserve_tb \
        .groupby(['hotel_id', 'people_num'])['total_price'] \
        .sum().reset_index()

I would like to convert this code to Polars:

    polars_researve_tb \
        .groupby("hotel_id", "people_num")['total_price'] \
        .sum().with_row_count()

But I got the error "TypeError: 'GroupBy' object is not subscriptable". How do I solve this error?

>Solution : You probably meant

    polars_researve_tb \
        .groupby(["hotel_id", "people_num"]).agg(pl.col('total_price').sum())

I'd advise posting reproducible examples…

Python Polars find the length of a string in a dataframe

I am trying to count the number of letters in a string in Polars. I could probably just use an apply method and get the len(Name). However, I was wondering if there is a Polars-specific method?

    import polars as pl

    mydf = pl.DataFrame(
        {"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"],
         "Name": ["John", "Joe", "James"]})

    print(mydf)

    │start_date ┆…