Convert all non-string columns to string

I have a polars.DataFrame object data_frame with multiple columns, some strings and some non-strings (as shown below), and I want to cast all columns to strings:

import polars as pl
import polars.selectors as cs
data_frame = pl.DataFrame({'a': ['a', 'b', 'c'], 'b': range(3), 'c': [.1, .2, .3]})

non_string_columns = [col for col in data_frame.columns if data_frame[col].dtype != pl.Utf8]
for col in non_string_columns:
    data_frame = data_frame.with_columns(pl.col(col).cast(pl.Utf8))

However, this should be possible with the cs selector as well, something like:

data_frame.with_columns(~cs.string().as_expr().cast(pl.Utf8))

which fails with polars.exceptions.SchemaError: invalid series dtype: expected Boolean, got str

What is the way to cast many columns to string at once (utilising the polars parallelism) with the cs selector?

Solution:

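In Python, method calls bind more tightly than the unary ~ operator, so the ~ is applied last: the string columns are selected and cast first, and ~ then tries to negate the resulting string expression instead of the selector. That is why polars complains that it expected Boolean and got str. Force the right order with some extra parentheses: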

data_frame.with_columns((~cs.string()).cast(pl.Utf8))

(No need for as_expr here, either.)
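
The precedence point can be checked without polars at all. The sketch below uses only the standard-library ast module to parse the two expressions from this thread as source text; the helper name outermost is made up for this illustration. Nothing is evaluated, so cs and pl never need to be importable here.

```python
import ast


def outermost(source: str) -> str:
    """Return the type name of the top-level AST node of an expression."""
    node = ast.parse(source, mode="eval").body
    return type(node).__name__


# The two expressions from the question and the answer, as plain source text.
failing = "~cs.string().as_expr().cast(pl.Utf8)"
working = "(~cs.string()).cast(pl.Utf8)"

# Without parentheses, the last operation applied is the unary "~":
# it wraps the entire method-call chain, including .cast(...).
print(outermost(failing))  # UnaryOp

# With parentheses, the last operation applied is the .cast(...) call,
# and "~" only wraps cs.string().
print(outermost(working))  # Call
```

In the first case ~ receives the already-cast string expression; in the second it receives the selector, which is the object that supports inversion.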
