How do I concatenate column values (all but one) into a list and add it as a column with polars?

I have the input in this format:

import polars as pl

data = {
    "Name": ['Name_A', 'Name_B', 'Name_C'], "val_1": ['a', None, 'a'],
    "val_2": [None, None, 'b'], "val_3": [None, 'c', None],
    "val_4": ['c', None, 'g'], "val_5": [None, None, 'i'],
}
df = pl.DataFrame(data)
print(df)

shape: (3, 6)
┌────────┬───────┬───────┬───────┬───────┬───────┐
│ Name   ┆ val_1 ┆ val_2 ┆ val_3 ┆ val_4 ┆ val_5 │
│ ---    ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ str    ┆ str   ┆ str   ┆ str   ┆ str   ┆ str   │
╞════════╪═══════╪═══════╪═══════╪═══════╪═══════╡
│ Name_A ┆ a     ┆ null  ┆ null  ┆ c     ┆ null  │
│ Name_B ┆ null  ┆ null  ┆ c     ┆ null  ┆ null  │
│ Name_C ┆ a     ┆ b     ┆ null  ┆ g     ┆ i     │
└────────┴───────┴───────┴───────┴───────┴───────┘

I want the output as:

shape: (3, 7)
┌────────┬───────┬───────┬───────┬───────┬───────┬──────────────────────┐
│ Name   ┆ val_1 ┆ val_2 ┆ val_3 ┆ val_4 ┆ val_5 ┆ combined             │
│ ---    ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---                  │
│ str    ┆ str   ┆ str   ┆ str   ┆ str   ┆ str   ┆ list[str]            │
╞════════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════════════════╡
│ Name_A ┆ a     ┆ null  ┆ null  ┆ c     ┆ null  ┆ ["a", "c"]           │
│ Name_B ┆ null  ┆ null  ┆ c     ┆ null  ┆ null  ┆ ["c"]                │
│ Name_C ┆ a     ┆ b     ┆ null  ┆ g     ┆ i     ┆ ["a", "b", "g", "i"] │
└────────┴───────┴───────┴───────┴───────┴───────┴──────────────────────┘

I want to combine all the columns except Name into a single list column. I have simplified the data for this question; in reality there are many columns in the val_N format, so generic code that does not require listing each column name would be great.

Solution:

For the main part of the question, you can do:

df.with_columns(combined=pl.concat_list(pl.exclude('Name')))

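pl.exclude selects every column except the ones you name, so none of the val_N columns has to be listed by hand. At this point the resulting lists still contain the nulls.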

To get rid of the nulls in the final lists, use list.drop_nulls, which was introduced in version 0.19.4:

df.with_columns(combined=pl.concat_list(pl.exclude('Name')).list.drop_nulls())