Say I have this:
import polars

df = polars.DataFrame(dict(
    j=[[1], [2], [3]],
    k=[[1, 1], [2], [3]],
))
j (list[i64])   k (list[i64])
[1]             [1, 1]
[2]             [2]
[3]             [3]
shape: (3, 2)
All lists in j have one element, while k has at least one list that has more than one element.
I’d like to unwrap every column whose lists all have exactly one element, i.e. get this:
dfj = polars.DataFrame(dict(
    j=[1, 2, 3],
    k=[[1, 1], [2], [3]],
))
j (i64)   k (list[i64])
1         [1, 1]
2         [2]
3         [3]
shape: (3, 2)
I’ve tried this:
dfj = (df
    .with_columns(
        polars
        .when(polars.col(col).list.lengths().max() == 1)
        .then(polars.col(col).list.first())
        .otherwise(polars.col(col))
        for col in df.columns
    )
)
but it results in:
exceptions.ArrowErrorException: NotYetImplemented("Casting from Int64 to LargeList(Field { name: \"item\", data_type: Int64, is_nullable: true, metadata: {} }) not supported")
Any idea why this is not working? Also, is there a way to do what I’m after?
>Solution :
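A when/then/otherwise expression has to yield a single dtype for the whole column, but your two branches produce different ones: list.first() gives i64, while the untouched column is list[i64]. Polars then tries to cast one to the other, which is the cast your error message says is not supported.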
So make the decision per column in Python and replace the whole column only when it qualifies:
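# Iterating over a DataFrame yields one Series per column.
# Note: newer Polars releases spell .list.lengths() as .list.len().
df.with_columns(
    col.list.first() for col in df
    if col.list.lengths().max() == 1
)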
shape: (3, 2)
┌─────┬───────────┐
│ j   ┆ k         │
│ --- ┆ ---       │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 1   ┆ [1, 1]    │
│ 2   ┆ [2]       │
│ 3   ┆ [3]       │
└─────┴───────────┘