These two make sense:
import polars
df = polars.DataFrame(dict(
j=1,
))
print(df)
print(df.schema)
j
1
shape: (1, 1)
{'j': Int64}
df = polars.DataFrame(dict(
j=range(2)
))
print(df)
print(df.schema)
j
0
1
shape: (2, 1)
{'j': Int64}
However:
cols = list('ab')
df = polars.DataFrame(dict(
j=polars.struct([polars.lit(j).alias(col) for j, col in enumerate(cols)], eager=True)
))
print(df)
print(df.schema)
j
{0,1}
shape: (1, 1)
{'j': Struct([Field('a', Int32), Field('b', Int32)])}
df = polars.DataFrame(dict(
j=[polars.struct([polars.lit(j).alias(col) for j, col in enumerate(cols)], eager=True)
for k in range(2)]
))
print(df)
print(df.schema)
j
[{0,1}]
[{0,1}]
shape: (2, 1)
{'j': List(Struct([Field('a', Int32), Field('b', Int32)]))}
Why did changing from a single polars.struct to a list of polars.struct results change the type of the column itself (from Struct to List(Struct))? I’d expect the result of the last example above to be the same as this:
df = (polars.DataFrame(dict(
j=range(2)
))
.with_columns(
polars.struct([polars.lit(j).alias(col) for j, col in enumerate(cols)], eager=True).alias('j')
))
print(df)
print(df.schema)
j
{0,1}
{0,1}
shape: (2, 1)
{'j': Struct([Field('a', Int32), Field('b', Int32)])}
Is there a shorter / better way to initialize the dataframe with a list of structs (i.e. a shorter way to get the same result as the last code example above)?
Solution:
Below, pl is polars imported as pl. pl.struct with eager=True returns a Series object, not a single struct value:
>>> type(pl.struct(a=0, b=1, eager=True))
polars.series.series.Series
To simplify things, it may help to think of that one-row Series as a plain Python list holding one dict:
>>> struct = [dict(a=0, b=1)]
>>> struct
[{'a': 0, 'b': 1}]
Your first struct example passes the "struct" directly as the column values:
>>> pl.DataFrame({"col": struct})
shape: (1, 1)
┌───────────┐
│ col │
│ --- │
│ struct[2] │
╞═══════════╡
│ {0,1} │
└───────────┘
The outer [] is treated as the container of rows, and each item inside is a single row.
Your second struct example passes a list of "structs":
>>> pl.DataFrame({"col": [struct, struct]})
shape: (2, 1)
┌─────────────────┐
│ col │
│ --- │
│ list[struct[2]] │
╞═════════════════╡
│ [{0,1}] │
│ [{0,1}] │
└─────────────────┘
Each "struct" is itself a one-element list, so each one becomes a single row with its [] intact, which gives a list[struct[2]] column.
If what you actually have is a list of Series objects, you may be looking for pl.concat, which joins them into a single Series (the question doesn’t make that entirely clear):
pl.concat([
pl.struct(a=0, b=1, eager=True),
pl.struct(a=0, b=1, eager=True)
]).to_frame("col")
shape: (2, 1)
┌───────────┐
│ col │
│ --- │
│ struct[2] │
╞═══════════╡
│ {0,1} │
│ {0,1} │
└───────────┘