Initialize polars dataframe from list of structs

These two make sense:

import polars

df = polars.DataFrame(dict(
  j=1,
  ))
print(df)
print(df.schema)

 j
 1
shape: (1, 1)
{'j': Int64}

df = polars.DataFrame(dict(
  j=range(2)
  ))
print(df)
print(df.schema)
  
 j
 0
 1
shape: (2, 1)
{'j': Int64}

However:

cols = list('ab')

df = polars.DataFrame(dict(
  j=polars.struct([polars.lit(j).alias(col) for j, col in enumerate(cols)], eager=True)
  ))
print(df)
print(df.schema)

 j
 {0,1}
shape: (1, 1)
{'j': Struct([Field('a', Int32), Field('b', Int32)])}

df = polars.DataFrame(dict(
  j=[polars.struct([polars.lit(j).alias(col) for j, col in enumerate(cols)], eager=True)
    for k in range(2)]
  ))
print(df)
print(df.schema)

 j
 [{0,1}]
 [{0,1}]
shape: (2, 1)
{'j': List(Struct([Field('a', Int32), Field('b', Int32)]))}

Why did changing from a single polars.struct to a list of polars.struct results change the type of each element (from Struct to List(Struct))? I'd expect the result of the last one above to be the same as this:

df = (polars.DataFrame(dict(
  j=range(2)
  ))
  .with_columns(
    polars.struct([polars.lit(j).alias(col) for j, col in enumerate(cols)], eager=True).alias('j')
    ))
print(df)
print(df.schema)

 j
 {0,1}
 {0,1}
shape: (2, 1)
{'j': Struct([Field('a', Int32), Field('b', Int32)])}

Is there a shorter / better way to initialize the dataframe with a list of structs (i.e. a shorter way to get the same result as the last code example above)?

Solution:

pl.struct with eager=True returns a Series object, not a single struct value (pl here is the usual alias from import polars as pl).

>>> type(pl.struct(a=0, b=1, eager=True))
polars.series.series.Series

To simplify things, it may help to look at it as:

>>> struct = [dict(a=0, b=1)]
>>> struct
[{'a': 0, 'b': 1}]

Your first example passes the "struct" directly:

>>> pl.DataFrame({"col": struct})
shape: (1, 1)
┌───────────┐
│ col       │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {0,1}     │
└───────────┘

The outer [] is treated as the container of rows, and each item inside is a single row.

Your second example passes a list of "structs":

>>> pl.DataFrame({"col": [struct, struct]})
shape: (2, 1)
┌─────────────────┐
│ col             │
│ ---             │
│ list[struct[2]] │
╞═════════════════╡
│ [{0,1}]         │
│ [{0,1}]         │
└─────────────────┘

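The outer [] is again the container of rows, so each "struct" (itself a one-element list) becomes a single row with its own [] intact. That is why the column dtype is list[struct[2]] rather than struct[2].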

It's not entirely clear what you're after, but if you have a list of Series objects, you may be looking to combine them with pl.concat:

pl.concat([
    pl.struct(a=0, b=1, eager=True),
    pl.struct(a=0, b=1, eager=True)
]).to_frame("col")
shape: (2, 1)
┌───────────┐
│ col       │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {0,1}     │
│ {0,1}     │
└───────────┘