Python-polars: Create row per unique value in a pl.DataFrame column, columns with another, and values with a third

June 28, 2023

I have a Polars DataFrame that looks like this:

import polars as pl

d = {'id': ['N/A', 'N/A', '1', '1', '2'], 'type': ['red', 'blue', 'yellow', 'green', 'yellow'], 'area': [0, 0, 3, 4, 5]}
dp = pl.DataFrame(d)
shape: (5, 3)
┌─────┬────────┬──────┐
│ id  ┆ type   ┆ area │
│ --- ┆ ---    ┆ ---  │
│ str ┆ str    ┆ i64  │
╞═════╪════════╪══════╡
│ N/A ┆ red    ┆ 0    │
│ N/A ┆ blue   ┆ 0    │
│ 1   ┆ yellow ┆ 3    │
│ 1   ┆ green  ┆ 4    │
│ 2   ┆ yellow ┆ 5    │
└─────┴────────┴──────┘

I would like to do some sort of pivot or transpose operation so that each row is an id (excluding ‘N/A’) and there is a column for each type, and the value is the area. If no value is given, it should be zero. So in this case, the result should look like this:

      red   blue  yellow  green
'1'    0      0     3      4
'2'    0      0     5      0

How can I do this in Polars? I would rather avoid converting the whole thing into pandas.

Solution:

In Polars, you can do this with DataFrame.pivot. Filter out the 'N/A' rows, pivot with id as the index and type as the new columns, then fill the remaining nulls with 0:

import polars as pl

d = {
    'id': ['N/A', 'N/A', '1', '1', '2'],
    'type': ['red', 'blue', 'yellow', 'green', 'yellow'],
    'area': [0, 0, 3, 4, 5]
}

dp = pl.DataFrame(d)

# Remove rows with 'N/A' in the 'id' column
dp = dp.filter(pl.col("id") != "N/A")

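# Pivot: one row per id, one column per type, values taken from area.
# (Polars >= 1.0 names the column argument `on`; older releases call it `columns`.)
dp = dp.pivot(on='type', index='id', values='area', aggregate_function='first')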

# Fill missing values with 0
dp = dp.fill_null(0)

print(dp)

Output:

shape: (2, 3)
┌─────┬────────┬───────┐
│ id  ┆ yellow ┆ green │
│ --- ┆ ---    ┆ ---   │
│ str ┆ i64    ┆ i64   │
╞═════╪════════╪═══════╡
│ 1   ┆ 3      ┆ 4     │
│ 2   ┆ 5      ┆ 0     │
└─────┴────────┴───────┘

Note that red and blue do not appear as columns: those types only occur in the 'N/A' rows, which are filtered out before the pivot.