I have a Polars DataFrame that looks like this:
import polars as pl
d = {'id': ['N/A', 'N/A', '1', '1', '2'], 'type': ['red', 'blue', 'yellow', 'green', 'yellow'], 'area': [0, 0, 3, 4, 5]}
dp = pl.DataFrame(d)
shape: (5, 3)
┌─────┬────────┬──────┐
│ id  ┆ type   ┆ area │
│ --- ┆ ---    ┆ ---  │
│ str ┆ str    ┆ i64  │
╞═════╪════════╪══════╡
│ N/A ┆ red    ┆ 0    │
│ N/A ┆ blue   ┆ 0    │
│ 1   ┆ yellow ┆ 3    │
│ 1   ┆ green  ┆ 4    │
│ 2   ┆ yellow ┆ 5    │
└─────┴────────┴──────┘
I would like to do some sort of pivot or transpose operation so that each row is an id (excluding ‘N/A’) and there is a column for each type, and the value is the area. If no value is given, it should be zero. So in this case, the result should look like this:
     red  blue  yellow  green
'1'    0     0       3      4
'2'    0     0       5      0
How can I do this in Polars? I would rather avoid converting the whole thing into pandas.
>Solution :
In Polars, you can achieve the desired result by using the pivot operation. Here’s how you can do it for your specific DataFrame:
import polars as pl
d = {
    'id': ['N/A', 'N/A', '1', '1', '2'],
    'type': ['red', 'blue', 'yellow', 'green', 'yellow'],
    'area': [0, 0, 3, 4, 5],
}
dp = pl.DataFrame(d)
# Remove rows with 'N/A' in the 'id' column
dp = dp.filter(pl.col("id") != "N/A")
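# Perform the pivot operation: one row per id, one column per type.
# Keyword names follow Polars >= 1.0; older releases call the `on` argument `columns`.
dp = dp.pivot(on='type', index='id', values='area', aggregate_function='first')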
# Fill missing values with 0
dp = dp.fill_null(0)
print(dp)
Output (red and blue are absent because their only rows had id 'N/A' and were filtered out before the pivot):
shape: (2, 3)
┌─────┬────────┬───────┐
│ id  ┆ yellow ┆ green │
│ --- ┆ ---    ┆ ---   │
│ str ┆ i64    ┆ i64   │
╞═════╪════════╪═══════╡
│ 1   ┆ 3      ┆ 4     │
│ 2   ┆ 5      ┆ 0     │
└─────┴────────┴───────┘