Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to map in Polars

import polars as pl

df1 = pl.DataFrame(
    {
        'type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
        'date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-01', '2022-01-02', '2022-01-01', '2022-01-02', '2022-01-03'],
        'value': [1, 3, 5, 7, 8, 9, 12, 0, 1]

    }
)

df2 = pl.DataFrame(
    {
        'type': ['β', 'β', 'β'],
        'date': ['2022-01-01', '2022-01-02', '2022-01-03'],
        'value': [100, 233, 666]
    }
)

df3 = pl.DataFrame(
    {
        'type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
        'date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-01', '2022-01-02', '2022-01-01', '2022-01-02', '2022-01-03'],
        'value': [1, 3, 5, 7, 8, 9, 12, 0, 1],
        'type_β': ['β', 'β', 'β', 'β', 'β', 'β', 'β', 'β', 'β'],
        'value_β': [100, 233, 666, None, 100, 233, 100, 233, 666]
    }
)

print(df1, df2, df3)

How can we concatenate df1 and df2 into df3 based on the date?

Missing dates do not need to be filled, default None,
Please use Polars(polars version = 0.17.11)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You don’t want to concat. Think of concat as adding rows, not combining datasets. You want a join.

I think your example output is off just a touch though.

On the surface you want to do this:

df1.join(df2, on='date', how='outer', suffix="β")

which results in

shape: (9, 5)
┌──────┬────────────┬───────┬────────┬─────────┐
│ type ┆ date       ┆ value ┆ type_β ┆ value_β │
│ ---  ┆ ---        ┆ ---   ┆ ---    ┆ ---     │
│ str  ┆ str        ┆ i64   ┆ str    ┆ i64     │
╞══════╪════════════╪═══════╪════════╪═════════╡
│ A    ┆ 2022-01-01 ┆ 1     ┆ β      ┆ 100     │
│ A    ┆ 2022-01-02 ┆ 3     ┆ β      ┆ 233     │
│ A    ┆ 2022-01-03 ┆ 5     ┆ β      ┆ 666     │
│ A    ┆ 2022-01-04 ┆ 7     ┆ null   ┆ null    │
│ B    ┆ 2022-01-01 ┆ 8     ┆ β      ┆ 100     │
│ B    ┆ 2022-01-02 ┆ 9     ┆ β      ┆ 233     │
│ C    ┆ 2022-01-01 ┆ 12    ┆ β      ┆ 100     │
│ C    ┆ 2022-01-02 ┆ 0     ┆ β      ┆ 233     │
│ C    ┆ 2022-01-03 ┆ 1     ┆ β      ┆ 666     │
└──────┴────────────┴───────┴────────┴─────────┘

You can see how this has a null value in the type_β column in the same row that you have a null in the value_β column so I think that was a typo or else you want more functionality than you said.

If you do want to fill in type_β then you can follow up with

df1.join(df2, on='date', how='outer', suffix="_β") \
    .with_columns(type_β = pl.col('type_β').forward_fill())

shape: (9, 5)
┌──────┬────────────┬───────┬────────┬─────────┐
│ type ┆ date       ┆ value ┆ type_β ┆ value_β │
│ ---  ┆ ---        ┆ ---   ┆ ---    ┆ ---     │
│ str  ┆ str        ┆ i64   ┆ str    ┆ i64     │
╞══════╪════════════╪═══════╪════════╪═════════╡
│ A    ┆ 2022-01-01 ┆ 1     ┆ β      ┆ 100     │
│ A    ┆ 2022-01-02 ┆ 3     ┆ β      ┆ 233     │
│ A    ┆ 2022-01-03 ┆ 5     ┆ β      ┆ 666     │
│ A    ┆ 2022-01-04 ┆ 7     ┆ β      ┆ null    │
│ B    ┆ 2022-01-01 ┆ 8     ┆ β      ┆ 100     │
│ B    ┆ 2022-01-02 ┆ 9     ┆ β      ┆ 233     │
│ C    ┆ 2022-01-01 ┆ 12    ┆ β      ┆ 100     │
│ C    ┆ 2022-01-02 ┆ 0     ┆ β      ┆ 233     │
│ C    ┆ 2022-01-03 ┆ 1     ┆ β      ┆ 666     │
└──────┴────────────┴───────┴────────┴─────────┘

The distinction between how='outer', how='left', and the default how='inner' is how values should be treated that aren’t in both dfs that are being joined.

inner joins will only return rows where values in the column to be matched exist on both sides of the join

left joins will return all the rows on the left side regardless if they exist on the right side

outer joins will return all rows of both sides

In your example, it’s not clear which you prefer between left or outer, it’s only clear you don’t want inner.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading