Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python – Subtract Date by First Entry?

I’m using polars library in python to manipulate some dataframe.

I’m trying to do the following:
:

For some dataframe:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Person.. Fight with On
A B 3 Jan
A C 4 Jan
A D 5 Jan
A E 5 Jan
A B 10 Jan
A B 20 Jan
A C 20 Jan

I want to return the "distance" between the current fighter-pair and the first fight they had, such that:
:

Person.. Fight with On Distance
A B 3 Jan 0 Days
A C 4 Jan 0 Days
A D 5 Jan 0 Days
A E 5 Jan 0 Days
A B 10 Jan 7 Days (i.e. 10 Jan – 3 Jan); (CurrentDate – ABFirstFight)
A B 20 Jan 17 Days (i.e. 20 Jan – 3 Jan); (CurrentDate – ABFirstFight)
A C 20 Jan 16 Days (i.e. 20 Jan – 4 Jan); (CurrentDate – ACFirstFight)

<What I’ve Tried>:

  1. polars "first" function: Only returned the head of the dataframe
  2. polars "first" function with some combinations of "over"/"group_by"/"rolling" functions: Returned some numbers, but I can’t make sense of why the output was that way

Does anyone have any advice on how to attempt this?

I think I might need to use some combination of "group_by" or "over", "first", and perhaps "sub" (to subtract two dates?), but I’m not sure how to proceed.
The hardest part for me is to try to extract the first entry of a given group (e.g. first date entry of the A-B pair, or the A-C pair, etc.)

>Solution :

Here’s one way to do this:

Group the dataframe by [person, fight_with], select the min on, and then join it with the original dataframe.

Here’s some code:

import polars as pl

df = pl.DataFrame(
    {
        "person": ["A", "A", "A", "A", "A", "A", "A"],
        "fight_with": ["B", "C", "D", "E", "B", "B", "C"],
        "on": [3, 4, 5, 5, 10, 20, 20],
    }
)

first_fight = df.group_by("person", "fight_with").agg(pl.min("on").alias("min_on"))

df = df.join(first_fight, on=["person", "fight_with"]).with_columns(
    distance=pl.col("on") - pl.col("min_on")
)

print(df)

Output:

shape: (7, 5)
┌────────┬────────────┬─────┬────────┬──────────┐
│ person ┆ fight_with ┆ on  ┆ min_on ┆ distance │
│ ---    ┆ ---        ┆ --- ┆ ---    ┆ ---      │
│ str    ┆ str        ┆ i64 ┆ i64    ┆ i64      │
╞════════╪════════════╪═════╪════════╪══════════╡
│ A      ┆ B          ┆ 3   ┆ 3      ┆ 0        │
│ A      ┆ C          ┆ 4   ┆ 4      ┆ 0        │
│ A      ┆ D          ┆ 5   ┆ 5      ┆ 0        │
│ A      ┆ E          ┆ 5   ┆ 5      ┆ 0        │
│ A      ┆ B          ┆ 10  ┆ 3      ┆ 7        │
│ A      ┆ B          ┆ 20  ┆ 3      ┆ 17       │
│ A      ┆ C          ┆ 20  ┆ 4      ┆ 16       │
└────────┴────────────┴─────┴────────┴──────────┘
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading