Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pseudonymize data in a pandas DataFrame

I have a pandas dataframe that looks like this:

id user action timestamp
1 Jim start 12/10/2022
2 Jim start 12/10/2022
3 Jim end 2/2/2022
4 Linette start 8/18/2022
5 Linette start 3/24/2022
6 Linette end 8/27/2022
7 Rachel start 2/7/2022
8 Rachel end 1/4/2023
9 James start 6/12/2022
10 James end 5/14/2022
11 James start 11/28/2022
12 James start 8/9/2022
13 James end 2/15/2022

For each user, there can be more than one start event, but only one end. Imagine that they sometimes need to start a book over again, but only finish it once.

What I want is to calculate the time difference between the first start and the end, so keep, for each user, the first occurrence of "start" and "end" in each group.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Any hint?

>Solution :

>>> (df.groupby(["user", "action"], sort=False)["timestamp"]
       .first()
       .droplevel("action")
       .diff().iloc[1::2])

user
James       29 days
Jim        311 days
Linette     -9 days
Rachel    -331 days
Name: timestamp, dtype: timedelta64[ns]
  • for "timestamp" of each "user" & "action" pair, get the first occurences
    • this will effectively take the first start, and the (only) end
  • then drop the carried over "action" level of groupers
  • take the difference from ends and starts
  • take every other value to get per-user difference

(sort=False ensures during groupby that start’s don’t get mixed up.)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading