How to find past values by ID in Python

Advertisements

I have a dataset that looks at all employees’ history. The goal I am trying to get to is see an employees current manager and previous manager only if that previous manager has left without being replaced. To identify if a manager has left you look at the ManagerPositionNum this column is unique to the Manager and if another Manager has those numbers then they are filling in for a vacant role. I am running pandas and numpy with this file.

Here is a sample of what I have:

EmpID      Date           Job_Title        ManagerName      ManagerPositionNum
 101     May 2021         Sales Rep         John Doe             1111
 101     June 2021        Sales Rep         John Doe             1111       
 102     February 2022    Tech Support      Mary Sue             2111
 102     March 2022       Tech Support      Mary Sue             2111
 102     April 2022       Tech Support      John Doe             2111 
 103     October 2022     HR Advisor        Sarah Long           3111
 103     November 2022    HR Advisor        Michael Scott        4111
 103     December 2022    HR Advisor        John Doe             4111
 103     December 2022    HR Advisor        John Doe             4111
 

Desired Output:

 EmpID        Date          Job_Title       ManagerName   ManagerPositionNum   Vacated Manager
  101       May 2021        Sales Rep        John Doe           1111   
  101       June 2021       Sales Rep        John Doe           1111 
  102       February 2022   Tech Support     Mary Sue           2111           Mary Sue 
  102       March 2022      Tech Support     Mary Sue           2111           Mary Sue 
  102       April 2022      Tech Support     John Doe           2111           Mary Sue 
  103       October 2022    HR Advisor       Sarah Long         3111
  103       November 2022   HR Advisor       Michael Scott      4111    
  103       December 2022   HR Advisor       John Doe           4111           Michael Scott
  103       January 2023    HR Advisor       John Doe           4111           Michael Scott

Just for clarification:

1111 is unique to John Doe

2111 is unique to Mary Sue

3111 is unique to Sarah Long

4111 is unique to Michael Scott

Code I have tried:

 reportid = df.groupby('ManagerName')['ManagerPositionNum'].transform('first')m =
 ~df['ManagerPositionNum'].eq(reportid) df.loc[m,'ManagerName']

>Solution :

Use a custom groupby.apply. Identify the second to last unique manager and ffill their name per group:

df['Vacated Manager'] = (df.groupby('EmpID', group_keys=False)['ManagerName']
                         .apply(lambda s: s.where(pd.factorize(s[::-1])[0][::-1]==1).ffill())
                        )

Output:

   EmpID           Date     Job_Title    ManagerName  ManagerPositionNum Vacated Manager
0    101       May 2021     Sales Rep       John Doe                1111             NaN
1    101      June 2021     Sales Rep       John Doe                1111             NaN
2    102  February 2022  Tech Support       Mary Sue                2111        Mary Sue
3    102     March 2022  Tech Support       Mary Sue                2111        Mary Sue
4    102     April 2022  Tech Support       John Doe                2111        Mary Sue
5    103   October 2022    HR Advisor     Sarah Long                3111             NaN
6    103  November 2022    HR Advisor  Michael Scott                4111   Michael Scott
7    103  December 2022    HR Advisor       John Doe                4111   Michael Scott
8    103  December 2022    HR Advisor       John Doe                4111   Michael Scott

Leave a ReplyCancel reply