Home get a count of vaues that are on or before a certain date from a pandas dataframe

Questions

get a count of vaues that are on or before a certain date from a pandas dataframe

August 29, 2022

I have a date 2020-05-31 and the following dataframe, where the column names are statuses:

     rejected revocation    decision         rfe   interview premium    received rfe_response  biometrics withdrawal appeal
196      None       None  2020-01-28        None        None    None  2020-01-16         None        None       None   None
203      None       None  2020-06-20  2020-04-01        None    None  2020-01-03   2020-08-08        None       None   None
209      None       None  2020-12-03  2020-06-03        None    None  2020-01-03         None        None       None   None
213      None       None  2020-06-23        None        None    None  2020-01-27         None  2020-02-19       None   None
1449     None       None  2020-05-12        None        None    None  2020-01-06         None        None       None   None
1660     None       None  2021-09-23  2021-05-27        None    None  2020-01-21   2021-08-17        None       None   None

I want to get the latest step each row is in, such that the latest steap is on or before the date mentioed above 2020-05-31

So the output for this woud be:

196: decision
203: rfe
209: received
213: biometrics
1449: decision
1660: received

or even a count works:

{
rejected = 0,
revocation = 0,
decision = 2,
rfe = 1,
interview = 0,
premium = 0,
received = 2,
rfe_response = 0,
biometrics 0 0,
withdrawal = 0,
appeal = 0 
}

Currently i am looping through each row, where i create a dict of {status: date}, then i sort by date, and take the key of the last value (which is a status)

This is very slow and takes forever

Is there a simpler or cleaner way of doing it?

NOTE: Each row will have atleast one date, in decision column

>Solution :

you can mask where the date is bigger than the chosen date, then use idxmax along the columns.

dt_max = '2020-05-31'
res = df.where(df.le(dt_max)).astype('datetime64[ns]').idxmax(axis=1)
print(res)
# 196       decision
# 203            rfe
# 209       received
# 213     biometrics
# 1449      decision
# 1660      received
# dtype: object

And for the count, per status, then you can do with value_counts like

dict_res = res.value_counts().reindex(df.columns, fill_value=0).to_dict()
print(dict_res)
#{'rejected': 0, 'revocation': 0, 'decision': 2, 'rfe': 1, 'interview': 0, 'premium': 0, 
# 'received': 2, 'rfe_response': 0, 'biometrics': 1, 'withdrawal': 0, 'appeal': 0}