We have the following Pandas Dataframe:
# Stackoverflow question
data = {'category':[1, 2, 3, 1, 2, 3, 1, 2, 3], 'date':['2000-01-01', '2000-01-01', '2000-01-01', '2000-01-02', '2000-01-02', '2000-01-02', '2000-01-03', '2000-01-03', '2000-01-03']}
df = pd.DataFrame(data=data)
df['date'] = pd.to_datetime(df['date'])
df
category date
0 1 2000-01-01
1 2 2000-01-01
2 3 2000-01-01
3 1 2000-01-02
4 2 2000-01-02
5 3 2000-01-02
6 1 2000-01-03
7 2 2000-01-03
8 3 2000-01-03
How can we query this dataframe to find the date 2000-01-02 with category 3? So we are looking for the row with index 5.
It should be accomplished without set_index(‘date’).
The reason for this is as follows, when setting the index on the actual data rather than the example data I receive the following error:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
>Solution :
Take a subset of relevant category, subtract the target date, and get idxmin
tmp = df.loc[df.category.eq(3)]
(tmp.date - pd.to_datetime("2000-01-02")).abs().idxmin()
# 5