I have a pandas dataframe that represents the trips I have taken for work. Each row is a single trip, with one column for the day and one for the number of kilometers traveled.
I get reimbursed on a per-kilometer basis for every trip except the first and the last of each day (these are considered ordinary travel to and from work).
So my dataframe looks something like this:
day, distance
1, 5
1, 2
1, 7
2, 11
2, 11
3, 4
3, 10
3, 5
3, 12
I would like to add a column that flags all but the first and last trips of each day, like this:
day, distance, claimable
1, 5, 0
1, 2, 1
1, 7, 0
2, 11, 0
2, 11, 0
3, 4, 0
3, 10, 1
3, 5, 1
3, 12, 0
Assuming I have a dataframe with the columns above, is there a way to do something like this:
import pandas as pd
df = pd.DataFrame({
    'day': (1, 1, 1, 2, 2, 3, 3, 3, 3),
    'dist': (5, 2, 7, 11, 11, 4, 10, 5, 12),
})
df['claim'] = 0
# set the value of the "claim" column to 1 on all
# but the first and last trip of each day
df.groupby("day").nth(slice(1,-1)).loc[:, "claim"] = 1
>Solution :
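You can use groupby with transform: for each day's group, build an array of ones and set its first and last entries to 0.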
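import numpy as np
import pandas as pd

df = pd.DataFrame({
    'day': (1, 1, 1, 2, 2, 3, 3, 3, 3),
    'dist': (5, 2, 7, 11, 11, 4, 10, 5, 12),
})

def is_claimable(group):
    # Start with every trip of the day marked as claimable (1) ...
    claim = np.ones(len(group), dtype='int8')
    # ... then clear the first and last trips of the day.
    claim[[0, -1]] = 0
    return claim

# transform returns one value per row, aligned with df's index.
df['claim'] = df.groupby("day")['dist'].transform(is_claimable)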
Output:
>>> df
   day  dist  claim
0    1     5      0
1    1     2      1
2    1     7      0
3    2    11      0
4    2    11      0
5    3     4      0
6    3    10      1
7    3     5      1
8    3    12      0
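As an alternative sketch (not part of the answer above), the same flag can probably be built without a custom function by numbering the rows inside each day with cumcount, once from the start and once from the end. This assumes the rows are already in chronological order within each day; the names is_first, is_last and claimable_km are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'day': (1, 1, 1, 2, 2, 3, 3, 3, 3),
    'dist': (5, 2, 7, 11, 11, 4, 10, 5, 12),
})

grouped = df.groupby('day')

# cumcount() numbers the rows of each group 0, 1, 2, ... from the start;
# with ascending=False it counts down, so the last row of a group gets 0.
is_first = grouped.cumcount() == 0
is_last = grouped.cumcount(ascending=False) == 0

# A trip is claimable when it is neither the first nor the last of its day.
df['claim'] = (~(is_first | is_last)).astype(int)

# Total claimable distance: sum 'dist' over the flagged rows only.
claimable_km = df.loc[df['claim'] == 1, 'dist'].sum()
```

A day with only one or two trips ends up with no claimable rows, which matches the rule in the question.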