I have the below dataframe:
#Load the required libraries
import pandas as pd
#Create dataset
data = {'id': [1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3,
4, 4, 4, 4,
5, 5, 5, 5, 5,5, 5, 5,5],
'cycle': [1,2, 3, 4, 5,6,7,8,9,10,11,
1,2, 3,4,5,6,
1,2, 3, 4, 5,6,
1,2, 3, 4,
1,2, 3, 4, 5,6,7,8,9,],
'Salary': [7, 7, 7,8,9,10,11,12,13,14,15,
4, 4, 4,4,5,6,
8,9,10,11,12,13,
8,9,10,11,
7, 7,9,10,11,12,13,14,15,],
'Children': ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'No','No', 'Yes', 'Yes', 'Yes', 'No',
'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes',
'No','Yes', 'Yes', 'No','No', 'Yes',
'Yes', 'No','Yes', 'Yes',
'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No',],
'Days': [123, 128, 66, 66, 120, 141, 52,96, 120, 141, 52,
96, 120,120, 141, 52,96,
15,123, 128, 66, 120, 141,
141,123, 128, 66,
123, 128, 66, 123, 128, 66, 120, 141, 52,],
}
#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
The above dataframe looks as such:
Here, every id has different cycles as per the ‘cycle’ column. For example,
id-1 has maximum 11 cycles.
id-2 has maximum 6 cycles.
id-3 has maximum 6 cycles.
id-4 has maximum 4 cycles.
id-5 has maximum 9 cycles.
I have a certain threshold limit on ‘cycles’. Let’s say cycle_threshold = 8
If maximum cycles for and id < cycle_threshold, then ‘Days’ column remain unchanged.
Else,’Days’ column will be marked as ‘NA’.
For example,
For id-2, since maximum cycles are 6, which is < 8, the ‘Days’ column remains unchanged
However, for id-1, since maximum cycles are 11, which is > 8, the ‘Days’ column becomes ‘NA’.
The result looks as such:
Can somebody please let me know how do I achieve this task in Python?
>Solution :
Use GroupBy.transform for get maximal cycle per groups, compare by cycle_threshold and set NaNs in Series.where:
cycle_threshold = 8
m1 = df.groupby('id')['cycle'].transform('max').lt(cycle_threshold)
df['Days'] = df['Days'].where(m1)
#alternative
df.loc[~m1, 'Days'] = np.nan
print("df = \n", df)
id cycle Salary Children Days
0 1 1 7 No NaN
1 1 2 7 Yes NaN
2 1 3 7 Yes NaN
3 1 4 8 Yes NaN
4 1 5 9 Yes NaN
5 1 6 10 No NaN
6 1 7 11 No NaN
7 1 8 12 Yes NaN
8 1 9 13 Yes NaN
9 1 10 14 Yes NaN
10 1 12 15 No NaN
11 2 1 4 Yes 96.0
12 2 2 4 Yes 120.0
13 2 3 4 No 120.0
14 2 4 4 Yes 141.0
15 2 5 5 Yes 52.0
16 2 6 6 Yes 96.0
17 3 1 8 No 15.0
18 3 2 9 Yes 123.0
19 3 3 10 Yes 128.0
20 3 4 11 No 66.0
21 3 5 12 No 120.0
22 3 6 13 Yes 141.0
23 4 1 8 Yes 141.0
24 4 2 9 No 123.0
25 4 3 10 Yes 128.0
26 4 4 11 Yes 66.0
27 5 1 7 No NaN
28 5 2 7 Yes NaN
29 5 3 9 No NaN
30 5 4 10 No NaN
31 5 5 11 Yes NaN
32 5 6 12 Yes NaN
33 5 7 13 Yes NaN
34 5 8 14 Yes NaN
35 5 9 15 No NaN

