I got a task i normally do in excel, and can’t figure out how to convert in a smart way to Python.
I have a dataframe roughly 700 rows and 30 columns, though only one column is relevant for this issue:
Column X
---------
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 0
A 0
A 0
A 0
A 0
B 1
B 1
B 1
B 1
B 1
B 1
B 1
B 1
B 1
B 1
B 1
B 0
B 0
B 0
C 1
C 1
I need to create a new column with the value 1, for the first 11 occurences of whatever is in column X, So the first 11 rows of "A" gets value 1 and the remaning ones gets 0.
So i guess i can make a column that count occurences of df.at[idx, "Column X"], but only for current row and rows above.
Once i have this i can make another column that writes 1 if the newly created column has a value of 11 or lower, and otherwise returns 0.
NB: Data is always sorted by X, so will follow the pattern shown above.
Any suggestions how this can be done?
>Solution :
Use GroupBy.cumcount
for counter starting by 0
, so for first 11
values compare by Series.lt
for less like 11
and cast output to inteegrs for Tru/False
to 1,0
mapping:
df['new'] = df.groupby('Column X').cumcount().lt(11).astype(int)
print (df)
Column X new
0 A 1
1 A 1
2 A 1
3 A 1
4 A 1
5 A 1
6 A 1
7 A 1
8 A 1
9 A 1
10 A 1
11 A 0
12 A 0
13 A 0
14 A 0
15 A 0
16 B 1
17 B 1
18 B 1
19 B 1
20 B 1
21 B 1
22 B 1
23 B 1
24 B 1
25 B 1
26 B 1
27 B 0
28 B 0
29 B 0
30 C 1
31 C 1