The objective is to extract the index labels of randomly selected rows from each group in pandas.
Specifically, given a DataFrame df:
nval
0 4
1 4
2 0
...
23 0
24 4
...
29 4
30 4
31 0
I would like to extract 5 random indices for each of the values 0 and 4.
For example, the 5 randomly selected indices for 0 can be 3, 11, 15, 16, 22, and those for 4 can be 6, 9, 7, 29, 27.
Currently, the code below achieves this objective:
import numpy as np
import pandas as pd
np.random.seed(0)
dval = [4,4,0,0,0,0,4,4,0,4,0,0,4,4,0,0,0,0,4,
        4,0,0,0,0,4,0,4,4,4,4,4,0]
df = pd.DataFrame(dict(nval=dval))
cgroup = 5  # number of rows to draw per value
df = df.reset_index()  # keep the original row labels in an 'index' column
all_df = []
for idx in [0, 4]:
    # rows whose nval equals the current value
    x = df[df['nval'] == idx].reset_index(drop=True)
    # draw cgroup distinct positions within this subset
    ids = np.random.choice(len(x), size=cgroup, replace=False).tolist()
    all_df.append(x.iloc[ids].reset_index(drop=True))
df = pd.concat(all_df).reset_index(drop=True).sort_values(by=['index'])
sel_index = df[['index']]
This produces:
index
0 3
1 6
2 7
3 9
4 11
5 15
6 16
7 22
8 27
9 29
However, I wonder whether there is a more compact way of doing this using pandas or NumPy?
>Solution :
How about this:
import numpy as np
import pandas as pd
np.random.seed(0)
dval = [4,4,0,0,0,0,4,4,0,4,0,0,4,4,0,0,0,0,4,4,0,0,0,0,4,0,4,4,4,4,4,0]
df = pd.DataFrame(dict(nval=dval))
df2 = df.groupby('nval').sample(5).reset_index()
print(df2)
output:
index nval
0 31 0
1 22 0
2 14 0
3 8 0
4 17 0
5 29 4
6 13 4
7 1 4
8 19 4
9 12 4
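If only the index labels are needed, as in the question, the same groupby/sample idea can be reduced a little further. The following is a minimal sketch, assuming pandas 1.1 or later (where groupby objects have a sample method). The names sampled, sel_index and per_value are illustrative only, and random_state=0 is used here in place of np.random.seed(0) so that the draw is repeatable without touching the global NumPy state.

```python
import pandas as pd

dval = [4, 4, 0, 0, 0, 0, 4, 4, 0, 4, 0, 0, 4, 4, 0, 0,
        0, 0, 4, 4, 0, 0, 0, 0, 4, 0, 4, 4, 4, 4, 4, 0]
df = pd.DataFrame({"nval": dval})

# Draw 5 rows (without replacement) from each distinct value of nval.
sampled = df.groupby("nval").sample(n=5, random_state=0)

# Sampling keeps the original row labels, so the index already holds the answer.
sel_index = sampled.index.sort_values().tolist()

# Optional per-value view: map each value of nval to its sorted sampled labels.
per_value = {int(val): sorted(grp.index.tolist())
             for val, grp in sampled.groupby("nval")}

print(sel_index)
print(per_value)
```

The exact labels depend on the random state, so no particular output is shown here. Note that sample raises an error if any group has fewer than n rows, unless replace=True is passed.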