Extract indices of grouped elements in Pandas

The objective is to extract the index numbers of randomly selected rows from each group in Pandas.

Specifically, given a DataFrame df:

    nval
0      4
1      4
2      0
...
23     0
24     4
...
29     4
30     4
31     0

I would like to extract 5 random indices for each of the values 0 and 4.

For example, the 5 randomly selected indices for the value 0 could be

3, 11, 15, 16, 22

and for the value 4 they could be

6, 9, 7, 29, 27

Currently, the code below achieves this:

import numpy as np
import pandas as pd

np.random.seed(0)
dval = [4,4,0,0,0,0,4,4,0,4,0,0,4,4,0,0,0,0,4,
        4,0,0,0,0,4,0,4,4,4,4,4,0]

df = pd.DataFrame(dict(nval=dval))
cgroup = 5  # number of rows to sample per value
df = df.reset_index()  # keep the original row labels in an 'index' column
all_df = []
for val in [0, 4]:
    # rows holding this value, renumbered 0..n-1
    x = df[df['nval'] == val].reset_index(drop=True)
    # pick cgroup distinct positions without replacement
    ids = np.random.choice(len(x), size=cgroup, replace=False).tolist()
    all_df.append(x.iloc[ids])

# combine both samples, order by original row label, then renumber
df = pd.concat(all_df).sort_values(by=['index']).reset_index(drop=True)
sel_index = df[['index']]

which produces

   index
0      3
1      6
2      7
3      9
4     11
5     15
6     16
7     22
8     27
9     29

However, I wonder whether there is a more compact way of doing this using pandas or NumPy?

>Solution :

How about this:

import numpy as np
import pandas as pd

np.random.seed(0)
dval = [4,4,0,0,0,0,4,4,0,4,0,0,4,4,0,0,0,0,4,4,0,0,0,0,4,0,4,4,4,4,4,0]
df = pd.DataFrame(dict(nval=dval))

# draw 5 random rows from each nval group; reset_index() moves the
# original row labels into an 'index' column
df2 = df.groupby('nval').sample(5).reset_index()
print(df2)

output:

   index  nval
0     31     0
1     22     0
2     14     0
3      8     0
4     17     0
5     29     4
6     13     4
7      1     4
8     19     4
9     12     4
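
If you want the result in the same shape as in the question (a single sorted list of indices), something along these lines might work. This is only a sketch: the names `sampled`, `sel_index`, `per_value` and `np_index` are my own, and `random_state=0` and `default_rng(0)` are just one assumed way to make the draw repeatable. The last part is a NumPy-only variant, since the question also asked about NumPy. It draws from a different random stream, so it will not return the same indices as the pandas version.

```python
import numpy as np
import pandas as pd

dval = [4, 4, 0, 0, 0, 0, 4, 4, 0, 4, 0, 0, 4, 4, 0, 0,
        0, 0, 4, 4, 0, 0, 0, 0, 4, 0, 4, 4, 4, 4, 4, 0]
df = pd.DataFrame({"nval": dval})

# Sample 5 rows per distinct nval; random_state makes the draw repeatable.
sampled = df.groupby("nval").sample(n=5, random_state=0)

# Sampled rows keep their original index labels, so sorting the index
# gives one combined, ordered list of row labels.
sel_index = sampled.index.sort_values().tolist()

# Per-value view: map each nval to its sorted sampled indices.
per_value = {
    val: sorted(rows.index.tolist())
    for val, rows in sampled.groupby("nval")
}

# NumPy-only variant: for each value, find the positions where it occurs
# and choose 5 of them without replacement.
rng = np.random.default_rng(0)
arr = np.asarray(dval)
np_index = np.sort(np.concatenate([
    rng.choice(np.flatnonzero(arr == v), size=5, replace=False)
    for v in (0, 4)
]))

print(sel_index)
print(per_value)
print(np_index.tolist())
```

Note that `groupby(...).sample` needs pandas 1.1 or newer, and each group must have at least 5 rows unless you pass `replace=True`.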
