Python pandas sample without mixing index

January 18, 2022

I want to apply the pandas sample function independently to each group of rows that share the same index value in a data frame. This can be done with a for loop like this:

import pandas

df = pandas.DataFrame({'something': [3,4,2,2,6,7], 'n': [1,1,2,2,3,3]})
df.set_index(['n'], inplace=True)

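# DataFrame.append was removed in pandas 2.0, so collect the samples and concatenate them.
# df.loc[[i]] (list indexer) always returns a DataFrame, even for a single-row group.
resampled_as_I_want_df = pandas.concat(
    [df.loc[[i]].sample(frac=1, replace=True) for i in sorted(set(df.index))]
)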

print(resampled_as_I_want_df)

Let me explain this in a human-friendly way. The df data frame looks like this:

   something
n           
1          3
1          4
2          2
2          2
3          6
3          7

Now we see that there are three "index groups", with the values 1, 2 and 3. What I want is to apply the sample function so that the new data frame has the same index as the original (the index itself is not randomly sampled), and the sampling is performed within each group, as if the groups were independent data frames.
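
For reference, here is a minimal sketch of one possible way to get the same behaviour without an explicit loop, using groupby on the index level followed by sample. It assumes pandas 1.1 or later (where sample is available on groupby objects). The name resampled_df and the random_state=0 seed are illustrative additions, used only to make the run reproducible.

```python
import pandas

df = pandas.DataFrame({'something': [3, 4, 2, 2, 6, 7], 'n': [1, 1, 2, 2, 3, 3]})
df.set_index(['n'], inplace=True)

# Group rows by the index level 'n' and sample with replacement inside each group.
# frac=1 draws as many rows as each group contains, so group sizes are preserved.
# random_state is an illustrative choice for reproducibility; drop it for fresh draws.
resampled_df = df.groupby(level='n').sample(frac=1, replace=True, random_state=0)

print(resampled_df)
```

Each group is sampled only from its own rows, so a value from group 1 should never appear under index 3, and the resulting index is the same as in the original data frame.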