I want to apply the pandas sample function independently to each index value of a data frame. This can be done with a for loop like this:
import pandas
df = pandas.DataFrame({'something': [3,4,2,2,6,7], 'n': [1,1,2,2,3,3]})
df.set_index(['n'], inplace=True)
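# Resample each index group separately, then stitch the pieces back together.
# DataFrame.append was removed in pandas 2.0, so pandas.concat is used instead.
pieces = []
for i in sorted(set(df.index)):
    # df.loc[[i]] (list indexer) always returns a DataFrame, even for a single row
    pieces.append(df.loc[[i]].sample(frac=1, replace=True))
resampled_as_I_want_df = pandas.concat(pieces)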
print(resampled_as_I_want_df)
Let me explain this in a human-friendly way. The df data frame looks like this:
   something
n
1          3
1          4
2          2
2          2
3          6
3          7
There are three "index groups", with the values 1, 2 and 3. I want to apply the sample function so that the new data frame keeps the same index (the index itself is not randomly sampled), while the sampling is performed within each group as if each group were an independent data frame.
Is there a way to avoid the for loop? For large data frames it is a bottleneck.
>Solution :
Use df.groupby(level=0).sample(frac=1, replace=True).
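As a rough sketch of how that one-liner fits the example above: the snippet below rebuilds the question's df and samples within each index group. The random_state argument and the variable name resampled are not part of the original question; they are added only to make the run reproducible. This assumes a pandas version that provides GroupBy.sample (1.1 or later, as far as I know).

```python
import pandas

df = pandas.DataFrame({'something': [3, 4, 2, 2, 6, 7], 'n': [1, 1, 2, 2, 3, 3]})
df.set_index(['n'], inplace=True)

# level=0 groups by the (only) index level, so each index value is its own group.
# frac=1 with replace=True draws as many rows as the group has, with replacement.
# random_state is an illustrative addition for reproducibility; drop it for fresh draws.
resampled = df.groupby(level=0).sample(frac=1, replace=True, random_state=0)
print(resampled)
```

The result keeps the index values grouped in order (1, 1, 2, 2, 3, 3), and every sampled row comes from its own group, which matches what the for loop produces without iterating in Python.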