Python pandas sample without mixing index

I want to apply the Pandas sample function independently for each value of the index of a data frame. This can be done with a for loop like this:

import pandas

df = pandas.DataFrame({'something': [3,4,2,2,6,7], 'n': [1,1,2,2,3,3]})
df.set_index(['n'], inplace=True)

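# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once.
# df.loc[[i]] (list indexer) always returns a data frame, even for a single-row group.
resampled_as_I_want_df = pandas.concat(
    [df.loc[[i]].sample(frac=1, replace=True) for i in sorted(set(df.index))]
)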

print(resampled_as_I_want_df)

Let me explain this in a human-friendly way. The df data frame looks like this:

   something
n           
1          3
1          4
2          2
2          2
3          6
3          7

Now we see that there are three "index groups", which have the values 1, 2 and 3. What I want is to apply the sample function so that the new data frame keeps the same index, which is not itself shuffled, while the rows are sampled within each group as if the groups were independent data frames.


Is there a way to avoid the for loop? For large data frames it is a bottleneck.

Solution:

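Use df.groupby(level=0).sample(frac=1, replace=True). Grouping on level=0 groups the rows by their index value, so each group is sampled independently. The groupby sample method is available from pandas 1.1 onward.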
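
As a minimal sketch of how this could look on the data frame from the question: the names resampled and random_state=0 are my own additions for illustration, and the seed is only there to make a run repeatable. The loop at the end is a sanity check, not part of the solution itself.

```python
import pandas

df = pandas.DataFrame({'something': [3, 4, 2, 2, 6, 7], 'n': [1, 1, 2, 2, 3, 3]})
df = df.set_index('n')

# Group rows by the index level, then sample with replacement inside each group.
# random_state is an assumption added for repeatability; drop it for fresh draws.
resampled = df.groupby(level=0).sample(frac=1, replace=True, random_state=0)

print(resampled)

# Sanity check: each group keeps its size, and every sampled value
# comes from the same index group in the original data frame.
for key, group in resampled.groupby(level=0):
    assert len(group) == len(df.loc[[key]])
    assert set(group['something']) <= set(df.loc[[key], 'something'])
```

Because the sampling happens inside pandas for all groups at once, this avoids building the result piece by piece in a Python loop, which is where the original approach is likely to slow down on large data frames.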
