
Randomly Selecting Columns of a Two-Dimensional Dataframe to Shuffle

I would like to randomly select a few columns of a two-dimensional dataframe and shuffle the values within those columns. I can easily shuffle all values of the dataframe column-wise, but I want to do so only for a few randomly selected columns.

For instance, take the 6×6 dataframe below:


      0    1     2     3     4     5
0     5    3     7     1     2     9
1     1    7     5     3     0     8
2     0    2     7     1     6     5
3     8    4     2     1     9     7
4     2    9     5     6     3     4
5     7    5     8     2     1     0

After randomly selecting a few of the six columns and shuffling them, the output could look like this:


      0    1     2     3     4     5
0     2    9     7     1     2     4
1     5    7     5     3     0     0
2     7    2     7     1     6     5
3     8    3     2     1     9     7
4     1    5     5     6     3     9
5     0    4     8     2     1     8

The above shows the 1st, 2nd and last column shuffled, and all others remain as is.
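One way to confirm the description above is to compare the two frames column by column. The short pandas check below (a sketch that simply copies both tables from this question) reports which columns changed and whether every column still holds the same values, i.e. whether the changes are pure within-column shuffles.

```python
import pandas as pd

# The 6x6 input frame from the question
before = pd.DataFrame([
    [5, 3, 7, 1, 2, 9],
    [1, 7, 5, 3, 0, 8],
    [0, 2, 7, 1, 6, 5],
    [8, 4, 2, 1, 9, 7],
    [2, 9, 5, 6, 3, 4],
    [7, 5, 8, 2, 1, 0],
])

# The example output from the question
after = pd.DataFrame([
    [2, 9, 7, 1, 2, 4],
    [5, 7, 5, 3, 0, 0],
    [7, 2, 7, 1, 6, 5],
    [8, 3, 2, 1, 9, 7],
    [1, 5, 5, 6, 3, 9],
    [0, 4, 8, 2, 1, 8],
])

# Columns whose values are in a different order
changed = [c for c in before.columns if not before[c].equals(after[c])]

# Each column must still contain the same multiset of values
same_values = all(sorted(before[c]) == sorted(after[c]) for c in before.columns)

print(changed)      # [0, 1, 5]
print(same_values)  # True
```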

The following code allows me to shuffle all columns:

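import numpy as np
rng = np.random.default_rng()
arr = rng.random((6, 6))               # a 6x6 array of random floats
shuffled = rng.permuted(arr, axis=0)   # shuffle every column independently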

However, after many attempts, I have not been able to modify this so that it applies only to a few randomly selected columns.
Any advice would be greatly appreciated. Thank you.

Solution:

Assuming this example input:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(4*5).reshape(4, 5, order='F'))

   0  1   2   3   4
0  0  4   8  12  16
1  1  5   9  13  17
2  2  6  10  14  18
3  3  7  11  15  19

I would use:

import numpy as np

# random number of columns to shuffle (between 1 and all of them)
n = np.random.randint(1, df.shape[1] + 1)

# pick n distinct random columns
cols = np.random.choice(df.columns, n, replace=False)

# shuffle each selected column independently
df[cols] = df[cols].apply(lambda s: np.random.choice(s, len(s), replace=False))

Example output (here columns 0, 2 and 3 were selected):

   0  1   2   3   4
0  1  4  11  14  16
1  0  5   8  15  17
2  3  6  10  13  18
3  2  7   9  12  19
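If the shuffle is needed repeatedly or must be reproducible, the same steps can be wrapped in a function that takes a seeded generator. This is a minimal sketch: the helper name `shuffle_random_columns` is hypothetical, and it swaps the `apply`/`np.random.choice` step for NumPy's `Generator.permuted(..., axis=0)`, which shuffles each column of a 2-D block independently.

```python
import numpy as np
import pandas as pd


def shuffle_random_columns(df, rng=None):
    """Return (shuffled_copy, picked_columns) for a random subset of df's columns.

    Hypothetical helper: it picks between 1 and df.shape[1] distinct columns
    and shuffles the values within each picked column independently.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = df.copy()
    # Generator.integers excludes the upper bound, so n is in 1..df.shape[1]
    n = rng.integers(1, df.shape[1] + 1)
    # sample n distinct column labels
    cols = list(rng.choice(df.columns.to_numpy(), size=n, replace=False))
    # permuted(axis=0) shuffles each column of the block independently
    out[cols] = rng.permuted(out[cols].to_numpy(), axis=0)
    return out, cols


df = pd.DataFrame(np.arange(4 * 5).reshape(4, 5, order="F"))
shuffled, picked = shuffle_random_columns(df, np.random.default_rng(42))
print("shuffled columns:", sorted(picked))
print(shuffled)
```

Returning a copy leaves the input frame intact, and passing a seeded `Generator` makes a run repeatable. `Generator.permuted` requires NumPy 1.20 or later.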