
Pandas DataFrame: group a column's values, then randomize new values for that column

I have one column (X) that contains duplicate values (several rows share the same value, and those rows are consecutive).
To test an issue, I need to replace that column with new random values, so I tried:

import numpy as np
np.random.seed(RSEED)  # RSEED: a fixed seed constant defined elsewhere
df["X"] = np.random.randint(100, 500, df.shape[0])  # one independent draw per row

But this is not enough: I need to keep the sequences. That is, I want to group the rows by value, draw one new random number per group, assign that number to every row in the group, and do this for every group in the original column. For example:

X      new X (randomized)
210    500
210    500
...    ...
340    100
340    100
...    ...

I started looking for something built into pandas. I can group with pandas.DataFrame.groupby, but I couldn't find anything like a pandas.DataFrame.random that assigns one random value to a whole group.
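pandas has no per-group random function as such, but one way to get the same effect without groupby is to draw one random number per distinct value and map it back onto the column. A minimal sketch, assuming a small hypothetical DataFrame and NumPy's Generator API (np.random.default_rng); the seed 0, the sample values and the X_new column name are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: repeated values appear in consecutive runs
df = pd.DataFrame({"X": [210, 210, 210, 340, 340, 125]})

rng = np.random.default_rng(0)  # seeded generator so results are reproducible

# One random integer in [100, 500) for each distinct value of X
uniques = df["X"].unique()
mapping = dict(zip(uniques, rng.integers(100, 500, size=len(uniques))))

# Every row with the same X receives the same new value
df["X_new"] = df["X"].map(mapping)
print(df)
```

Two different X values can happen to draw the same number here. If the new values must also be distinct across groups, drawing them with rng.choice(np.arange(100, 500), size=len(uniques), replace=False) is one option.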


Solution:

A simple approach is to use groupby with transform: the function returns one random integer per group, and transform broadcasts it back to every row of that group.

import numpy as np
df.groupby('X')['X'].transform(lambda _: np.random.randint(100, 500))  # one draw per group, repeated on each row

0    137
1    137
2    .
3    .
4    335
5    335
Name: X, dtype: int64
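For reference, here is a self-contained version of the transform call above. Two assumptions are worth flagging: it uses a seeded np.random.default_rng Generator instead of the global np.random state (seeding with RSEED as in the question works too), and groupby('X') groups by value, so if the same value reappears in a separate, non-adjacent run, both runs get the same number. If each consecutive run should get its own number, one common option is to build a run id by comparing each row with the previous one (shift) and taking a cumulative sum, as in the second half of this sketch. The sample data and the column names X_by_value and X_by_run are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 210 appears in two separate runs
df = pd.DataFrame({"X": [210, 210, 340, 340, 210, 210]})
rng = np.random.default_rng(0)

# Answer's approach: one draw per distinct value, broadcast by transform
df["X_by_value"] = df.groupby("X")["X"].transform(lambda _: rng.integers(100, 500))

# Variant: one draw per consecutive run of equal values.
# The comparison is True wherever the value differs from the previous row,
# so the cumulative sum increments at the start of each new run.
run_id = (df["X"] != df["X"].shift()).cumsum()
df["X_by_run"] = df.groupby(run_id)["X"].transform(lambda _: rng.integers(100, 500))

print(df)
```

With value-based grouping, rows 0 and 4 always share a number; with run-based grouping they are drawn independently.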