Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create new fake data with new primary keys from existing dataframe python

I have a dataframe as following:

df1 = pd.DataFrame({'id': ['1a', '2b', '3c'], 'name': ['Anna', 'Peter', 'John'], 'year': [1999, 2001, 1993]})

I want to create new data by randomly re-arranging values in each column but for column id I also need to add a random letter at the end of the values, then add the new data to existing df1 as following:

df1 = pd.DataFrame({'id': ['1a', '2b', '3c', '2by', '1ao', '1az', '3cc'], 'name': ['Anna', 'Peter', 'John', 'John', 'Peter', 'Anna', 'Anna'], 'year': [1999, 2001, 1993, 1999, 1999, 2001, 2001]})

Could anyone help me, please? Thank you very much.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use DataFrame.sample and add random letter by numpy.random.choice:

import string

N = 5
df2 = (df1.sample(n=N, replace=True)
          .assign(id =lambda x:x['id']+np.random.choice(list(string.ascii_letters),size=N)))
df1 = df1.append(df2, ignore_index=True)
print (df1)
    id   name  year
0   1a   Anna  1999
1   2b  Peter  2001
2   3c   John  1993
3  1aY   Anna  1999
4  3cp   John  1993
5  3cE   John  1993
6  2bz  Peter  2001
7  3cu   John  1993
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading