How to randomly choose rows of a pandas DataFrame to update

I am a beginner in Python and I have a pandas DataFrame that I want to change as follows:

10% of rows of column "review" must be changed by adding a prefix
90% of rows of column "review" must be unchanged

To change all rows of "review" I can use:
X_test["modified_review"] = " abc " + X_test["review"]

To select 10% of the rows I can use:
X_test.sample(frac=0.1)

But I don’t know how to combine the two snippets so that only the selected rows are modified.

Please help!

>Solution :

You can sample 10% of the index labels at random and update only those rows (df here is your DataFrame, X_test in the question):

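# Start from an unmodified copy of the column
df["modified_review"] = df["review"]

# Randomly pick 10% of the index labels, then prefix only those rows
rand_ids = df.index.to_series().sample(frac=0.1)
df.loc[rand_ids, "modified_review"] = " abc " + df.loc[rand_ids, "modified_review"]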
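
To try this end to end, here is a minimal self-contained sketch. The 20-row X_test frame and its contents are made up for illustration, and random_state=0 is only there so the sample is repeatable; drop it if you want a different selection on each run. It also takes the labels from sample(...).index, which should be equivalent to the approach above as long as the index has no duplicate labels.

```python
import pandas as pd

# Hypothetical stand-in for X_test: 20 rows with a "review" column.
X_test = pd.DataFrame({"review": [f"review {i}" for i in range(20)]})

# Start from an unmodified copy of the column.
X_test["modified_review"] = X_test["review"]

# Pick 10% of the row labels at random.
# random_state is fixed only so the example is reproducible.
rand_ids = X_test.sample(frac=0.1, random_state=0).index

# Add the prefix on the sampled rows only; the other rows stay untouched.
X_test.loc[rand_ids, "modified_review"] = " abc " + X_test.loc[rand_ids, "review"]

# Count how many rows actually changed (10% of 20 rows).
n_changed = int((X_test["modified_review"] != X_test["review"]).sum())
print(n_changed)
```

Note that frac=0.1 selects a rounded 10% of the rows, so on a small frame the count may not be exactly one tenth.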
