I have a dataframe from which I pulled out 1% of the rows to create a new column and assign a value to it. Now I would like to add that data back to the initial dataframe. How would I go about that?
Thanks
I have tried using a left join, but it seems to add more rows instead of inserting the values back. Any help is appreciated.
>Solution :
You have to use index alignment:
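import pandas as pd

df = pd.read_csv('test.csv')
# Sample without replacement (the default) so every index label in df1 is unique
df1 = df.sample(frac=0.05, random_state=42)
df1["Flag"] = 'Y'
# Join on the index of both frames, bringing over only the new column
df_merged = pd.merge(df, df1[["Flag"]], left_index=True, right_index=True, how='left')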
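A likely reason a left join adds rows is that the sampled frame contains the same index label more than once, which sampling with replace=True can produce. Here is a minimal sketch of that effect, assuming a small made-up frame in place of test.csv and hand-picked rows in place of a random sample:

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})  # hypothetical stand-in for test.csv

# A subset whose index repeats label 3, as sampling with replace=True can produce
dup = df.loc[[3, 3, 7]].assign(Flag="Y")
with_dup = df.merge(dup[["Flag"]], left_index=True, right_index=True, how="left")

# A subset with unique index labels, as sampling without replacement produces
uniq = df.loc[[3, 7]].assign(Flag="Y")
clean = df.merge(uniq[["Flag"]], left_index=True, right_index=True, how="left")

print(len(df), len(with_dup), len(clean))  # 10 11 10
```

The repeated label matches row 3 twice, so the join emits that row twice. With unique labels the row count stays the same.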
But a more straightforward way is:
df.loc[df.sample(frac=0.05).index, 'Flag'] = 'Y'
With numpy:
df['Flag'] = np.where(df.index.isin(df.sample(frac=0.05).index), 'Y', 'N')
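The two one-liners differ in what they leave in the unsampled rows. Here is a minimal sketch comparing them, assuming a made-up 200-row frame in place of test.csv and a fixed random_state so the sample is reproducible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": range(200)})  # hypothetical stand-in for test.csv

# Draw the sample once so both options flag the same rows
sampled_idx = df.sample(frac=0.05, random_state=42).index

# Option 1: label-based assignment; unsampled rows are left as NaN
by_loc = df.copy()
by_loc.loc[sampled_idx, "Flag"] = "Y"

# Option 2: np.where fills every row with either 'Y' or 'N'
by_where = df.copy()
by_where["Flag"] = np.where(by_where.index.isin(sampled_idx), "Y", "N")

print(len(by_loc), by_loc["Flag"].notna().sum(), (by_where["Flag"] == "Y").sum())
```

Neither option changes the number of rows. Use the np.where form if you want an explicit 'N' in place of missing values.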