Add IDs to dataframe with random Noise

My initial dataframe looks as follows:

import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 5, 6],
    "x":    [1, 2, 3, 4, 9, 11],
    "y":    [5, 6, 7, 8, 3, 2],
})

So I have two IDs (1 and 2), i.e. two different time series.
Now I want to add some random noise to the x and y values of each ID and append the result to the initial df under new IDs (with the same length):

# Noise
import numpy as np
original = df["x"].to_numpy()  # e.g. one column of the dataframe
noise = np.random.normal(0, 1, len(original))  # one noise sample per element
new_signal = original + noise
# https://stackoverflow.com/questions/14058340/adding-noise-to-a-signal-in-python

So the resulting df would look something like the following (the values are just an example of what the output could be):

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2,   3, 3, 3, 3,   4, 4],
    "time": [1, 2, 3, 4, 5, 6,   7, 8, 9, 10,   11, 12],
    "x":    [1, 2, 3, 4, 9, 11,   1.0005, 2.3256, 3.1256, 4.5647,   9.6514, 11.4567],
    "y":    [5, 6, 7, 8, 3, 2,   5.0505, 6.0276, 7.1056, 8.5607,   3.6014, 2.4567],
})

As you can see, two new IDs (3 and 4) have been added, and they hold the values with noise.

Currently I am trying this with various loops, but it seems quite complicated. Any suggestions?

Bonus question: how can I append the noisy copies three times instead of just once?

>Solution :

You can replicate the rows with reindex, then add an array that offsets id and time for each copy and adds noise to x and y.

This works for an arbitrary number of copies N (the original plus N-1 noisy copies):

import numpy as np

N = 3                       # total number of copies, including the original
n_ids = df["id"].nunique()  # id offset per copy (assumes ids are 1..n_ids)

(df.reindex(np.tile(df.index, N))  # replicate the dataframe N times
   .add(np.c_[np.repeat(np.arange(N), len(df)) * n_ids,    # increment id
              np.repeat(np.arange(N), len(df)) * len(df),  # increment time
              np.r_[np.zeros((len(df), 2)),                # no noise for the original
                    np.random.normal(size=(len(df)*(N-1), 2))]  # noise for the copies
              ])
)

Example with N=3 (the noise values are random and will differ between runs):

    id  time          x         y
0  1.0   1.0   1.000000  5.000000
1  1.0   2.0   2.000000  6.000000
2  1.0   3.0   3.000000  7.000000
3  1.0   4.0   4.000000  8.000000
4  2.0   5.0   9.000000  3.000000
5  2.0   6.0  11.000000  2.000000
0  3.0   7.0   0.651240  4.713942
1  3.0   8.0   1.426533  5.446687
2  3.0   9.0   3.187928  7.430646
3  3.0  10.0   2.998382  9.421992
4  4.0  11.0  10.282871  2.108504
5  4.0  12.0  10.531258  2.439636
0  5.0  13.0  -0.200542  5.286711
1  5.0  14.0   0.350241  8.114173
2  5.0  15.0   1.843902  6.725896
3  5.0  16.0   3.831534  7.964400
4  6.0  17.0   7.612370  2.737872
5  6.0  18.0  12.129517  2.809689
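
If the one-liner is hard to read, the same idea can be sketched with an explicit loop and pd.concat. This is only an illustration, not part of the original answer: the helper name add_noisy_copies and its parameters n_copies, scale and seed are made up here, and it assumes that ids and times are positive and start at 1, so offsetting each copy by the maximum keeps them unique.

```python
import numpy as np
import pandas as pd


def add_noisy_copies(df, n_copies, scale=1.0, seed=None):
    """Return df followed by n_copies noisy duplicates (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    id_step = df["id"].max()      # assumption: ids are 1..max
    time_step = df["time"].max()  # assumption: times are 1..max
    parts = [df]
    for k in range(1, n_copies + 1):
        # One Gaussian noise sample per row for x and for y
        noise = rng.normal(0, scale, size=(len(df), 2))
        parts.append(df.assign(
            id=df["id"] + k * id_step,
            time=df["time"] + k * time_step,
            x=df["x"] + noise[:, 0],
            y=df["y"] + noise[:, 1],
        ))
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 5, 6],
    "x":    [1, 2, 3, 4, 9, 11],
    "y":    [5, 6, 7, 8, 3, 2],
})

out = add_noisy_copies(df, n_copies=2, seed=0)
print(out)
```

Here n_copies=2 corresponds to N=3 above, since the original rows are kept as the first block. Unlike the reindex version, id and time stay integers and the index is reset to 0..17.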