I have a dataframe that looks something like
time value1 value2
1 1000000 1000009842 1009809435
2 1000032 2348974923 2343242342
3 1000342 2342345320 2342342234
...
1000 4324342 2131242353 4234234234
I want to get 20 rows whose indexes are uniformly spaced, e.g. indexes
10, 20, 30, 40, 50... 200
or
400, 420, 440, 460... 800
The starting index can be random; the only thing that needs to be constant is the spacing between the indexes of the returned rows.
I’ve used
df.sample(1000)
to get a sample of 1000 rows, but I don't see a way to space the indexes equally.
>Solution :
Use df.iloc[slice_idx] for this, where slice_idx is an array of row positions that starts at a random index and has a constant step (width).
E.g.:
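import numpy as np

width = 10
idx0 = np.random.randint(0, len(df))  # random start position; upper bound is exclusive
slice_idx = np.arange(idx0, len(df), width)  # idx0, idx0 + width, ... up to the end
df.iloc[slice_idx]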
This returns the rows at positions idx0, idx0+10, idx0+20, idx0+30, ... up to the end of the dataframe.
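The same selection can also be written with plain slice notation, since iloc accepts a start::step slice. A minimal, self-contained sketch follows; the example dataframe and the seed are made up for illustration and are not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataframe
df = pd.DataFrame({"time": np.arange(1000), "value1": np.arange(1000) ** 2})

width = 10
rng = np.random.default_rng(0)  # seeded only to make the demo repeatable
idx0 = int(rng.integers(0, len(df)))  # random start position; upper bound is exclusive

by_array = df.iloc[np.arange(idx0, len(df), width)]  # positions built explicitly
by_slice = df.iloc[idx0::width]  # same rows, no index array needed
```

Both forms select the same rows; the slice version is shorter, while the array version is easier to adapt when the positions need further filtering.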
One thing to consider is the minimum number of rows selected. This can be guaranteed by choosing idx0 below a certain limit. E.g.:
width = 20
min_elements = 10 # minimal number of selected elements
assert len(df) > min_elements * width # make sure the parameters are valid
# select idx0 between 0 and len(df) - (min_elements - 1) * width - 1
idx0 = np.random.randint(0, len(df) - (min_elements - 1) * width)
slice_idx = np.arange(idx0, len(df), width)
df.iloc[slice_idx] # has at least `min_elements` items
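The snippet above guarantees at least min_elements rows, but it can return more. If exactly n rows are needed (the question asks for 20), one possible variation is to build the positions from the start index directly instead of running to the end of the dataframe. This is only a sketch: the helper name sample_evenly_spaced, its parameters, and the example data are hypothetical and not from the original answer.

```python
import numpy as np
import pandas as pd


def sample_evenly_spaced(df, n, width, rng=None):
    """Return exactly n rows whose positions are `width` apart, from a random start."""
    rng = np.random.default_rng() if rng is None else rng
    span = (n - 1) * width  # distance from the first to the last selected position
    if span >= len(df):
        raise ValueError("dataframe too short for n rows spaced width apart")
    # Upper bound is exclusive, so the last position idx0 + span stays in range
    idx0 = int(rng.integers(0, len(df) - span))
    positions = idx0 + width * np.arange(n)
    return df.iloc[positions]


# Hypothetical example data
df = pd.DataFrame({"time": np.arange(1000), "value1": np.arange(1000) * 2})
picked = sample_evenly_spaced(df, n=20, width=10)
```

Here picked always has 20 rows, and the start position is drawn only from the range that leaves room for all of them.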