I’m trying to create a DataFrame whose rows all repeat the same evenly spaced sequence.
For example, a sequence increasing in steps of 3 from 6 up to (but not including) 18 can be generated with np.arange(6, 18, 3), which gives array([ 6,  9, 12, 15]).
How would I go about generating a dataframe in this way?
How could I get the below if I wanted 6 repeated rows?
     0    1     2     3
0  6.0  9.0  12.0  15.0
1  6.0  9.0  12.0  15.0
2  6.0  9.0  12.0  15.0
3  6.0  9.0  12.0  15.0
4  6.0  9.0  12.0  15.0
5  6.0  9.0  12.0  15.0
6  6.0  9.0  12.0  15.0
The reason for creating this matrix is that I then wish to add a pd.Series row-wise to it.
>Solution :
Here is a solution using NumPy broadcasting, which avoids Python loops and lists as well as the extra memory allocation that np.repeat performs:
pd.DataFrame(np.broadcast_to(np.arange(6, 18, 3), (6, 4)))
To understand why this is more efficient than other solutions, refer to the np.broadcast_to() docs (https://numpy.org/doc/stable/reference/generated/numpy.broadcast_to.html), which note:
"more than one element of a broadcasted array may refer to a single memory location."
This means that no matter how many rows you create before passing to Pandas, NumPy only really allocates a single row; the 2D array is a read-only view which refers to the data of that row multiple times.
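To check the memory claim yourself, here is a minimal sketch. It inspects the broadcast view on the NumPy side and then does the row-wise addition from the question. The `offsets` Series is a made-up example, not something from the question, and whether Pandas keeps the view or copies the data when building the DataFrame may depend on your Pandas version and copy settings.

```python
import numpy as np
import pandas as pd

row = np.arange(6, 18, 3)             # the single allocated row of 4 values
view = np.broadcast_to(row, (6, 4))   # 6 rows, all backed by that same row

# A stride of 0 along axis 0 means moving to the next row advances 0 bytes,
# so every row reads the same memory.
print(view.strides[0])
print(np.shares_memory(view, row))
print(view.flags.writeable)           # broadcast views are read-only

df = pd.DataFrame(view)

# Hypothetical Series to add to every row; its index labels (0..3)
# line up with the DataFrame's column labels.
offsets = pd.Series([1, 2, 3, 4])
result = df + offsets                 # produces a new, ordinary DataFrame
print(result)
```

Because the view is read-only, this is best suited to cases like the one in the question, where the repeated rows are only an input to a further calculation rather than something to modify in place.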