Python Pandas – How to create a dataframe from a sequence

September 17, 2022

I’m trying to create a dataframe populated by repeating rows based on an existing steady sequence.
For example, if I had a sequence increasing in steps of 3 from 6 up to (but not including) 18, it could be generated using np.arange(6, 18, 3) to give array([ 6, 9, 12, 15]).

How would I go about generating a dataframe in this way?

How could I get the below if I wanted 6 repeated rows?

      0    1     2     3
0   6.0  9.0  12.0  15.0
1   6.0  9.0  12.0  15.0
2   6.0  9.0  12.0  15.0
3   6.0  9.0  12.0  15.0
4   6.0  9.0  12.0  15.0
5   6.0  9.0  12.0  15.0

The reason for creating this matrix is that I then want to add a pd.Series row-wise to it.

Solution:

Here is a solution using NumPy broadcasting, which avoids Python loops, lists, and the extra memory allocation that np.repeat performs:

pd.DataFrame(np.broadcast_to(np.arange(6, 18, 3), (6, 4)))
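
For a self-contained version with imports, the one-liner can be wrapped roughly as follows. This is only a sketch: the helper name repeated_rows_frame and its parameters are made up for illustration and are not part of pandas or NumPy, and it assumes the input is a one-dimensional sequence.

```python
import numpy as np
import pandas as pd


def repeated_rows_frame(sequence, n_rows):
    """Hypothetical helper: build a DataFrame whose rows all equal `sequence`."""
    row = np.asarray(sequence)
    # broadcast_to returns a read-only view of shape (n_rows, len(row));
    # NumPy does not copy the row data at this step.
    view = np.broadcast_to(row, (n_rows, row.size))
    return pd.DataFrame(view)


df = repeated_rows_frame(np.arange(6, 18, 3), 6)
print(df.shape)  # (6, 4)
```

Deriving the column count from the sequence itself avoids having to hard-code the 4 in the shape tuple.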

To understand why this is more efficient than other solutions, refer to the np.broadcast_to() docs: https://numpy.org/doc/stable/reference/generated/numpy.broadcast_to.html

"more than one element of a broadcasted array may refer to a single memory location."

This means that no matter how many rows you request, NumPy allocates the data for only a single row; the 2D array you pass to pandas is a read-only view that refers to that row's data repeatedly.
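
The memory behaviour can be checked on the NumPy side with a short sketch like the one below; the variable names are illustrative. Note that it only inspects the NumPy arrays. Depending on the pandas version and settings, the DataFrame constructor or later operations (such as adding a Series) may still copy the data into a full-size array.

```python
import numpy as np

row = np.arange(6, 18, 3)
view = np.broadcast_to(row, (6, 4))

# A stride of 0 along axis 0 means every "row" starts at the same address.
print(view.strides[0])              # 0
print(np.shares_memory(view, row))  # True
print(view.flags.writeable)         # False: the broadcast view is read-only

# np.repeat, by contrast, allocates a full copy holding all 6 * 4 elements.
copied = np.repeat(row[np.newaxis, :], 6, axis=0)
print(np.shares_memory(copied, row))  # False
```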