import pandas as pd
import numpy as np
s = [ "S" + str(i) for i in range(1,101)]
c = [ "C" + str(i) for i in range(1,51)]
arr1 = np.random.randn(len(c),len(s))
arr2 = np.random.randn(len(c),len(s))
How can I create and fill a pandas DataFrame df with 100 * 50 = 5000 rows, one for each possible (s, c) pair,
such that arr1_col holds arr1[s,c] and
arr2_col holds arr2[s,c]?
Attempt (fails, since NumPy arrays cannot be indexed with lists of string labels):
df = pd.DataFrame({'S':s, 'C':c, 'arr1_col':arr1[s,c] , 'arr2_col':arr2[s,c]})
>Solution :
Assuming this 4×3 reproducible input:
import pandas as pd
import numpy as np
s = [ "S" + str(i) for i in range(1,4+1)]
c = [ "C" + str(i) for i in range(1,3+1)]
arr1 = np.arange(len(c)*len(s)).reshape(len(s), len(c))
arr2 = np.arange(len(c)*len(s)).reshape(len(s), len(c))*10
You can use numpy.repeat, numpy.tile and numpy.ravel:
df = pd.DataFrame({'S': np.repeat(s, len(c)),   # S1 S1 S1 S2 S2 S2 ...
                   'C': np.tile(c, len(s)),     # C1 C2 C3 C1 C2 C3 ...
                   'arr1': arr1.ravel(),        # row-major flattening, matches the labels
                   'arr2': arr2.ravel(),
                  })
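Or, with pandas.concat and stack:
df = (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                 'arr2': pd.DataFrame(arr2, index=s, columns=c)},
                axis=1)                      # columns become (array name, C)
        .stack()                             # move the C level into the index
        .rename_axis(['S', 'C']).reset_index()
     )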
Output:
     S   C  arr1  arr2
0   S1  C1     0     0
1   S1  C2     1    10
2   S1  C3     2    20
3   S2  C1     3    30
4   S2  C2     4    40
5   S2  C3     5    50
6   S3  C1     6    60
7   S3  C2     7    70
8   S3  C3     8    80
9   S4  C1     9    90
10  S4  C2    10   100
11  S4  C3    11   110
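To see why repeat, tile and ravel line up, here is a minimal sketch on the same 4×3 input. It builds the three columns separately and checks each row against direct indexing into arr1. The names s_col, c_col, values and matches are only illustrative.

```python
import numpy as np
import pandas as pd

s = ["S1", "S2", "S3", "S4"]
c = ["C1", "C2", "C3"]
arr1 = np.arange(len(s) * len(c)).reshape(len(s), len(c))

# repeat: each S label is repeated len(c) times -> S1 S1 S1 S2 S2 S2 ...
s_col = np.repeat(s, len(c))
# tile: the whole C list is repeated len(s) times -> C1 C2 C3 C1 C2 C3 ...
c_col = np.tile(c, len(s))
# ravel (default C order) walks the array row by row, matching the labels above
values = arr1.ravel()

df = pd.DataFrame({"S": s_col, "C": c_col, "arr1": values})

# Check every row against direct positional indexing into arr1
matches = all(
    row.arr1 == arr1[s.index(row.S), c.index(row.C)]
    for row in df.itertuples(index=False)
)
print(matches)
```

The check only passes because arr1 has shape (len(s), len(c)); with the axes swapped, the flattened values would no longer follow the S-then-C label order.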
Ordering by C first:
df = pd.DataFrame({'S': np.tile(s, len(c)),
                   'C': np.repeat(c, len(s)),
                   'arr1': arr1.ravel(order='F'),   # column-major flattening
                   'arr2': arr2.ravel(order='F'),
                  })
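Or (note that this variant returns the columns in the order C, S, arr1, arr2):
df = (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                 'arr2': pd.DataFrame(arr2, index=s, columns=c)},
                axis=0)                      # index becomes (array name, S)
        .T.stack()                           # C as index, then move S into the index
        .rename_axis(['C', 'S']).reset_index()
     )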
Output:
     S   C  arr1  arr2
0   S1  C1     0     0
1   S2  C1     3    30
2   S3  C1     6    60
3   S4  C1     9    90
4   S1  C2     1    10
5   S2  C2     4    40
6   S3  C2     7    70
7   S4  C2    10   100
8   S1  C3     2    20
9   S2  C3     5    50
10  S3  C3     8    80
11  S4  C3    11   110
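The question creates its arrays with shape (len(c), len(s)), while the examples above use (len(s), len(c)). A hedged sketch for the question's 100×50 case follows. It assumes that rows of the original arrays correspond to c and columns to s, so the arrays are transposed before flattening. The seeded generator rng is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

s = ["S" + str(i) for i in range(1, 101)]
c = ["C" + str(i) for i in range(1, 51)]

# Assumption: as in the question, the arrays have shape (len(c), len(s)),
# so rows correspond to c and columns to s.
rng = np.random.default_rng(0)
arr1 = rng.standard_normal((len(c), len(s)))
arr2 = rng.standard_normal((len(c), len(s)))

# Transpose to shape (len(s), len(c)) so that ravel() walks S first, then C
df = pd.DataFrame({
    "S": np.repeat(s, len(c)),
    "C": np.tile(c, len(s)),
    "arr1_col": arr1.T.ravel(),
    "arr2_col": arr2.T.ravel(),
})

print(df.shape)  # (5000, 4)
```

If the arrays are instead meant to be indexed as arr[s, c] with shape (len(s), len(c)), drop the .T and use ravel() directly, as in the answer above.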