Create nested array for all unique indices in a pandas MultiIndex DataFrame

October 14, 2022

generate dummy data

np.random.seed(42)
df = pd.DataFrame({'subject': ['A'] * 10 + ['B'] * 10,
                   'trial': list(range(5)) * 4,
                   'value1': np.random.randint(0, 100, 20),
                   'value2': np.random.randint(0, 100, 20)
                   })
df = df.set_index(['subject', 'trial']).sort_index()
print(df)


               value1  value2
subject trial                
A       0          51       1
        0          20      75
        1          92      63
        1          82      57
        2          14      59
        2          86      21
        3          71      20
        3          74      88
        4          60      32
        4          74      48
B       0          87      90
        0          52      79
        1          99      58
        1           1      14
        2          23      41
        2          87      61
        3           2      91
        3          29      61
        4          21      59
        4          37      46

Notice: Each subject / trial combination has multiple rows.

I want to create a array with the rows as nested dimensions.

My (as I find ugly) data transformation via list

tmp=list()
for idx in df.index.unique():
   tmp.append(df.loc[idx].to_numpy())
goal = np.array(tmp)
print(goal)


[[[51  1]
  [20 75]]

...

 [[21 59]
  [37 46]]]

Can you show me a native pandas / numpy way to do it (without the list crutch)?

>Solution :

To be able to generate a non-ragged numpy array, the number of duplicates must be equal for all values. Thus you don’t have to loop over them. Just find out the number and reshape

n = len(df)/(~df.index.duplicated()).sum()

assert n.is_integer()

out = df.to_numpy().reshape(-1, df.shape[1], int(n))

Output:

array([[[51,  1],
        [20, 75]],

       [[92, 63],
        [82, 57]],

       [[14, 59],
        [86, 21]],

       [[71, 20],
        [74, 88]],

       [[60, 32],
        [74, 48]],

       [[87, 90],
        [52, 79]],

       [[99, 58],
        [ 1, 14]],

       [[23, 41],
        [87, 61]],

       [[ 2, 91],
        [29, 61]],

       [[21, 59],
        [37, 46]]])