When you ask NumPy to build an array from a collection that includes arbitrary objects, it creates an array of "object" dtype. You can slice across those objects, but because NumPy knows nothing about the objects themselves, you cannot index into an object in the same step (even if that particular object is actually a NumPy array).
However, if you slice the object array to select only the elements that are actually NumPy arrays, NumPy won’t collapse that slice into a single array, even with another call to np.array(). Here is a small example of what I mean:
>>> aa = np.array([np.random.randn(3, 4), {'something': 'blah'}], dtype=object)
>>> aa.shape
(2,)
>>> np.array(aa[0:1])
array([array([[ 1.78237043, -0.61082005, 0.92160137, 0.58961677],
[ 1.54183639, -0.43097464, 1.36213935, -1.2695875 ],
[ 0.01431181, -0.62073519, 0.56267489, -0.46113538]])],
dtype=object)
>>> np.array(aa[0:1]).shape # I want this to be (1, 3, 4)
(1,)
Is there any way to do this without a double copy (i.e. not like this: np.array(aa[0:1].tolist()))? Does an object array even allow you to do this without such a copy?
Solution:
You can use np.stack to combine the arrays in an object-dtype array into a regular ndarray:
>>> aa = np.array([np.random.randn(3, 4), {'something': 'blah'}], dtype=object)
>>> aa
array([array([[-6.36267204e-01, 8.95707498e-02, 1.09275216e+00,
-3.70594544e-01],
[ 8.32865823e-01, -6.53876690e-01, 1.21000457e+00,
1.22046398e+00],
[-5.30262118e-01, 1.17934947e-04, 4.45156002e-01,
-6.61549444e-02]]),
{'something': 'blah'}], dtype=object)
>>> np.stack(aa[0:1])
array([[[-6.36267204e-01, 8.95707498e-02, 1.09275216e+00,
-3.70594544e-01],
[ 8.32865823e-01, -6.53876690e-01, 1.21000457e+00,
1.22046398e+00],
[-5.30262118e-01, 1.17934947e-04, 4.45156002e-01,
-6.61549444e-02]]])
>>> np.stack(aa[0:1]).shape
(1, 3, 4)
This also works with multiple ndarrays in your object array, as long as they all have the same shape.
Internally, np.stack just treats the object array as a sequence and iterates over it. I’m not sure whether it has a significant performance benefit over your solution with np.array(aa[0:1].tolist()).
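To make the multi-array case concrete, here is a minimal sketch (not a benchmark). The names `rng`, `stacked`, `via_list` and `bb` are made up for illustration, and the object arrays are filled element by element with np.empty so that NumPy does not try to guess the nesting. It stacks two same-shape arrays out of an object array, checks that the result matches the tolist() round trip, confirms that the result is a fresh copy, and shows that mismatched shapes are rejected:

```python
import numpy as np

# Hypothetical example data: two same-shape arrays plus a non-array object.
rng = np.random.default_rng(0)
aa = np.empty(3, dtype=object)
aa[0] = rng.standard_normal((3, 4))
aa[1] = rng.standard_normal((3, 4))
aa[2] = {'something': 'blah'}

# np.stack iterates over the slice and joins the elements along a new axis.
stacked = np.stack(aa[0:2])
print(stacked.shape)  # (2, 3, 4)
print(stacked.dtype)  # float64

# Same values as the tolist() round trip from the question.
via_list = np.array(aa[0:2].tolist())
print(np.array_equal(stacked, via_list))  # True

# The stacked result is a new array; it does not share memory with aa[0].
print(np.shares_memory(stacked, aa[0]))  # False

# Arrays with different shapes cannot be stacked.
bb = np.empty(2, dtype=object)
bb[0] = np.zeros((3, 4))
bb[1] = np.zeros((2, 4))
try:
    np.stack(bb)
except ValueError as exc:
    print("stack failed:", exc)
```

Either way the data has to be copied once into a contiguous block, because an object array only stores references to its elements.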