I have an input numpy 2D array:
[
[2, 1],
[1, 1],
[2, 2],
[2, 2],
[1, 1],
[1, 1],
[2, 1],
[1, 1],
[1, 2],
[1, 2]
]
I would like to create a 1D array assigning a unique (but arbitrary) value to each combination, so something like this:
[
[2, 1], -> 0
[1, 1], -> 1
[2, 2], -> 2
[2, 2], -> 2
[1, 1], -> 1
[1, 1], -> 1
[2, 1], -> 0
[1, 1], -> 1
[1, 2], -> 3
[1, 2]  -> 3
]
The actual data has millions of rows and an unknown set of possible values, so is there an efficient way to implement this?
>Solution :
You can use np.unique to obtain a set of "IDs" into the unique rows of your array.
Given a 2D input array arr, the line
unique_rows, row_ids = np.unique(arr, axis=0, return_inverse=True)
will assign a 1D array of "unique row IDs" to row_ids, with the corresponding row values in unique_rows.
For your example, we get the following results:
>>> import numpy as np
>>> arr = [
...     [2, 1],
...     [1, 1],
...     [2, 2],
...     [2, 2],
...     [1, 1],
...     [1, 1],
...     [2, 1],
...     [1, 1],
...     [1, 2],
...     [1, 2]
... ]
>>> unique_rows, row_ids = np.unique(arr, axis=0, return_inverse=True)
>>> unique_rows
array([[1, 1],
       [1, 2],
       [2, 1],
       [2, 2]])
>>> row_ids
array([2, 0, 3, 3, 0, 0, 2, 0, 1, 1])
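
As a quick sanity check, here is a minimal sketch showing that each ID is an index into unique_rows, so indexing unique_rows with row_ids rebuilds the original array. The helper name row_ids_for is my own and not part of NumPy, and the reshape(-1) is only a precaution on the assumption that the shape of the inverse may differ between NumPy versions.

```python
import numpy as np


def row_ids_for(arr):
    """Return (unique_rows, row_ids) for a 2D array-like.

    Hypothetical helper for illustration; it only wraps np.unique.
    """
    arr = np.asarray(arr)
    unique_rows, row_ids = np.unique(arr, axis=0, return_inverse=True)
    # Precaution (assumption): flatten the inverse so it is always 1D.
    row_ids = np.asarray(row_ids).reshape(-1)
    return unique_rows, row_ids


arr = np.array([[2, 1], [1, 1], [2, 2], [2, 2], [1, 1],
                [1, 1], [2, 1], [1, 1], [1, 2], [1, 2]])
unique_rows, row_ids = row_ids_for(arr)

# Each ID indexes into unique_rows, so this reconstructs the input exactly.
reconstructed = unique_rows[row_ids]
assert np.array_equal(reconstructed, arr)
```

Note that the IDs follow the sorted order of the unique rows rather than the order of first appearance, which is fine here since the question allows arbitrary values.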