How to create a 1D array encoding the values of a 2D array?

I have an input numpy 2D array:

[
  [2, 1],
  [1, 1],
  [2, 2],
  [2, 2],
  [1, 1],
  [1, 1],
  [2, 1],
  [1, 1],
  [1, 2],
  [1, 2]
]

I would like to create a 1D array assigning a unique (but arbitrary) value to each combination, so something like this:

[            [
  [2, 1], ->   0,
  [1, 1], ->   1,
  [2, 2], ->   2,
  [2, 2], ->   2,
  [1, 1], ->   1,
  [1, 1], ->   1,
  [2, 1], ->   0,
  [1, 1], ->   1,
  [1, 2], ->   3,
  [1, 2]  ->   3,
]            ]

The actual data has millions of rows and unknown possible values, so is there an efficient way to implement this?

>Solution :

You can use np.unique to obtain a set of "IDs" into the unique rows of your array.

Given a 2D input array arr, the line

unique_rows, row_ids = np.unique(arr, axis=0, return_inverse=True)

will assign to row_ids a 1D array of "unique row IDs", one per input row, where each ID is the index of that row's values in unique_rows (np.unique returns the unique rows in sorted order).
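To make the relationship between the two outputs concrete, here is a minimal sketch on a small made-up array (the values are illustrative, not your data). It assumes NumPy is imported as np and flattens the inverse defensively, in case the installed NumPy version returns it with an extra dimension:

```python
import numpy as np

# Small illustrative input (hypothetical values, not the question's data).
arr = np.array([[5, 7], [3, 4], [5, 7], [3, 4], [9, 9]])

unique_rows, row_ids = np.unique(arr, axis=0, return_inverse=True)

# Defensive flatten: harmless when the inverse is already 1D.
row_ids = np.asarray(row_ids).ravel()

# Indexing the unique rows with the IDs rebuilds the original array,
# so two input rows share an ID exactly when they are equal.
reconstructed = unique_rows[row_ids]
same_as_input = np.array_equal(reconstructed, arr)
```

If same_as_input is True, the IDs are a lossless encoding of the rows, which is the property the question asks for.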

For your example, we get the following results:

import numpy as np

arr = np.array([
    [2, 1],
    [1, 1],
    [2, 2],
    [2, 2],
    [1, 1],
    [1, 1],
    [2, 1],
    [1, 1],
    [1, 2],
    [1, 2]
])

>>> unique_rows, row_ids = np.unique(arr, axis=0, return_inverse=True)

>>> unique_rows
array([[1, 1],
       [1, 2],
       [2, 1],
       [2, 2]])

>>> row_ids
array([2, 0, 3, 3, 0, 0, 2, 0, 1, 1])
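The IDs above follow the sorted order of the unique rows, which is fine since the question allows arbitrary values. If you would rather have IDs numbered in order of first appearance, as in the question's illustration, one possible sketch is to also request return_index and rank the unique rows by it. The names first_idx, order, remap and appearance_ids are introduced here for illustration only:

```python
import numpy as np

arr = np.array([
    [2, 1], [1, 1], [2, 2], [2, 2], [1, 1],
    [1, 1], [2, 1], [1, 1], [1, 2], [1, 2],
])

# first_idx[k] is the position in arr where unique_rows[k] first occurs.
unique_rows, first_idx, row_ids = np.unique(
    arr, axis=0, return_index=True, return_inverse=True
)
row_ids = np.asarray(row_ids).ravel()  # defensive flatten, harmless if already 1D

# order[r] is the sorted-order ID of the r-th distinct row to appear.
order = np.argsort(first_idx)

# Invert that permutation: remap[sorted_id] = rank by first appearance.
remap = np.empty_like(order)
remap[order] = np.arange(order.size)

appearance_ids = remap[row_ids]
print(appearance_ids)  # → [0 1 2 2 1 1 0 1 3 3]
```

The extra work is one argsort over the unique rows only, so it should add little cost compared with the np.unique call itself.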
