With NumPy, how do I search on two columns and get the third value in a three-column array?

I am building a translation table to migrate data from many databases into one. To keep memory usage low, I want to store it as a three-column NumPy array. The goal is to translate (obj_id, db_id) into a pk.
As a test, I created this array:

import numpy as np

a = np.array(
    [(i * 2, i % 10, i * 3) for i in range(1_000_000)],
    dtype=[('obj_id', np.int32), ('db_id', np.int8), ('pk', np.int32)]
)

The array a looks like this:

array([(      0, 0,       0), (      2, 1,       3),
       (      4, 2,       6), ..., (1999994, 7, 2999991),
       (1999996, 8, 2999994), (1999998, 9, 2999997)],
      dtype=[('obj_id', '<i4'), ('db_id', 'i1'), ('pk', '<i4')])

Now I would like to translate (1999994, 7) into 2999991.

In very unoptimized Python, I would do:

for rec in a:
    if (rec[0], rec[1]) == (1999994, 7):
        print(rec[2])
        break

How can I do that using NumPy only?

>Solution :

You have a structured array, so you can select the pk field with a boolean mask built from the other two fields:

a['pk'][(a['obj_id'] == 1999994) & (a['db_id'] == 7)]

Output: array([2999991], dtype=int32)
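
Note that the mask returns an array of every matching pk, not a single value. If you want one plain integer back, or a clear signal when the pair is absent, the expression can be wrapped in a small helper. This is only a minimal sketch: the name lookup_pk and the choice to return None on a miss are assumptions for illustration, not part of NumPy.

```python
import numpy as np

# Same test array as in the question.
a = np.array(
    [(i * 2, i % 10, i * 3) for i in range(1_000_000)],
    dtype=[('obj_id', np.int32), ('db_id', np.int8), ('pk', np.int32)],
)

def lookup_pk(table, obj_id, db_id):
    """Hypothetical helper: return the first matching pk as an int, or None."""
    # Boolean mask that is True only where both key fields match.
    mask = (table['obj_id'] == obj_id) & (table['db_id'] == db_id)
    matches = table['pk'][mask]
    # The mask may select zero, one, or several rows; take the first if any.
    return int(matches[0]) if matches.size else None

result = lookup_pk(a, 1999994, 7)    # pair that exists in the table
missing = lookup_pk(a, 1999994, 8)   # pair that does not exist
print(result, missing)
```

Each call still scans the whole array, so this suits occasional lookups rather than translating millions of rows one at a time.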
