I am building a translation table to migrate data from many databases into one. To do so, I want a 3-column NumPy array for memory efficiency. The goal is to translate (obj_id, db_id) into a pk.
To test this, I created the following array:
import numpy as np
a = np.array(
    [(i*2, i % 10, i*3) for i in range(1_000_000)],
    dtype=[('obj_id', np.int32), ('db_id', np.int8), ('pk', np.int32)]
)
a
The array looks like this:
array([( 0, 0, 0), ( 2, 1, 3),
( 4, 2, 6), ..., (1999994, 7, 2999991),
(1999996, 8, 2999994), (1999998, 9, 2999997)],
dtype=[('obj_id', '<i4'), ('db_id', 'i1'), ('pk', '<i4')])
Now I would like to translate (1999994, 7) into 2999991.
In very unoptimized Python, I would do:
for rec in a:
    if (rec[0], rec[1]) == (1999994, 7):
        print(rec[2])
        break
How can I do that using only NumPy?
>Solution :
You have a structured array, so you can select each field by name and filter the 'pk' field with a boolean mask built from the other two:
a['pk'][(a['obj_id'] == 1999994) & (a['db_id'] == 7)]
Output: array([2999991], dtype=int32)
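Note that the result is a one-element array rather than a plain integer. As a minimal sketch, the lookup could be wrapped in a small helper that returns a scalar, or None when the key is missing. The function name translate is hypothetical (it is not part of NumPy), and the sketch assumes each (obj_id, db_id) pair appears at most once in the table:

```python
import numpy as np

# Same test array as in the question.
a = np.array(
    [(i * 2, i % 10, i * 3) for i in range(1_000_000)],
    dtype=[('obj_id', np.int32), ('db_id', np.int8), ('pk', np.int32)],
)

def translate(table, obj_id, db_id):
    """Hypothetical helper: return the pk for (obj_id, db_id), or None if absent."""
    # Boolean mask that is True only where both key fields match.
    mask = (table['obj_id'] == obj_id) & (table['db_id'] == db_id)
    matches = table['pk'][mask]
    # Assumes the key pair is unique; take the first match if there is one.
    return int(matches[0]) if matches.size else None

print(translate(a, 1999994, 7))  # 2999991
print(translate(a, 1999994, 8))  # None
```

Each call scans the whole array, so this is probably fine for occasional lookups; if you need to translate many keys, a sorted table searched with np.searchsorted may be worth considering instead.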