Let’s say I have a dictionary with the following contents:
```python
old_dict = {'a':[0,1,2], 'b':[1,2,3]}
```
and I want to obtain a new dictionary where the keys are the values in the old dictionary, and the new values are the keys from the old dictionary, i.e.:
```python
new_dict = {0:['a'], 1:['a','b'], 2:['a','b'], 3:['b']}
```
To perform this task, I’m currently using the following example code:
```python
import numpy as np

# get all the keys for the new dictionary
new_keys = np.unique(np.hstack([old_dict[key] for key in old_dict]))
# initialize new dictionary
new_dict = {key: [] for key in new_keys}
# step through every new key
for new_key in new_keys:
    # step through every old key and check if the new key is in the current list of values
    for old_key in old_dict:
        if new_key in old_dict[old_key]:
            new_dict[new_key].append(old_key)
```
In this example I’m showing 2 old keys and 4 new keys, but in my real problem I have ~10,000 old keys and ~100,000 new keys. Is there a smarter way to perform this task, maybe with some tree-based algorithm? I used dictionaries because they make the problem easier for me to visualize, but dictionaries are not necessary if there are better data types for this exercise.
In the meantime, I’m reading the documentation on reverse lookup in dictionaries and experimenting with the `sindex` spatial index from geopandas.
Solution:

You can build the new dictionary in a single pass over the old one with `dict.setdefault`:
```python
old_dict = {'a':[0,1,2], 'b':[1,2,3]}

new_dict = {}
for k, v in old_dict.items():
    for i in v:
        new_dict.setdefault(i, []).append(k)

print(new_dict)
```
Prints:

```
{0: ['a'], 1: ['a', 'b'], 2: ['a', 'b'], 3: ['b']}
```
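A common equivalent of the `setdefault` idiom is `collections.defaultdict(list)`, which creates the empty list automatically on first access. The sketch below wraps it in a hypothetical helper `invert`; like the loop above, it touches each value exactly once. Converting back to a plain `dict` at the end is a design choice: afterwards, looking up a missing key raises `KeyError` instead of silently inserting an empty list.

```python
from collections import defaultdict

def invert(old_dict):
    """Map each value in the old lists to the list of old keys containing it."""
    new_dict = defaultdict(list)  # missing keys start as []
    for k, values in old_dict.items():
        for v in values:
            new_dict[v].append(k)
    return dict(new_dict)  # plain dict: no auto-insertion on later lookups

old_dict = {'a': [0, 1, 2], 'b': [1, 2, 3]}
print(invert(old_dict))
# {0: ['a'], 1: ['a', 'b'], 2: ['a', 'b'], 3: ['b']}
```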
Benchmark (`f1` is the loop above, `f2` is the code from the question):
```python
import numpy as np
from timeit import timeit

old_dict = {'a':[0,1,2], 'b':[1,2,3]}

def f1():
    new_dict = {}
    for k, v in old_dict.items():
        for i in v:
            new_dict.setdefault(i, []).append(k)
    return new_dict

def f2():
    # get all the keys for the new dictionary
    new_keys = np.unique(np.hstack([old_dict[key] for key in old_dict]))
    # initialize new dictionary
    new_dict = {key: [] for key in new_keys}
    # step through every new key
    for new_key in new_keys:
        # step through every old key and check if the new key is in the current list of values
        for old_key in old_dict:
            if new_key in old_dict[old_key]:
                new_dict[new_key].append(old_key)
    return new_dict

t1 = timeit('f1()', number=1000, globals=globals())
t2 = timeit('f2()', number=1000, globals=globals())
print(t1)
print(t2)
```
Prints (total seconds for 1000 calls, `f1` first, then `f2`):

```
0.0005186359921935946
0.009738252992974594
```
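Before reading too much into the timings, it helps to confirm that both approaches build the same mapping. The sketch below restates the two ideas as hypothetical helpers: `invert_setdefault` (the single pass from the answer) and `invert_bruteforce` (the question's "scan every old list for every new key" idea, written in pure Python without NumPy). It compares them on a small random dict with a fixed seed. The comments also explain the gap: the single pass costs roughly one step per stored value, while the scan costs roughly one step per stored value *for every new key*. One caveat, assuming lists may contain repeated values: `invert_setdefault` then repeats the old key, while the scan lists it only once.

```python
from itertools import product
from random import randint, seed

def invert_setdefault(old_dict):
    # One pass over every (old key, value) pair: cost ~ total number of values.
    new_dict = {}
    for k, v in old_dict.items():
        for i in v:
            new_dict.setdefault(i, []).append(k)
    return new_dict

def invert_bruteforce(old_dict):
    # For every new key, test membership in every old list:
    # cost ~ len(new_keys) * total number of values.
    new_keys = sorted(set().union(*old_dict.values()))
    return {v: [k for k in old_dict if v in old_dict[k]] for v in new_keys}

seed(0)  # fixed seed so the check is reproducible
letters = 'abcdef'
old_dict = {''.join(c): list(range(randint(1, 3), randint(4, 10)))
            for c in product(letters, repeat=2)}

# Dict equality ignores key order; both build each list in old-key order.
assert invert_setdefault(old_dict) == invert_bruteforce(old_dict)
```

The `range` lists contain no repeats, so the two results agree here; on data with repeated values the duplicate-handling difference noted above would show up.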
With `old_dict` initialized as follows (the dict now has 10,648 items):
```python
from itertools import product
from random import randint

k = 'abcdefghijkloprstuvwyz'
old_dict = {''.join(c): list(range(randint(1, 3), randint(4, 10))) for c in product(k, k, k)}
print(len(old_dict))
```
This prints `10648`, and the same benchmark then prints (`f1` first, then `f2`):

```
3.126827526008128
19.222182962010265
```
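On the question of "better data types": no tree or spatial index is needed for this inversion, but if the data are already NumPy-friendly, the grouping itself can be done with array operations. The sketch below flattens all lists, records which old key each value came from, sorts by value with a stable `argsort`, and splits at the points where the value changes. It assumes integer values (as in the question), and `invert_with_numpy` is a hypothetical helper name. It has not been benchmarked here: the final dict-of-lists step still loops in Python once per new key, so time it against `f1` with the same `timeit` harness before adopting it.

```python
import numpy as np

def invert_with_numpy(old_dict):
    """Invert {old_key: [int, ...]} into {int: [old_key, ...]} using array ops.

    Assumes every value list holds integers.
    """
    keys = list(old_dict)
    if not keys:
        return {}
    # Flatten all lists into one array of values ...
    values = np.concatenate([np.asarray(old_dict[k], dtype=np.int64) for k in keys])
    # ... and record, for each value, the index of the old key it came from.
    lengths = [len(old_dict[k]) for k in keys]
    owner = np.repeat(np.arange(len(keys)), lengths)
    # Stable sort by value: equal values keep old-key insertion order.
    order = np.argsort(values, kind="stable")
    values, owner = values[order], owner[order]
    # On a sorted array, np.unique's first-occurrence indices are group starts.
    uniq, starts = np.unique(values, return_index=True)
    groups = np.split(owner, starts[1:])
    # Only this last step loops in Python: one iteration per new key.
    return {int(v): [keys[i] for i in g] for v, g in zip(uniq, groups)}

old_dict = {'a': [0, 1, 2], 'b': [1, 2, 3]}
print(invert_with_numpy(old_dict))
# {0: ['a'], 1: ['a', 'b'], 2: ['a', 'b'], 3: ['b']}
```

Because the sort is stable, each output list keeps the same old-key order as the `setdefault` loop, and repeated values within one list produce repeated keys, just as they do there.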