Note: If anything is unclear, please let me know and I'll update the question.
I have a dictionary like this:
matches = {'2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7'],
           'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku'],
           'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.'],
           'Fukuoka Bldg. 8-7 Yaesu Chome': ['2-8-7 Yaesu, Chuo-Ku',
                                             'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
           'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building 9Th Floor'],
           'Fukuoka Building 9Th Floor': ['Fukuoka Bldg. 9Th Fl',
                                          'Fukuoka Building, 9Th -10Th Flr.']}
I want to group the entries by following the links between keys and values, so that any two strings connected directly or through other entries end up in the same group. The key of each group can be any member, for example the first key encountered as the starting point.
This is the desired output:
{'2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7',
                          '2-8-7 Yaesu, Chuo-Ku',
                          'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
                          'Fukuoka Bldg. 8-7 Yaesu Chome'],
 'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.',
                             'Fukuoka Bldg. 9Th Fl',
                             'Fukuoka Building 9Th Floor',
                             'Fukuoka Bldg. 9Th Fl']}
I have tried this:
unique_lst = set()
merged_matches = dict()
for key, values in matches.items():
    if key not in unique_lst:
        values_lst = []
        for v in values:
            output = matches.get(v)
            for subkeys, subvals in matches.items():
                if key != subkeys and v != subkeys:
                    keyvals = [subkeys] + list(subvals)
                    if v in keyvals:
                        values_lst.extend(keyvals)
            if output:
                values_lst.extend(output)
            values_lst.append(v)
        values_lst = [i for i in values_lst if i != key]
        values_lst = values_lst + [key]
        for v in values_lst:
            unique_lst.add(v)
        merged_matches[key] = values_lst
Here’s the output I got:
# print(merged_matches)
{'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building, 9Th -10Th Flr.',
                          'Fukuoka Building 9Th Floor',
                          'Fukuoka Bldg. 9Th Fl'],
 'Fukuoka Bldg. 8-7 Yaesu Chome': ['Chuo Ward, Yaesu 2-8-7',
                                   '2-8-7 Yaesu, Chuo-Ku',
                                   'Chuo Ward, Yaesu 2-8-7',
                                   '2-8-7 Yaesu, Chuo-Ku',
                                   'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
                                   'Fukuoka Bldg. 8-7 Yaesu Chome'],
 'Fukuoka Bldg 10Th Floor': ['Fukuoka Building 9Th Floor',
                             'Fukuoka Bldg. 9Th Fl',
                             'Fukuoka Building, 9Th -10Th Flr.',
                             'Fukuoka Building, 9Th -10Th Flr.',
                             'Fukuoka Bldg 10Th Floor']}
>Solution :
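IMO, the problem boils down to finding the connected components of the graph induced by the dictionary, where every key is linked to each of its values. One way to do so is to use a union-find (disjoint-set) data structure, such as networkx's UnionFind, to merge each key with its values and obtain the resulting disjoint sets.
We can then construct a dictionary from the merged sets by selecting one element of each set as the key and the remainder as values. Since sets are unordered, which element ends up as the key is arbitrary.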
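from networkx.utils.union_find import UnionFind

# Merge each key with all of its values into a single disjoint set.
c = UnionFind()
for k, lst in matches.items():
    c.union(k, *lst)

# For each set, use one element as the key and the rest as the values.
out = {k: v for k, *v in map(list, c.to_sets())}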
Output:
{'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku',
                            'Fukuoka Bldg. 8-7 Yaesu Chome',
                            'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
 'Fukuoka Building 9Th Floor': ['Fukuoka Bldg 10Th Floor',
                                'Fukuoka Bldg. 9Th Fl',
                                'Fukuoka Building, 9Th -10Th Flr.']}