
How to group all links in a dictionary, using the first occurrence of a key as the starting point?

Note: If anything is unclear, please let me know and I'll update the question.

I have a dictionary like this:

matches = {'2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7'],
 'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku'],
 'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.'],
 'Fukuoka Bldg. 8-7 Yaesu Chome': ['2-8-7 Yaesu, Chuo-Ku',
                                   'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
 'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building 9Th Floor'],
 'Fukuoka Building 9Th Floor': ['Fukuoka Bldg. 9Th Fl',
                                'Fukuoka Building, 9Th -10Th Flr.']}

I want to group them together by following the links between keys and values. The key of each group can be any member, or simply the first key encountered, which then serves as the starting point.


This is the desired output:

{'2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7',
                          '2-8-7 Yaesu, Chuo-Ku',
                          'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
                          'Fukuoka Bldg. 8-7 Yaesu Chome'],
 'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.',
                             'Fukuoka Bldg. 9Th Fl',
                             'Fukuoka Building 9Th Floor',
                             'Fukuoka Bldg. 9Th Fl']}

I have tried this:

unique_lst = set()
merged_matches = dict()
for key, values in matches.items():
    if key not in unique_lst:
        values_lst = []
        for v in values:
            output = matches.get(v)
            for subkeys, subvals in matches.items():
                if key != subkeys and v != subkeys:
                    keyvals = [subkeys] + list(subvals)
                    if v in keyvals:
                        values_lst.extend(keyvals)
            if output:
                values_lst.extend(output)
            values_lst.append(v)

        values_lst = [i for i in values_lst if i != key]
        values_lst = values_lst + [key]
        for v in values_lst:
            unique_lst.add(v)
            
        merged_matches[key] = values_lst

Here's the output I got:

# print(merged_matches)

{'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building, 9Th -10Th Flr.',
  'Fukuoka Building 9Th Floor',
  'Fukuoka Bldg. 9Th Fl'],
 'Fukuoka Bldg. 8-7 Yaesu Chome': ['Chuo Ward, Yaesu 2-8-7',
  '2-8-7 Yaesu, Chuo-Ku',
  'Chuo Ward, Yaesu 2-8-7',
  '2-8-7 Yaesu, Chuo-Ku',
  'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
  'Fukuoka Bldg. 8-7 Yaesu Chome'],
 'Fukuoka Bldg 10Th Floor': ['Fukuoka Building 9Th Floor',
  'Fukuoka Bldg. 9Th Fl',
  'Fukuoka Building, 9Th -10Th Flr.',
  'Fukuoka Building, 9Th -10Th Flr.',
  'Fukuoka Bldg 10Th Floor']}

Solution:

IMO, the problem boils down to finding the connected components of the graph induced by the dictionary, where each key is linked to every one of its values. One way to do so is with the UnionFind data structure, which yields the disjoint sets built from the keys and values.

Then we could construct a dictionary from the merged sets by selecting one element as key and the remainder as values.

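from networkx.utils.union_find import UnionFind

# Merge each key and all of its values into one disjoint set.
c = UnionFind()
for k, lst in matches.items():
    c.union(k, *lst)

# Pick one member of each set as the key; the rest become its values.
# Sets are unordered, so which member ends up as the key is arbitrary.
out = {k: v for k, *v in map(list, c.to_sets())}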

Output:

{'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku',
  'Fukuoka Bldg. 8-7 Yaesu Chome',
  'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
 'Fukuoka Building 9Th Floor': ['Fukuoka Bldg 10Th Floor',
  'Fukuoka Bldg. 9Th Fl',
  'Fukuoka Building, 9Th -10Th Flr.']}
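
Because the sets returned by `to_sets()` are unordered, the key of each group above is not necessarily the first key from `matches`. If the first-seen key matters, as the question suggests, a dependency-free variant is one option. The following is only a sketch: the function name `group_links` and the helper `find` are made up for illustration, and it assumes Python 3.7+ so that dictionaries preserve insertion order. Unlike the desired output in the question, it lists every member once and does not repeat the key inside its own list.

```python
def group_links(matches):
    """Group linked strings; each group is keyed by its first-seen member."""
    parent = {}  # disjoint-set forest: item -> parent item

    def find(x):
        # Register unseen items as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union every key with each of its values.
    for key, values in matches.items():
        root = find(key)
        for value in values:
            parent[find(value)] = root

    # `parent` remembers the order in which items were first seen, so the
    # first member collected for each root is the earliest one encountered.
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return {members[0]: members[1:] for members in groups.values()}


matches = {
    '2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7'],
    'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku'],
    'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.'],
    'Fukuoka Bldg. 8-7 Yaesu Chome': [
        '2-8-7 Yaesu, Chuo-Ku',
        'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
    ],
    'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building 9Th Floor'],
    'Fukuoka Building 9Th Floor': [
        'Fukuoka Bldg. 9Th Fl',
        'Fukuoka Building, 9Th -10Th Flr.',
    ],
}

grouped = group_links(matches)
```

With the dictionary from the question, `grouped` has two groups, keyed by '2-8-7 Yaesu, Chuo-Ku' and 'Fukuoka Bldg 10Th Floor', which are the first members of each component in iteration order.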