How to merge items based on the same key/value in a Python list

I’m looking for a way to merge objects where one or more keys have the same value.
Specifically, in my example I have a list where both the category and the code must match.

Input

[{
    "category": "Nace2008",
    "code": "01110",
    "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden"
},
{
    "category": "Nace2008",
    "code": "01110",
    "FR": "Culture de c\u00e9r\u00e9ales (\u00e0 l'exception du riz), de l\u00e9gumineuses et de graines ol\u00e9agineuses"
},
{
    "category": "Nace2008",
    "code": "01120",
    "FR": "Culture du riz"
},
{
    "category": "Nace2008",
    "code": "01120",
    "NL": "Teelt van rijst"
}]

Expected output

[{
    "category": "Nace2008",
    "code": "01110",
    "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden",
    "FR": "Culture de c\u00e9r\u00e9ales (\u00e0 l'exception du riz), de l\u00e9gumineuses et de graines ol\u00e9agineuses"
},
{
    "category": "Nace2008",
    "code": "01120",
    "NL": "Teelt van rijst",
    "FR": "Culture du riz"
}]

Looping through the list with a second, nested loop that checks for the same category and code results in duplicate data.
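The question doesn't show the nested-loop code, so the following is only a hypothetical reconstruction of what it might look like. It illustrates why duplicates appear: every matching pair is visited twice, once as (a, b) and once as (b, a).

```python
data = [
    {"category": "Nace2008", "code": "01110",
     "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden"},
    {"category": "Nace2008", "code": "01110",
     "FR": "Culture de céréales (à l'exception du riz), de légumineuses et de graines oléagineuses"},
    {"category": "Nace2008", "code": "01120", "FR": "Culture du riz"},
    {"category": "Nace2008", "code": "01120", "NL": "Teelt van rijst"},
]

# Hypothetical naive approach: compare every record with every other record.
merged = []
for a in data:
    for b in data:
        same_group = (a["category"], a["code"]) == (b["category"], b["code"])
        if a is not b and same_group:
            # Each matching pair is seen twice: (a, b) and later (b, a).
            merged.append({**a, **b})

print(len(merged))  # 4 merged records for only 2 groups
```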

Solution:

What you want is the standard dictionary grouping idiom, keyed on the fields you described. Starting from your data:

>>> data = [{
...     "category": "Nace2008",
...     "code": "01110",
...     "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden"
... },
... {
...     "category": "Nace2008",
...     "code": "01110",
...     "FR": "Culture de c\u00e9r\u00e9ales (\u00e0 l'exception du riz), de l\u00e9gumineuses et de graines ol\u00e9agineuses"
... },
... {
...     "category": "Nace2008",
...     "code": "01120",
...     "FR": "Culture du riz"
... },
... {
...     "category": "Nace2008",
...     "code": "01120",
...     "NL": "Teelt van rijst"
... }]

Create an empty dictionary and group the records by their (category, code) key:

>>> result = {}
>>> for d in data:
...     key = d['category'], d['code']
...     result.setdefault(key, {}).update(d)
...

Note that .update merges naively: if a later record in the same group repeats a key that is already present, the last value wins and the earlier one is silently overwritten. If the keys within each group are all unique, this is not a problem. The results:
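If you would rather be told about a clash than have it silently overwritten, a stricter variant can check each incoming field before merging. This is only a sketch: the helper name `merge_strict`, its `key_fields` parameter and the choice to raise `ValueError` are assumptions, not part of the answer, and the conflicting Dutch value below is a made-up placeholder.

```python
def merge_strict(records, key_fields=("category", "code")):
    """Group records by key_fields; raise if a field gets two different values."""
    result = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        target = result.setdefault(key, {})
        for field, value in rec.items():
            if field in target and target[field] != value:
                raise ValueError(
                    f"conflict for {key!r} on {field!r}: {target[field]!r} vs {value!r}"
                )
            target[field] = value
    return result

# Plain .update keeps the last value on a clash:
plain = {}
plain.update({"code": "01120", "NL": "Teelt van rijst"})
plain.update({"code": "01120", "NL": "(hypothetical second translation)"})
print(plain["NL"])  # (hypothetical second translation)

# Equal values for shared fields (category, code) are not treated as conflicts:
ok = merge_strict([
    {"category": "Nace2008", "code": "01120", "FR": "Culture du riz"},
    {"category": "Nace2008", "code": "01120", "NL": "Teelt van rijst"},
])
print(ok[("Nace2008", "01120")]["NL"])  # Teelt van rijst
```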

>>> from pprint import pprint
>>> pprint(result)
{('Nace2008', '01110'): {'FR': "Culture de céréales (à l'exception du riz), de "
                               'légumineuses et de graines oléagineuses',
                         'NL': 'Teelt van granen (m.u.v. rijst), peulgewassen '
                               'en oliehoudende zaden',
                         'category': 'Nace2008',
                         'code': '01110'},
 ('Nace2008', '01120'): {'FR': 'Culture du riz',
                         'NL': 'Teelt van rijst',
                         'category': 'Nace2008',
                         'code': '01120'}}

If you just want the merged records, extract the values of that dictionary:

>>> pprint(list(result.values()))
[{'FR': "Culture de céréales (à l'exception du riz), de légumineuses et de "
        'graines oléagineuses',
  'NL': 'Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden',
  'category': 'Nace2008',
  'code': '01110'},
 {'FR': 'Culture du riz',
  'NL': 'Teelt van rijst',
  'category': 'Nace2008',
  'code': '01120'}]
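Since the question's input and expected output are JSON, the whole round trip can be sketched with the standard `json` module, assuming the input arrives as a JSON string (the variable names `raw`, `merged` and `out` are illustrative). With its default `ensure_ascii=True`, `json.dumps` writes accented characters back as `\u00e9`-style escapes, as in the question; pass `ensure_ascii=False` to keep them literal. Dicts keep insertion order (Python 3.7+), so groups come out in order of first appearance, and fields inside each record appear in the order they were first seen (so the second group lists FR before NL).

```python
import json

# Raw string so the \u00e9 escapes reach json.loads unchanged.
raw = r'''[
    {"category": "Nace2008", "code": "01110",
     "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden"},
    {"category": "Nace2008", "code": "01110",
     "FR": "Culture de c\u00e9r\u00e9ales (\u00e0 l'exception du riz), de l\u00e9gumineuses et de graines ol\u00e9agineuses"},
    {"category": "Nace2008", "code": "01120", "FR": "Culture du riz"},
    {"category": "Nace2008", "code": "01120", "NL": "Teelt van rijst"}
]'''

data = json.loads(raw)

# Same grouping idiom as above.
result = {}
for d in data:
    key = d["category"], d["code"]
    result.setdefault(key, {}).update(d)

merged = list(result.values())
out = json.dumps(merged, indent=4)  # ensure_ascii=True by default
print(out)
```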

The grouping idiom can be tidied up a bit using collections.defaultdict (some people find .setdefault confusing):

from collections import defaultdict
result = defaultdict(dict)
for d in data:
    key = d['category'], d['code']
    result[key].update(d)

Both are equivalent to:

result = {}
for d in data:
    key = d['category'], d['code']
    if key not in result:
        result[key] = {}
    result[key].update(d)
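A different technique, swapped in here and not part of the answer above, is `itertools.groupby`. It only groups adjacent items, so the records must first be sorted by the same key, which means the output is ordered by key rather than by first appearance. `operator.itemgetter("category", "code")` returns a (category, code) tuple, matching the key used above.

```python
from itertools import groupby
from operator import itemgetter

data = [
    {"category": "Nace2008", "code": "01110",
     "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden"},
    {"category": "Nace2008", "code": "01110",
     "FR": "Culture de céréales (à l'exception du riz), de légumineuses et de graines oléagineuses"},
    {"category": "Nace2008", "code": "01120", "FR": "Culture du riz"},
    {"category": "Nace2008", "code": "01120", "NL": "Teelt van rijst"},
]

key = itemgetter("category", "code")  # returns a (category, code) tuple

merged = []
# groupby needs its input sorted by the grouping key.
for _, group in groupby(sorted(data, key=key), key=key):
    combined = {}
    for record in group:
        combined.update(record)  # same last-one-wins behaviour as .update above
    merged.append(combined)

for m in merged:
    print(m["code"], sorted(m))
```

The dictionary approach is usually preferable: it is a single pass, needs no sort, and keeps the original order.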