Pythonic way of joining lists of dictionaries on a key

January 6, 2023

Suppose I have two lists of dictionaries, l1 and l2.

l1 = [
    { "id": 0, "foo": 0 },
    { "id": 1, "foo": 1 },
    { "id": 2, "foo": 2 },
    ...
]

l2 = [
    { "id": 0, "bar": 0 },
    { "id": 1, "bar": 1 },
    { "id": 2, "bar": 2 },
    ...
]

Is there a Pythonic way of joining the two lists together on a key, say "id"?

Expected output:

[
    { "id": 0, "foo": 0, "bar": 0 },
    { "id": 1, "foo": 1, "bar": 1 },
    { "id": 2, "foo": 2, "bar": 2 },
    ...
]

This can be achieved with a list comprehension, but it runs in O(NM) time, and the merged dictionaries end up with a redundant key-value pair if l1 and l2 use different names for the join key.

[
    {**d1, **d2}
    for d1 in l1 for d2 in l2
    if d1["id"] == d2["id"]
]
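To make the redundant-pair point concrete, here is a small sketch. The key name "ref" in l2 is hypothetical and chosen only for illustration; the rest follows the comprehension above.

```python
l1 = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}]
# Assumption for illustration: l2 stores the join key under "ref", not "id".
l2 = [{"ref": 0, "bar": 0}, {"ref": 1, "bar": 1}]

# Same nested comprehension, comparing "id" against "ref".
joined = [
    {**d1, **d2}
    for d1 in l1 for d2 in l2
    if d1["id"] == d2["ref"]
]

# Every merged dictionary now carries both "id" and "ref" with the same value.
print(joined)
```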

Alternatively, setting readability aside, it can be solved in O(N + M) time by indexing l1 on the key:

# Map each id to its dictionary from l1.
# These dictionaries will accumulate the entries of both d1 and d2.
d = {d1["id"]: d1 for d1 in l1}

# Insert d2 entries into their corresponding dictionaries.
for d2 in l2:
    key = d2["id"]
    d[key].update({
        k: v
        for (k, v) in d2.items()
        if k != "id"
    })

# Convert the dictionary back into a list of dictionaries.
result = list(d.values())
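Two caveats with the snippet above: it stores the dictionaries of l1 themselves, so update() modifies l1 in place, and d[key] raises KeyError for an id that appears only in l2. A possible variant that avoids both is sketched below, assuming unmatched l2 entries should simply be skipped (the entry with id 5 is made-up sample data).

```python
l1 = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}]
l2 = [{"id": 0, "bar": 0}, {"id": 1, "bar": 1}, {"id": 5, "bar": 5}]

# Copy each d1 so that the dictionaries in l1 are left untouched.
d = {d1["id"]: dict(d1) for d1 in l1}

for d2 in l2:
    # .get returns None instead of raising KeyError for ids missing from l1.
    target = d.get(d2["id"])
    if target is not None:
        target.update(d2)

result = list(d.values())
```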

Is there a better solution?

Solution:

"Pythonic" doesn't mean "use list comprehensions instead of for loops". For loops are perfectly Pythonic. Just use an intermediate dict as an index, built with the .setdefault grouping idiom, and use itertools.chain to walk both lists in a single loop so the code stays clean:

import itertools

index = {}

for d in itertools.chain(l1, l2):
    index.setdefault(d['id'], {}).update(d)

result = list(index.values())
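As a quick check of what this produces, here is the same loop run on sample data modeled on the question. The l2 entry with id 3 is an assumption added to show what happens to ids that appear in only one list.

```python
import itertools

l1 = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}, {"id": 2, "foo": 2}]
l2 = [{"id": 0, "bar": 0}, {"id": 1, "bar": 1}, {"id": 3, "bar": 3}]

index = {}

for d in itertools.chain(l1, l2):
    # A fresh dict is created per id, so the input dictionaries are not modified.
    index.setdefault(d["id"], {}).update(d)

result = list(index.values())
print(result)
```

Ids that appear in only one list are kept with whatever fields they have, so this behaves like a full outer join. If the same non-key field appears in both lists, the value from l2 wins because it is applied last.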

You could also use a defaultdict instead of a plain dict with .setdefault. In this case I probably would, since the defaultdict is only an intermediate data structure:

import itertools
import collections

index = collections.defaultdict(dict)

for d in itertools.chain(l1, l2):
    index[d["id"]].update(d)

result = list(index.values())
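If this comes up more than once, the loop could be wrapped in a small helper. The function name join_on and its signature are hypothetical, not part of any library; this is only a sketch of one way to generalize the idea to any key and any number of lists.

```python
import collections
import itertools


def join_on(key, *lists):
    """Merge dictionaries from all lists that share the same value for `key`."""
    index = collections.defaultdict(dict)
    # chain.from_iterable flattens the tuple of lists into one stream of dicts.
    for d in itertools.chain.from_iterable(lists):
        index[d[key]].update(d)
    return list(index.values())


l1 = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}]
l2 = [{"id": 0, "bar": 0}, {"id": 1, "bar": 1}]

result = join_on("id", l1, l2)
print(result)
```

Every dictionary is expected to contain the key; one that lacks it raises KeyError, which is probably the desired behavior for malformed input.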