Home Dictionary of dictionaries – Removing duplications in only certain keys/values

Questions

Dictionary of dictionaries – Removing duplications in only certain keys/values

September 7, 2022

I have a dictionary of dictionaries, a sample is below:

my_dictionary = {
    "0": {"Name": "Nick", "Age": 39, "Country": "UK"},
    "1": {"Name": "Steve", "Age": 19, "Country": "Spain"},
    "2": {"Name": "Dave", "Age": 23, "Country": "UK"},
    "3": {"Name": "Nick", "Age": 39, "Country": "Hong Kong"},
    "4": {"Name": "Nick", "Age": 39, "Country": "France"},
}

I want to remove duplicates in my_dictonary if the value in "Name" AND "Age" is the same. It does not matter which one is removed (there could be many that are the same, I only want one version to remain though).

So in our example above, the output would be:

{'0': {'Name': 'Nick', 'Age': 39, 'Country': 'UK'},
 '1': {'Name': 'Steve', 'Age': 19, 'Country': 'Spain'},
 '2': {'Name': 'Dave', 'Age': 23, 'Country': 'UK'}}

As Nick, 39 was duplicated despite having a different country.

Is there an easy/efficient way of doing this? I have several million rows.

>Solution :

Track seen records, for example:

my_dictionary = {
    "0": {"Name": "Nick", "Age": 39, "Country": "UK"},
    "1": {"Name": "Steve", "Age": 19, "Country": "Spain"},
    "2": {"Name": "Dave", "Age": 23, "Country": "UK"},
    "3": {"Name": "Nick", "Age": 39, "Country": "Hong Kong"},
    "4": {"Name": "Nick", "Age": 39, "Country": "France"},
}

seen = set()
result = {}
for k, v in my_dictionary.items():
    if (v['Name'], v['Age']) not in seen:
        result[k] = v
        seen.add((v['Name'], v['Age']))

print(result)

Output:

{
    '0': {'Name': 'Nick', 'Age': 39, 'Country': 'UK'}, 
    '1': {'Name': 'Steve', 'Age': 19, 'Country': 'Spain'}, 
    '2': {'Name': 'Dave', 'Age': 23, 'Country': 'UK'}
}

Edit note: Using set() (which uses a hash-table) for tracking leads to the overall complexity of O(n) for n rows.