How to efficiently filter out duplicate objects in a list based on multiple properties in Python?

I’m working on a Python project where I have a list of custom objects, and I need to filter out duplicates based on multiple properties of these objects. Each object has three properties: id, name, and timestamp. I want to treat an object as a duplicate if both its id and name match those of another object in the list. The timestamp property should not be considered when determining duplicates.

Here’s an example of what the custom object class looks like:

class CustomObject:
    def __init__(self, id, name, timestamp):
        self.id = id
        self.name = name
        self.timestamp = timestamp

And a sample list of objects:

data = [
    CustomObject(1, "Alice", "2023-01-01"),
    CustomObject(2, "Bob", "2023-01-02"),
    CustomObject(1, "Alice", "2023-01-03"),
    CustomObject(3, "Eve", "2023-01-04"),
    CustomObject(2, "Bob", "2023-01-05"),
]

In this case, I want to remove the duplicates and keep, for each (id, name) pair, the object with the earliest timestamp.

The expected output should be:

[
    CustomObject(1, "Alice", "2023-01-01"),
    CustomObject(2, "Bob", "2023-01-02"),
    CustomObject(3, "Eve", "2023-01-04"),
]

I know that I can use a loop to compare each object with every other object in the list, but I’m concerned about the performance, especially when the list gets large. Is there a more efficient way to achieve this in Python, possibly using built-in functions or libraries?

Solution:

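You can use a dictionary keyed on the (id, name) pair to keep track of the unique objects, replacing the stored object whenever you find one with the same key and an earlier timestamp. This needs only a single pass over the list, so it runs in O(n) time instead of the O(n²) of nested loops. The timestamps here are ISO-formatted strings (YYYY-MM-DD), so comparing them with < orders them chronologically. Here’s a solution: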

class CustomObject:
    def __init__(self, id, name, timestamp):
        self.id = id
        self.name = name
        self.timestamp = timestamp

    def __repr__(self):
        return f"CustomObject({self.id}, {self.name}, {self.timestamp})"


data = [
    CustomObject(1, "Alice", "2023-01-01"),
    CustomObject(2, "Bob", "2023-01-02"),
    CustomObject(1, "Alice", "2023-01-03"),
    CustomObject(3, "Eve", "2023-01-04"),
    CustomObject(2, "Bob", "2023-01-05"),
]

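# Map each (id, name) pair to the earliest object seen so far.
unique_objects = {}
for obj in data:
    key = (obj.id, obj.name)
    # Store the first object for a key, or swap in one with an earlier timestamp.
    if key not in unique_objects or obj.timestamp < unique_objects[key].timestamp:
        unique_objects[key] = obj

# Dicts preserve insertion order (Python 3.7+), so results follow first appearance.
filtered_data = list(unique_objects.values())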

print(filtered_data)
# Output: [CustomObject(1, Alice, 2023-01-01), CustomObject(2, Bob, 2023-01-02), CustomObject(3, Eve, 2023-01-04)]
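
If the timestamps might not always be zero-padded ISO strings, comparing them as text could give the wrong order. A minimal sketch of the same dictionary approach wrapped in a function, assuming the timestamps are still in YYYY-MM-DD form so that date.fromisoformat can parse them (the function name dedupe_keep_earliest is made up for this example):

```python
from datetime import date


class CustomObject:
    def __init__(self, id, name, timestamp):
        self.id = id
        self.name = name
        self.timestamp = timestamp


def dedupe_keep_earliest(objects):
    """Return one object per (id, name) pair, keeping the earliest timestamp."""
    earliest = {}
    for obj in objects:
        key = (obj.id, obj.name)
        current = earliest.get(key)
        # Assumption: timestamps are "YYYY-MM-DD" strings; parse them so the
        # comparison is made on real dates rather than on text.
        if current is None or (
            date.fromisoformat(obj.timestamp) < date.fromisoformat(current.timestamp)
        ):
            earliest[key] = obj
    return list(earliest.values())


# The earliest "Alice" entry appears last here, to show it still wins.
data = [
    CustomObject(1, "Alice", "2023-01-03"),
    CustomObject(2, "Bob", "2023-01-02"),
    CustomObject(1, "Alice", "2023-01-01"),
]
result = dedupe_keep_earliest(data)
print([(o.id, o.name, o.timestamp) for o in result])
# Output: [(1, 'Alice', '2023-01-01'), (2, 'Bob', '2023-01-02')]
```

Replacing the value for an existing key does not change that key's position in the dictionary, so the result stays in the order in which each (id, name) pair first appeared.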
