I’m working on a Python project where I have a list of custom objects, and I need to filter out duplicates based on multiple properties of these objects. Each object has three properties: id, name, and timestamp. I want to consider an object a duplicate if both its id and name properties match another object in the list. The timestamp property should not be considered when determining duplicates.
Here’s an example of what the custom object class looks like:
class CustomObject:
    def __init__(self, id, name, timestamp):
        self.id = id
        self.name = name
        self.timestamp = timestamp
And a sample list of objects:
data = [
    CustomObject(1, "Alice", "2023-01-01"),
    CustomObject(2, "Bob", "2023-01-02"),
    CustomObject(1, "Alice", "2023-01-03"),
    CustomObject(3, "Eve", "2023-01-04"),
    CustomObject(2, "Bob", "2023-01-05"),
]
In this case, I want to remove the duplicates and keep the objects with the earliest timestamp.
The expected output should be:
[
    CustomObject(1, "Alice", "2023-01-01"),
    CustomObject(2, "Bob", "2023-01-02"),
    CustomObject(3, "Eve", "2023-01-04"),
]
I know that I can use a loop to compare each object with every other object in the list, but I’m concerned about the performance, especially when the list gets large. Is there a more efficient way to achieve this in Python, possibly using built-in functions or libraries?
Solution:
You can use a dictionary keyed on the (id, name) pair to keep track of the unique objects, and replace the stored object whenever you find one with an earlier timestamp. This takes a single pass over the list, so it runs in roughly linear time instead of the quadratic time of nested loops:
class CustomObject:
    def __init__(self, id, name, timestamp):
        self.id = id
        self.name = name
        self.timestamp = timestamp

    def __repr__(self):
        return f"CustomObject({self.id}, {self.name}, {self.timestamp})"
data = [
    CustomObject(1, "Alice", "2023-01-01"),
    CustomObject(2, "Bob", "2023-01-02"),
    CustomObject(1, "Alice", "2023-01-03"),
    CustomObject(3, "Eve", "2023-01-04"),
    CustomObject(2, "Bob", "2023-01-05"),
]
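# Maps each (id, name) pair to the earliest object seen so far.
unique_objects = {}
for obj in data:
    key = (obj.id, obj.name)
    # ISO-style "YYYY-MM-DD" strings sort chronologically, so comparing them with < works here.
    if key not in unique_objects or obj.timestamp < unique_objects[key].timestamp:
        unique_objects[key] = obj

# Dictionaries preserve insertion order (Python 3.7+), so the first-seen order of the keys is kept.
filtered_data = list(unique_objects.values())
print(filtered_data)
# Output: [CustomObject(1, Alice, 2023-01-01), CustomObject(2, Bob, 2023-01-02), CustomObject(3, Eve, 2023-01-04)]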
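The string comparison above only works because the sample timestamps are in "YYYY-MM-DD" form. If your real timestamps might come in another format, one option is to parse them before comparing. The following is a minimal sketch of that idea, not a definitive implementation: the function name dedupe_earliest and its parse parameter are my own hypothetical names, and it assumes every timestamp can be read by the parser you pass in (date.fromisoformat by default).

```python
from datetime import date


class CustomObject:
    def __init__(self, id, name, timestamp):
        self.id = id
        self.name = name
        self.timestamp = timestamp


def dedupe_earliest(objects, parse=date.fromisoformat):
    """Keep one object per (id, name) pair, preferring the earliest timestamp.

    `parse` is a hypothetical hook that turns a timestamp string into a
    comparable value; swap it for your own parser if the format differs.
    """
    earliest = {}
    for obj in objects:
        key = (obj.id, obj.name)
        current = earliest.get(key)
        # Compare parsed values instead of raw strings.
        if current is None or parse(obj.timestamp) < parse(current.timestamp):
            earliest[key] = obj
    return list(earliest.values())


data = [
    CustomObject(1, "Alice", "2023-01-01"),
    CustomObject(2, "Bob", "2023-01-02"),
    CustomObject(1, "Alice", "2023-01-03"),
    CustomObject(3, "Eve", "2023-01-04"),
    CustomObject(2, "Bob", "2023-01-05"),
]

result = dedupe_earliest(data)
print([(o.id, o.name, o.timestamp) for o in result])
# [(1, 'Alice', '2023-01-01'), (2, 'Bob', '2023-01-02'), (3, 'Eve', '2023-01-04')]
```

For a different format you could pass something like parse=lambda s: datetime.strptime(s, "%d/%m/%Y") (after importing datetime from the datetime module). It is still a single pass over the list, so the cost stays roughly linear in the number of objects.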