Need to remove (and partially merge) nearly duplicate items from list of dictionaries

December 5, 2022

I have a list of dictionaries in this form (example): [{name: aa, year: 2022}, {name: aa, year: 2021}, {name: bb, year: 2016}, {name: cc, year: 2015}]. I need to remove the items where the name is the same, but collect their years into a list (every year can be in a list, even when there is only one; for my purposes this doesn’t matter). So the example list of dictionaries would look like this: [{name: aa, year: [2022, 2021]}, {name: bb, year: [2016]}, {name: cc, year: [2015]}]. My current code looks like this:

import csv

def read_csv_file(self, path):
    book_list = []
    with open(path) as f:
        read_dict = csv.DictReader(f)
        for i in read_dict:
            book_list.append(i)
           

    bestsellers = []
    for i in book_list:
        seen_books = []
        years_list = []
        if i["Name"] not in seen_books:
            years_list.append(i["Year"])
            seen_books.append(i)
        else:
            years_list.append(i["Year"])

        if i['Genre'] == 'Non Fiction':
            bestsellers.append(NonFictionBook(i["Name"], i["Author"], float(i["User Rating"]), int(i["Reviews"]), float(i["Price"]), years_list, i["Genre"]))
        else:
            bestsellers.append(FictionBook(i["Name"], i["Author"], float(i["User Rating"]), int(i["Reviews"]), float(i["Price"]), years_list, i["Genre"]))
    for i in bestsellers:
        print(i.title)

Ultimately, my code needs to extract data from a CSV file and then create instances of the class FictionBook or NonFictionBook, depending on the genre. I think the CSV reading and the book creation are finished; I just need to filter out the near-duplicate dictionaries and merge their years into lists. If anything is unclear, please let me know so I can explain further.

Solution:

Use dict.setdefault() to create an empty list the first time a name is seen, then append each year to that list:

lod = [{'name': 'aa', 'year': 2022}, {'name': 'aa', 'year': 2021}, {'name': 'bb', 'year': 2016}, {'name': 'cc', 'year': 2015}]

result = {}
for d in lod:
    result.setdefault(d['name'], []).append(d['year'])

>>> result
{'aa': [2022, 2021], 'bb': [2016], 'cc': [2015]}

Then put the list back together:

>>> [{'name': n, 'year': v} for n,v in result.items()]
[{'name': 'aa', 'year': [2022, 2021]}, {'name': 'bb', 'year': [2016]}, {'name': 'cc', 'year': [2015]}]
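
The same setdefault() idea can be applied directly to the CSV rows from the question, so that each distinct Name yields one book object carrying all of its years. The following is only a sketch under assumptions: the Book, FictionBook and NonFictionBook classes here are hypothetical stand-ins (the question does not show the real ones), the function name read_books and the sample rows are made up, and the years are converted to int, which the original code does not do. It also assumes the non-year fields are identical across duplicate rows, so the first row seen for a title supplies them.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical stand-in; argument order follows the constructor calls in the question.
    title: str
    author: str
    rating: float
    reviews: int
    price: float
    years: list
    genre: str


class FictionBook(Book):
    pass


class NonFictionBook(Book):
    pass


def read_books(f):
    """Group CSV rows by Name and build one book object per distinct title."""
    merged = {}
    for row in csv.DictReader(f):
        # First row seen for a title supplies the non-year fields;
        # later rows with the same Name only contribute their year.
        entry = merged.setdefault(row["Name"], {"row": row, "years": []})
        entry["years"].append(int(row["Year"]))

    books = []
    for entry in merged.values():
        row = entry["row"]
        cls = NonFictionBook if row["Genre"] == "Non Fiction" else FictionBook
        books.append(cls(row["Name"], row["Author"], float(row["User Rating"]),
                         int(row["Reviews"]), float(row["Price"]),
                         entry["years"], row["Genre"]))
    return books


# Made-up sample data standing in for the real CSV file.
sample = io.StringIO(
    "Name,Author,User Rating,Reviews,Price,Year,Genre\n"
    "aa,Ann,4.5,100,10,2022,Fiction\n"
    "aa,Ann,4.5,100,10,2021,Fiction\n"
    "bb,Bob,4.0,50,8,2016,Non Fiction\n"
)
books = read_books(sample)
for b in books:
    print(b.title, b.years)
```

With a real file, the function could be called as read_books(open(path, newline="")) inside a with block. Grouping first and constructing the objects afterwards avoids the problem in the question's loop, where seen_books and years_list are recreated on every iteration and so never accumulate anything.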