Find average of certain values in dictionary for each key in dictionary

I have a dictionary that has a string id as a key and a list of lists as a value. In each inner list, the first element is a book genre and the second element is the book's rating.
It looks like this:

{'id1': [['Horror', 4.0], ['Sci-Fi', 9.5], ['Horror', 9.0]],
'id2': [['Thriller', 2.3], ['Horror', 6.2], ['Thriller', 3.9]]}

What I want to do is average the ratings for each genre within each id, so that in the end I have a dictionary like this:

{'id1': [['Horror', 6.5], ['Sci-Fi', 9.5]],
'id2': [['Thriller', 3.1], ['Horror', 6.2]]}

What I’ve been trying to do is this:

# existing_dictionary is the dictionary above
dict = {}
for bookAndRate in existing_dictionary.items():
    # bookAndRate is an (id, list) tuple, so bookAndRate[1] is e.g. [['Horror', 4.0], ['Sci-Fi', 9.5], ['Horror', 9.0]]
    for bookGenrePlusRating in bookAndRate[1]:
        # bookGenrePlusRating is ['Horror', 4.0], then ['Sci-Fi', 9.5], then ['Horror', 9.0]
        if bookGenrePlusRating[0] in dict.values():
            dict[bookAndRate[0]][1] += bookGenrePlusRating[1]
        else:
            dict[bookAndRate[0]] = [bookGenrePlusRating[0], bookGenrePlusRating[1]]

But this only keeps the last [genre, rating] pair for each id, so I end up getting:

{'id1': ['Horror', 9.0],
'id2': ['Thriller', 3.9]}
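The loop above keeps only the last pair for each id because the check bookGenrePlusRating[0] in dict.values() compares a genre string with the [genre, rating] lists stored as values, so it is never true, and the else branch then reassigns the entry for the current id on every iteration. For comparison, here is a minimal sketch of the same loop-based idea with those two problems fixed: it keeps a running [total, count] per genre in a nested dictionary and divides at the end. The names totals, per_genre and averages are illustrative, not taken from the question, and the sketch assumes Python 3.7+ so that dictionaries keep genres in first-seen order.

```python
existing_dictionary = {
    "id1": [["Horror", 4.0], ["Sci-Fi", 9.5], ["Horror", 9.0]],
    "id2": [["Thriller", 2.3], ["Horror", 6.2], ["Thriller", 3.9]],
}

# Pass 1: accumulate [total, count] per genre, separately for each id.
totals = {}
for book_id, pairs in existing_dictionary.items():
    per_genre = totals.setdefault(book_id, {})  # one inner dict per id
    for genre, rating in pairs:
        if genre in per_genre:  # look the genre up among the keys, not the values
            per_genre[genre][0] += rating
            per_genre[genre][1] += 1
        else:
            per_genre[genre] = [rating, 1]

# Pass 2: divide each total by its count, back in [[genre, average], ...] form.
averages = {
    book_id: [[genre, total / count] for genre, (total, count) in per_genre.items()]
    for book_id, per_genre in totals.items()
}
print(averages)
```

Unlike the original attempt, nothing is ever overwritten: each genre gets its own slot inside the per-id dictionary.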

Solution:

Try grouping the ratings by genre for each id first, then averaging each group with statistics.mean:

from statistics import mean

dct = {
    "id1": [["Horror", 4.0], ["Sci-Fi", 9.5], ["Horror", 9.0]],
    "id2": [["Thriller", 2.3], ["Horror", 6.2], ["Thriller", 3.9]],
}

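# Step 1: group the ratings by genre for each id,
# e.g. {"id1": {"Horror": [4.0, 9.0], "Sci-Fi": [9.5]}, ...}
out = {}
for k, v in dct.items():
    for genre, rating in v:
        out.setdefault(k, {}).setdefault(genre, []).append(rating)

# Step 2: replace each list of ratings with its mean, in [[genre, average], ...] form
out = {k: [[kk, mean(vv)] for kk, vv in v.items()] for k, v in out.items()}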
print(out)

Prints:

{'id1': [['Horror', 6.5], ['Sci-Fi', 9.5]], 'id2': [['Thriller', 3.0999999999999996], ['Horror', 6.2]]}
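The 3.0999999999999996 is ordinary binary floating-point behaviour, not a bug in mean: neither 2.3 nor 3.9 can be stored exactly as a float, so their exact average rounds to a float just below 3.1. The following small illustration (not part of the original answer) uses only the standard decimal, math and statistics modules to make this visible:

```python
import math
from decimal import Decimal
from statistics import mean

# Decimal(float) shows the exact binary value that is stored for each literal.
print(Decimal(2.3))  # slightly less than 2.3
print(Decimal(3.9))  # slightly less than 3.9

avg = mean([2.3, 3.9])
print(avg)

# Compare floats with a tolerance instead of ==.
print(math.isclose(avg, 3.1))
```

So round() is only a display choice; for comparisons, a tolerance-based check such as math.isclose is the safer option.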

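If you want to round the averages, use this comprehension in place of the final one above. It has to run on the grouped out dictionary (genre -> list of ratings), not on the already-averaged result: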

out = {
    k: [[kk, round(mean(vv), 2)] for kk, vv in v.items()]
    for k, v in out.items()
}
print(out)

Prints:

{'id1': [['Horror', 6.5], ['Sci-Fi', 9.5]], 'id2': [['Thriller', 3.1], ['Horror', 6.2]]}
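If this is needed in more than one place, the grouping and averaging steps can be wrapped in one function, so the rounding is applied while averaging rather than to an out that has already been converted to lists. A minimal sketch, assuming Python 3.7+ (insertion-ordered dictionaries); the function name average_by_genre and its ndigits parameter are hypothetical, and collections.defaultdict stands in for the setdefault chain used in the answer:

```python
from collections import defaultdict
from statistics import mean


def average_by_genre(data, ndigits=None):
    """Turn {id: [[genre, rating], ...]} into {id: [[genre, average], ...]}.

    Hypothetical helper: genres keep the order of their first appearance,
    and averages are rounded only when ndigits is given.
    """
    result = {}
    for key, pairs in data.items():
        grouped = defaultdict(list)  # genre -> list of ratings for this id
        for genre, rating in pairs:
            grouped[genre].append(rating)
        result[key] = [
            [genre, mean(ratings) if ndigits is None else round(mean(ratings), ndigits)]
            for genre, ratings in grouped.items()
        ]
    return result


dct = {
    "id1": [["Horror", 4.0], ["Sci-Fi", 9.5], ["Horror", 9.0]],
    "id2": [["Thriller", 2.3], ["Horror", 6.2], ["Thriller", 3.9]],
}
print(average_by_genre(dct, ndigits=2))
```

Building a fresh defaultdict per id keeps the genres of different ids separate, which is the same role the inner setdefault(k, {}) plays in the answer.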