# How to write a loop for calculating average value for recurring entities

I’m having problem with python while constructing a loop (or a nested loop) for my calculation. I have a very large data set (about 600k entries) with repeating ID’s and their values. I’d like to form a loop which starts from the first entry, looks if it is repeating and if it is how manty times it does and calculates average of values for the same ID.

For example in below table I’d like it to return Average a = 2, Average b = 4 and Average c = 1.

I tried a nested for loop below but completely lost my track.

ID Value
a 1
a 3
b 3
b 4
b 5
c 1

for i in range(0,len(ID)):
if ID[i] == ID[i+1]: #if the next ID is equal to the current one
for j in range (i,len(ID)): #look for how many repating values for current ID
if ID[j] != ID[j+1]:
i=j
break
#since it's know how many repeating ID's exist, somehow average them and continue for the second (distinct) ID

### >Solution :

This is a snippet of code that does not use any external modules. You are able to use them if you wish, but thought to leave this as it is still an option.

def avg(lst):
return sum(lst) / len(lst)

filename = "mycsvfile.csv"

averages = dict()
all_values = dict()

with open(filename, "r") as f:

# read all lines and save

split = line.split(",")  # split on comma

row_id = split[0]
try:
# "ID is in the first column Value is at fifth." -OP
value = float(split[4])

# check if there is a new ID or not
if row_id in all_values.keys():
all_values[row_id].append(value)
else:
all_values[row_id] = [value]
except ValueError: