How to write a loop for calculating average value for recurring entities

I’m having problem with python while constructing a loop (or a nested loop) for my calculation. I have a very large data set (about 600k entries) with repeating ID’s and their values. I’d like to form a loop which starts from the first entry, looks if it is repeating and if it is how manty times it does and calculates average of values for the same ID.

For example in below table I’d like it to return Average a = 2, Average b = 4 and Average c = 1.

I tried a nested for loop below but completely lost my track.

ID Value
a 1
a 3
b 3
b 4
b 5
c 1

for i in range(0,len(ID)):
     if ID[i] == ID[i+1]: #if the next ID is equal to the current one
         for j in range (i,len(ID)): #look for how many repating values for current ID
                    if ID[j] != ID[j+1]:
                        i=j
                        break
          #since it's know how many repeating ID's exist, somehow average them and continue for the second (distinct) ID

>Solution :

This is a snippet of code that does not use any external modules. You are able to use them if you wish, but thought to leave this as it is still an option.

def avg(lst):
    return sum(lst) / len(lst)

filename = "mycsvfile.csv"

averages = dict()
all_values = dict()

with open(filename, "r") as f:
    
    # read all lines and save 
    for line in f.readlines():
        
        split = line.split(",")  # split on comma
        
        row_id = split[0]
        try:
            # "ID is in the first column Value is at fifth." -OP
            value = float(split[4])
            
            # check if there is a new ID or not
            if row_id in all_values.keys():
                all_values[row_id].append(value)
            else:
                all_values[row_id] = [value]
        except ValueError:
            pass  # ignore header lines
        
    # get averages into dictionary
    for k, v in all_values.items():
        averages[k] = avg(v)
    
    # print the data to show it is correct
    for k, v in averages.items():
        print(f"{k}: {v}")

This snippet of code reads in all data from a CSV file and then gets the averages into a dictionary. Key is the ID, value is the average.

Leave a Reply