I have a DataFrame with locations in column "A" and values in column "B". Locations occur multiple times in this DataFrame. Now I'd like to add a third column in which I store, for each row, the average of the column "B" values that share the same location in column "A".
-I know the .mean() can be used to get an average
-I know how to filter with .loc[]
I could make a list of all unique values in column "A" and compute the average for each of them in a for loop. However, this seems cumbersome to me. Any idea how this can be done more efficiently?
>Solution :
Sounds like what you need is GroupBy. Take a look here
Given
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                   'B': [np.nan, 2, 3, 4, 5],
                   'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])
You can use
df.groupby('A').mean()
to group the rows by their value in column "A" and compute the mean of every other column within each group. Missing values (NaN) are skipped when the mean is computed.
Output:
     B         C
A
1  3.0  1.333333
2  4.0  1.500000
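Note that the result above has one row per group, while the question asks for a new column on the original DataFrame. One way to get that, as a minimal sketch, is groupby combined with transform, which returns a result aligned with the original rows. The column name 'B_mean' is just a placeholder chosen for this example, and the data is the same as above without column "C".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                   'B': [np.nan, 2, 3, 4, 5]})

# transform('mean') returns a Series with the same index as df,
# so every row gets the mean of "B" for its own group in "A".
# 'B_mean' is a hypothetical column name picked for illustration.
df['B_mean'] = df.groupby('A')['B'].transform('mean')

print(df)
```

Here every row with A == 1 should get 3.0 and every row with A == 2 should get 4.0, including the row where "B" itself is NaN, since NaN is skipped when computing the group mean.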