Group by columns under conditions to calculate average

Advertisements

I have a raw dataset like below:

ColA ColB duration interval Counter
A SD 2 4 1
A SD 3 3 2
A UD 2 1 10
B UD 1 2 2
B UD 2 2 2
B SD 3 3 13
B SD 1 4 19

I am expecting an output result like below:

Explanation of the output:

  1. SumCounter is the sum of counter values on the group by ColA and ColB values.
  2. AvgdurationSD/UD and AvgIntervalSD/UD are created by taking the average over ColA and ColB and having a 0 value incase the columns dont match the criteria (e.g. AvgDurationSD and AvIntervalSD has 0 value for a group of ColA = A and ColB = UD.

I understand that I have to use group by and agg functions to apply here but I am not really sure how to apply conditions for ColB on individual new columns.

Any help is appreciated:)

>Solution :

Use DataFrame.pivot_table with helper column new by copy like ColB, then flatten MultiIndex and add ouput to new DataFrame created by aggregate sum:

df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'], 
                      columns='new', 
                      values=['interval','duration'], 
                      fill_value=0,
                      aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
df = (df.groupby(['ColA','ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1).reset_index())
print (df)
  ColA ColB  SumCounter  durationSD  durationUD  intervalSD  intervalUD
0    A   SD           3         2.5         0.0         3.5           0
1    A   UD          10         0.0         2.0         0.0           1
2    B   SD          32         2.0         0.0         3.5           0
3    B   UD           4         0.0         1.5         0.0           2

Leave a ReplyCancel reply