Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Trying to aggregate with Pandas

print(ice_cream_sales_df.head()); print(' ')

sorted_by_flavor = ice_cream_sales_df.sort_values('flavor')
print(sorted_by_flavor); print(' ')

sum_by_flavor = ice_cream_sales_df.groupby('flavor').agg({"units sold":sum})
print(sum_by_flavor); print(' ')

max_by_week = ice_cream_sales_df.groupby('week').agg({"units sold":max})
print(max_by_week);print(' ')
Output: 
   week  units sold      flavor
0     1           6   chocolate
1     1          15       lemon
2     1          12  strawberry
3     1           6     vanilla
4     2          16   chocolate
 
     week  units sold     flavor
0       1           6  chocolate
124    32           2  chocolate
176    45          10  chocolate
36     10           7  chocolate
76     20          11  chocolate
..    ...         ...        ...
95     24           2    vanilla
91     23           5    vanilla
87     22           6    vanilla
79     20           8    vanilla
207    52          18    vanilla

[208 rows x 3 columns]
 
            units sold
flavor                
chocolate          460
lemon              713
strawberry         399
vanilla            527
 
      units sold
week            
1             15
2             18
3              9
4             20
5             20
6             11
7             13
8             18
9             12
10            18
11            18
12            16
13            15
14            14
15            20
16            18
17            16
18            20
19            18
20            13
21            18
22            18
23            15
24            19
25            16
26            11
27            14
28            13
29            18
30            20
31            16
32            17
33            16
34            19
35            18
36            18
37            16
38            11
39            15
40            20
41            14
42            17
43            17
44            20
45            15
46            19
47            18
48            15
49            17
50             9
51            20
52            18

I’m self learning python/pandas. I see that you can group by 2 columns, but whenever I try on kaggle, it gives an error.

max_by_week = ice_cream_sales_df.groupby('week','flavor').agg({"units sold":max})

Error:

ValueError: No axis named flavor for object type DataFrame.

What I’m trying to do is list not only the max for each week, but which flavor HAD the max for that week. How can I do this? If I take out the 2nd column whether its week or flavor the code works. But when I try to group by both, it fails.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Try this,

max_by_week = ice_cream_sales_df.groupby(by=['week','flavor']).agg({"units sold":max})

If you want to group more columns, you should pass the columns to by argument as a list of strings.

From doc

DataFrame.groupby(by=None, axis=0, level=None, as_index=True,
sort=True, group_keys=True, observed=False, dropna=True)

you can see that axis is the second argument, which is the reason for your issue

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading