I have the following dataset :
df=pd.read_csv('https://raw.githubusercontent.com/michalis0/DataMining_and_MachineLearning/master/data/sales.csv')
I want to visualize the average sales by types of Customer (Segment
).
I calculated the average sales by segment with this :
average_sales = df.groupby(['Segment','OrderYear'],as_index=True)['Sales'].agg({"mean"})
print(average_sales)
And I get the following output:
mean
Segment OrderYear
Consumer 2015 251.633302
2016 238.200804
2017 223.269145
2018 200.469005
Corporate 2015 212.641424
2016 189.902305
2017 263.348456
2018 243.634951
Home Office 2015 290.234240
2016 222.101830
2017 226.382196
2018 242.532951
Now I want to plot it in a line chart with the year on the x_axis and average sales on y_axis but everytime I try I only get one line for mean
while I want one line for each Segment
.
One line for ‘Consumer’, one for ‘Corporate’, one for ‘Home Office’.
I think it might be because Segment is an index rather than a column but I still can’t plot by segment.
>Solution :
- It is better to shape the date with
.pivot_table
, which results in the dataframe being in the correct shape to plot.- Using the existing groupby to plot, requires unstacking:
df.groupby(['Segment','OrderYear'], as_index=True)['Sales'].agg({"mean"}).unstack(level=0).plot()
- Using the existing groupby to plot, requires unstacking:
- Plot the dataframe,
pt
, directly withpandas.DataFrame.plot
- This type of aggregated data should be presented in grouped bars, but a line plot has also been provided.
- Line plot is more difficult to read with many segments
- The slope implies information that may not be strictly correct
- See How to plot and annotate grouped bars seaborn / matplotlib or How to plot and annotate a grouped bar chart if you’re interested in adding annotations.
- Tested in
python 3.8.12
,pandas 1.3.4
,matplotlib 3.4.3
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/michalis0/DataMining_and_MachineLearning/master/data/sales.csv')
# convert to datetime
df['Order Date'] = pd.to_datetime(df['Order Date'])
# extract year
df['OrderYear'] = df['Order Date'].dt.year
# pivot table
pt = df.pivot_table(index='OrderYear', columns='Segment', values='Sales', aggfunc='mean')
# display(pt)
Segment Consumer Corporate Home Office
OrderYear
2015 251.633302 212.641424 290.234240
2016 238.629760 196.834867 222.101830
2017 223.269145 264.486862 228.730257
2018 200.368580 243.595111 242.532951
# plot bar
ax1 = pt.plot(kind='bar', rot=0)
# or plot line
ax2 = pt.plot(xticks=pt.index)
Sample Data
- In case the github repo is no longer available
Order Date,Segment,Sales
08/04/2018,Home Office,195.76
23/05/2015,Consumer,17.96
01/12/2018,Consumer,406.368
26/03/2016,Consumer,40.032
04/11/2015,Corporate,8.376
14/11/2017,Consumer,48.576
08/05/2017,Consumer,10.368
25/11/2015,Consumer,539.92
17/06/2017,Consumer,319.41
14/01/2015,Corporate,61.96
01/12/2018,Consumer,105.584
24/11/2016,Consumer,91.392
12/07/2015,Consumer,123.136
26/01/2018,Consumer,11.84
01/03/2015,Consumer,137.352
06/12/2016,Consumer,19.92
07/04/2017,Consumer,3.64
17/09/2016,Consumer,4228.704
29/09/2018,Home Office,7.968
17/08/2018,Corporate,2518.29
04/11/2015,Home Office,275.94
20/06/2017,Consumer,17.712
09/09/2018,Consumer,6.56
03/03/2017,Corporate,563.43
11/10/2017,Consumer,27.72
08/12/2018,Consumer,19.44
01/06/2015,Home Office,47.88
28/10/2017,Corporate,756.8
31/07/2016,Consumer,2309.65
08/11/2016,Corporate,4.712
20/10/2016,Corporate,16.02
23/12/2018,Corporate,367.96
15/02/2016,Corporate,134.97
29/12/2015,Consumer,23.976
05/10/2018,Home Office,39.92
25/06/2016,Home Office,31.104
28/10/2017,Consumer,47.952
25/09/2015,Home Office,3.264
18/12/2016,Corporate,18.432
07/09/2018,Consumer,25.16
26/06/2017,Home Office,8.02
16/06/2018,Consumer,18.54
06/12/2016,Consumer,198.272
04/05/2018,Corporate,9.396
23/10/2018,Consumer,10.272
21/02/2017,Corporate,39.98
22/07/2015,Home Office,19.68
29/09/2018,Home Office,27.968
03/08/2015,Consumer,218.75
07/10/2018,Home Office,18.936
18/04/2016,Consumer,115.44
04/04/2016,Consumer,644.076
03/07/2018,Home Office,24.56
10/11/2016,Consumer,577.584
12/05/2018,Consumer,87.4
21/02/2017,Home Office,3.762
18/08/2018,Consumer,21.38
13/07/2016,Consumer,11.808
17/12/2018,Consumer,66.284
02/12/2015,Corporate,58.36
01/12/2015,Consumer,45.84
23/05/2016,Home Office,850.5
14/10/2015,Corporate,22.92
23/10/2018,Corporate,11.56
20/07/2015,Corporate,41.94
16/06/2016,Consumer,133.98
02/09/2015,Consumer,21.24
11/11/2017,Corporate,95.968
03/10/2015,Home Office,6.192
19/11/2018,Consumer,25.06
25/08/2015,Consumer,40.096
29/12/2018,Consumer,34.58
05/12/2018,Consumer,11.07
23/07/2017,Consumer,4.448
05/03/2016,Consumer,11.212
09/06/2015,Consumer,911.424
21/11/2016,Consumer,10.92
13/02/2018,Consumer,10.71
27/04/2016,Consumer,1379.92
30/10/2018,Home Office,33.94
08/08/2016,Consumer,447.86
07/12/2016,Consumer,79.92
21/08/2018,Corporate,33.18
26/01/2015,Home Office,19.44
09/06/2015,Consumer,1706.184
26/09/2016,Consumer,79.056
05/04/2016,Home Office,547.136
27/10/2018,Corporate,5.607
03/07/2016,Consumer,294.93
16/11/2015,Home Office,169.45
08/12/2015,Corporate,60.416
23/11/2016,Consumer,16.56
05/10/2018,Home Office,75.792
19/03/2016,Consumer,17.568
21/08/2017,Corporate,2887.056
25/04/2016,Corporate,21.34
14/05/2017,Corporate,4.768
03/11/2016,Home Office,42.6
21/10/2017,Consumer,22.92
10/07/2018,Corporate,41.91
16/11/2018,Consumer,811.28
17/09/2018,Corporate,10.776
01/12/2018,Home Office,62.958
07/12/2018,Consumer,374.376
19/11/2018,Consumer,821.88
16/06/2018,Consumer,23.92
19/05/2017,Consumer,242.9
06/06/2017,Corporate,105.52
05/12/2015,Corporate,29.94
12/08/2018,Consumer,299.99
08/04/2018,Home Office,41.95
04/10/2015,Consumer,95.648
25/11/2017,Consumer,194.352
18/09/2016,Corporate,11.68
20/12/2016,Home Office,11.696
24/04/2017,Consumer,3.984
14/05/2015,Corporate,310.88
22/09/2015,Consumer,579.528
02/05/2015,Consumer,26.136
19/08/2015,Corporate,69.456
08/07/2018,Corporate,28.91
26/11/2015,Corporate,7.312
24/06/2018,Consumer,21.744
12/11/2018,Consumer,221.024
27/08/2016,Consumer,3.08
18/11/2018,Consumer,127.386
21/11/2016,Corporate,246.1328
12/05/2017,Consumer,120.0
30/12/2017,Home Office,481.32
20/07/2016,Consumer,913.43
23/11/2018,Corporate,10.688
23/04/2015,Home Office,22.336
17/09/2016,Consumer,3.264
20/10/2016,Consumer,24.56
04/06/2017,Consumer,14.94
19/11/2016,Consumer,5.984
30/07/2016,Consumer,209.93
17/09/2016,Consumer,110.96
12/10/2016,Consumer,263.96
02/09/2017,Consumer,65.94
12/10/2016,Consumer,81.96
14/05/2016,Home Office,198.272
09/12/2018,Corporate,37.208
23/05/2017,Consumer,122.382
23/09/2018,Consumer,199.95
28/12/2015,Corporate,704.25
19/01/2018,Consumer,6.0
12/10/2016,Home Office,19.9
14/11/2016,Corporate,37.0
03/10/2018,Home Office,6.63
20/07/2015,Consumer,104.85
10/09/2015,Consumer,1487.04
12/10/2018,Corporate,39.984
23/12/2015,Corporate,56.52
17/11/2016,Consumer,106.32
18/03/2015,Home Office,1856.19
01/09/2016,Home Office,1088.76
05/07/2016,Home Office,19.0
03/11/2015,Home Office,6.72
28/05/2017,Consumer,22.72
13/06/2018,Home Office,164.736
26/09/2016,Consumer,239.8
12/10/2018,Consumer,17.9
02/10/2018,Corporate,21.984
12/11/2018,Home Office,23.12
21/01/2018,Home Office,242.94
09/08/2015,Consumer,2060.744
25/04/2016,Consumer,128.058
04/03/2018,Corporate,15.25
04/08/2017,Home Office,35.06
18/12/2016,Corporate,55.936
19/12/2016,Consumer,675.96
12/07/2016,Consumer,659.168
06/04/2015,Corporate,70.95
19/05/2018,Home Office,681.408
09/07/2016,Consumer,153.36
21/08/2016,Home Office,4.28
22/05/2018,Consumer,22.344
26/08/2015,Consumer,17.34
19/09/2016,Corporate,66.36
06/11/2018,Home Office,449.568
21/11/2017,Consumer,21.568
24/12/2017,Home Office,27.882
09/07/2015,Home Office,23.92
05/08/2016,Corporate,33.488
20/11/2017,Consumer,2.628
07/03/2015,Corporate,481.568
25/11/2017,Consumer,59.98
14/07/2018,Consumer,276.69
03/10/2015,Consumer,14.48
28/07/2017,Home Office,302.72
05/09/2017,Corporate,43.6
16/03/2016,Home Office,17.52
02/09/2017,Home Office,84.272
22/06/2015,Consumer,170.058
08/07/2018,Home Office,86.376
01/11/2016,Home Office,3.168
04/11/2017,Consumer,11.376
18/12/2018,Consumer,46.672
05/12/2017,Consumer,465.18