Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to count the total sales by year, month

I have a big csv (17985 rows) with sales in different days.The csv looks like this:

Customer   Date        Sale
Larry      1/2/2018    20$
Mike       4/3/2020    40$
John       12/5/2017   10$
Sara       3/2/2020    90$
Charles    9/8/2022    75$

Below is how many times that exact day appears in my csv (how many sales were made that day):

occur = df.groupby(['Date']).size()
occur
2018-01-02     32
2018-01-03     31
2018-01-04     42
2018-01-05    192
2018-01-06     26

I used crosstab, groupby and several methods but the problem is that they don’t add up, or is NaN.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

new_df['total_sales_that_month'] = df.groupby('Date')['Sale'].sum()
0       NaN
1       NaN
2       NaN
3       NaN
4       NaN
         ..
17980   NaN
17981   NaN
17982   NaN
17983   NaN
17984   NaN

I want to group them by year and month in a dataframe, based on total sales. Using dt.year and dt.month I managed to do this:

    year
month   
1   2020
1   2020
7   2019
8   2019
2   2018
... ...
4   2020
4   2020
4   2020
4   2020
4   2020

What I want to have is: month/year/total_sales_that_month. What method should I apply? This is the expected output:

Month   Year    Total_sale_that_month
1       2018      420$
2       2018      521$
3       2018      124$
4       2018      412$
5       2018      745$

>Solution :

You can use groupby_sum but before you have to strip ‘$’ from Sale column and convert as numeric:

# Clean your dataframe first
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Sale'] = df['Sale'].str.strip('$').astype(float)

out = (df.groupby([df['Date'].dt.month.rename('Month'), 
                   df['Date'].dt.year.rename('Year')])
          ['Sale'].sum()
         .rename('Total_sale_that_month')
         # .astype(str).add('$')  # uncomment if '$' matters
         .reset_index())

Output:

>>> out
   Month  Year  Total_sale_that_month
0      2  2018                   20.0
1      2  2020                   90.0
2      3  2020                   40.0
3      5  2017                   10.0
4      8  2022                   75.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading