Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to calculate Month to Date (MTD) and YTD as cumulative average using Pandas dataframe?

I want to calculative MTD and YTD as cumulative average using pandas dataframe.I wrote below code to generate the output.

code:

import pandas as pd

#MTD AND YTD calculation

data = {'date' : ['2016/11/01', '2016/11/02', '2016/11/03', '2016/11/04', '2016/11/05', '2016/11/05', '2016/12/01', '2016/12/02', '2016/12/03', '2016/12/04', '2016/12/15', '2016/12/20', '2016/12/23', '2016/12/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30', '2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30','2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30', '2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30','2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30'],
        'category': ['fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit'],

        'product': ['grapes', 'grapes', 'grapes', 'kiwi', 'kiwi', 'grapes', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange'],
        'price': [10, 10, 20, 40, 60, 30, 10, 20, 10, 50, 10, 5, 10, 10, 10, 20, 10, 50, 10, 5, 10, 10, 20, 10, 5, 5, 10, 10, 20, 50, 10, 5, 20, 10, 10, 20, 50, 20, 5, 5, 10, 10, 20, 50, 30, 10, 20, 5, 5, 10, 20, 10, 20, 10, 40, 20, 10, 10, 20, 20, 10, 5]}


df = pd.DataFrame(data)

df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean()

print(df)

But I got a error

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Error:

Traceback (most recent call last):
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 11000, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/series.py", line 4672, in reindex
    return super().reindex(**kwargs)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/generic.py", line 4981, in _reindex_axes
    new_index, indexer = ax.reindex(
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4223, in reindex
    target = self._wrap_reindex_result(target, indexer, preserve_names)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
    target = MultiIndex.from_tuples(target)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ab/parry-data_processing/pa/poc.py", line 21, in <module>
    df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3655, in __setitem__
    self._set_item(key, value)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3832, in _set_item
    value = self._sanitize_column(value)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 4532, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 11007, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

Can anyone suggest a solution to find the cumulative average for MTD and YTD?

>Solution :

Use Series.droplevel for remove first 3 levels in MultiIndex:

df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean().droplevel([0,1,2])
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean().droplevel([0,1,2])

print(df)

         date category product  price        MTD        YTD
0  2016-11-01    fruit  grapes     10  10.000000  10.000000
1  2016-11-02    fruit  grapes     10  10.000000  10.000000
2  2016-11-03    fruit  grapes     20  13.333333  13.333333
3  2016-11-04    fruit    kiwi     40  40.000000  40.000000
4  2016-11-05    fruit    kiwi     60  50.000000  50.000000
..        ...      ...     ...    ...        ...        ...
57 2017-05-04    fruit  Orange     10  20.000000  18.333333
58 2017-05-15    fruit  Orange     20  20.000000  18.461538
59 2017-05-20    fruit  Orange     20  20.000000  18.571429
60 2017-05-23    fruit  Orange     10  18.571429  18.000000
61 2017-05-30    fruit  Orange      5  16.875000  17.187500

[62 rows x 6 columns]

If need dynamic solution for remove all levels without last is possible use MultiIndex.nlevels for gen number of levels and subtract 1 for keep last level:

s1 = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
s2 = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean()
df['MTD'] = s1.droplevel(list(range(s1.index.nlevels - 1)))
df['YTD'] = s2.droplevel(list(range(s1.index.nlevels - 1)))

print(df)

         date category product  price        MTD        YTD
0  2016-11-01    fruit  grapes     10  10.000000  10.000000
1  2016-11-02    fruit  grapes     10  10.000000  10.000000
2  2016-11-03    fruit  grapes     20  13.333333  13.333333
3  2016-11-04    fruit    kiwi     40  40.000000  40.000000
4  2016-11-05    fruit    kiwi     60  50.000000  50.000000
..        ...      ...     ...    ...        ...        ...
57 2017-05-04    fruit  Orange     10  20.000000  18.333333
58 2017-05-15    fruit  Orange     20  20.000000  18.461538
59 2017-05-20    fruit  Orange     20  20.000000  18.571429
60 2017-05-23    fruit  Orange     10  18.571429  18.000000
61 2017-05-30    fruit  Orange      5  16.875000  17.187500

[62 rows x 6 columns]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading