Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Grouper name `datetime` not found after groupby

I have this Pandas dataframe

        datetime       machineID    errorID
0   2015-01-06 03:00:00     1   error3
1   2015-02-03 06:00:00     1   error4
2   2015-02-21 11:00:00     1   error1
3   2015-02-21 16:00:00     1   error2
4   2015-03-20 06:00:00     1   error1
5   2015-04-04 06:00:00     1   error5
6   2015-05-04 06:00:00     1   error4
7   2015-05-19 06:00:00     1   error2
8   2015-05-19 06:00:00     1   error3
9   2015-06-03 06:00:00     1   error5

Now I want to unstack the errorID so that I can get columns based on error1, error2error5. So for this I have used groupby and unstack method in Pandas

a = errors.groupby(['machineID', 'datetime', 'errorID']).size().unstack('errorID', fill_value=0)

which gives me this dataframe

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

             errorID    error1  error2  error3  error4  error5
machineID   datetime                    
1   2015-01-06 03:00:00     0   0   1   0   0
    2015-02-03 06:00:00     0   0   0   1   0
    2015-02-21 11:00:00     1   0   0   0   0
    2015-02-21 16:00:00     0   1   0   0   0
    2015-03-20 06:00:00     1   0   0   0   0

Now I want to resample this data based on 24H frequency and on datetime. But when I use the resample function, it is giving me the error KeyError: 'The grouper name datetime is not found'

a.resample('24H', on='datetime').agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})

When I listed all the columns in this dataframe, it shows only ['error1', 'error2', 'error3', 'error4', 'error5']

This is the entire error

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-127-607c418305a0> in <module>
----> 1 a.resample('24H', on='datetime').agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/generic.py in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base, on, level)
   8447             base=base,
   8448             key=on,
-> 8449             level=level,
   8450         )
   8451         return _maybe_process_deprecations(

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/resample.py in resample(obj, kind, **kwds)
   1304     """
   1305     tg = TimeGrouper(**kwds)
-> 1306     return tg._get_resampler(obj, kind=kind)
   1307 
   1308 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/resample.py in _get_resampler(self, obj, kind)
   1428 
   1429         """
-> 1430         self._set_grouper(obj)
   1431 
   1432         ax = self.ax

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/groupby/grouper.py in _set_grouper(self, obj, sort)
    171             else:
    172                 if key not in obj._info_axis:
--> 173                     raise KeyError("The grouper name {0} is not found".format(key))
    174                 ax = Index(obj[key], name=key)
    175 

KeyError: 'The grouper name datetime is not found'

I don’t how do I use resample after groupby

>Solution :

First convert values to datetimes:

errors['datetime'] = pd.to_datetime(errors['datetime'])

a = errors.groupby(['machineID', 'datetime', 'errorID']).size().unstack('errorID', fill_value=0)

Then if need resample per machineID use:

a = a.reset_index(level=0).groupby('machineID').resample('24H').agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})

Or if need only resample use:

a = a.reset_index(level=0).resample('24H').agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})

Or if need groupby with Grouper use:

a = a.groupby(['machineID', pd.Grouper(freq='24H', level='datetime')]).agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading