Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

when using group_by: TypeError: incompatible index of inserted column with frame index

I have a df that I’ve read from sql:

      id  stock_id symbol        date   open   high      low  close  volume
0      1        35   ABSI  2022-09-28   3.06   3.33   3.0400   3.27  217040
1      2        35   ABSI  2022-09-29   3.19   3.19   3.0300   3.12  187309
2      3        35   ABSI  2022-09-30   3.11   3.27   3.0700   3.13  196566
3      4        35   ABSI  2022-10-03   3.16   3.16   2.8600   2.97  310441
4      5        35   ABSI  2022-10-04   3.04   3.37   2.9600   3.27  361082
..   ...       ...    ...         ...    ...    ...      ...    ...     ...
383  384        16    VVI  2022-10-03  31.93  33.85  31.3050  33.60  151357
384  385        16    VVI  2022-10-04  34.41  35.46  34.1900  35.39  105773
385  386        16    VVI  2022-10-05  34.67  35.30  34.5000  34.86   59605
386  387        16    VVI  2022-10-06  34.80  35.14  34.3850  34.50   55323
387  388        16    VVI  2022-10-07  33.99  33.99  33.3409  33.70   45187

[388 rows x 9 columns]

I then try and get the average of the last 5 days and add it to a new column:

df['volume_5_day'] = df.groupby('stock_id')['volume'].rolling(5).mean()

Which gives me the following error:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Traceback (most recent call last):
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 11003, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/series.py", line 4672, in reindex
    return super().reindex(**kwargs)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 4981, in _reindex_axes
    new_index, indexer = ax.reindex(
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 4237, in reindex
    target = self._wrap_reindex_result(target, indexer, preserve_names)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
    target = MultiIndex.from_tuples(target)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dan/Documents/code/wolfhound/add_indicators_daily.py", line 10, in <module>
    df['volume_10_day'] = df.groupby('stock_id')['volume'].rolling(1).mean()
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3655, in __setitem__
    self._set_item(key, value)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3832, in _set_item
    value = self._sanitize_column(value)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 4535, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

Any ideas what’s going wrong here? Previously this worked and now it’s throwing an error – and I can’t seem to figure out why

>Solution :

Chain Series.to_numpy to add the values as a np.array and make sure to add sort=False inside df.groupby:

df['volume_5_day'] = df.groupby('stock_id', sort=False)['volume']\
    .rolling(5).mean().to_numpy()

print(df)

      id  stock_id symbol        date  ...      low  close  volume  volume_5_day
0      1        35   ABSI  2022-09-28  ...   3.0400   3.27  217040           NaN
1      2        35   ABSI  2022-09-29  ...   3.0300   3.12  187309           NaN
2      3        35   ABSI  2022-09-30  ...   3.0700   3.13  196566           NaN
3      4        35   ABSI  2022-10-03  ...   2.8600   2.97  310441           NaN
4      5        35   ABSI  2022-10-04  ...   2.9600   3.27  361082      254487.6
383  384        16    VVI  2022-10-03  ...  31.3050  33.60  151357           NaN
384  385        16    VVI  2022-10-04  ...  34.1900  35.39  105773           NaN
385  386        16    VVI  2022-10-05  ...  34.5000  34.86   59605           NaN
386  387        16    VVI  2022-10-06  ...  34.3850  34.50   55323           NaN
387  388        16    VVI  2022-10-07  ...  33.3409  33.70   45187       83449.0

Your initial approach fails, because the df.groupby method that you are using, returns a pd.Series with a different index than your df. E.g.:

print(df.groupby('stock_id')['volume'].rolling(5).mean().index)
MultiIndex([(16, 383),
            (16, 384),
            (16, 385),
            (16, 386),
            (16, 387),
            (35,   0),
            (35,   1),
            (35,   2),
            (35,   3),
            (35,   4)],
           names=['stock_id', None])

So, it is saying it is unable to map this onto:

print(df.index)
Int64Index([0, 1, 2, 3, 4, 383, 384, 385, 386, 387], dtype='int64')

With a np.array you don’t have this problem. You could also have used:

df['volume_5_day'] = df.groupby('stock_id', as_index=False)['volume']\
    .rolling(5).mean()['volume']

In this case, you don’t need to add sort=False, as it will match correctly on the index values.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading