
Update pandas dataframe based on slice?

I have seen the question "Update pandas dataframe based on slice", but I couldn't find an answer for my use case there.

Consider this code, where I have a starting table with "channel" and "value" columns:

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd

TESTDATA = StringIO("""channel,value
A,10
A,11
A,12
A,13
B,20
B,22
B,24
B,26
B,28
C,100
C,105
C,110
C,115
C,120
C,125
C,130
""")

mychans = ["A", "B", "C"]

df = pd.read_csv(TESTDATA)
df.insert (2, "value_rel", df["value"] - df["value"][0])

print("Starting:")
print(df.head())

for tchan in mychans:
  this_ch_data = df[df["channel"]==tchan]
  df.loc[this_ch_data.index, "value_rel"] = this_ch_data["value"] - this_ch_data["value"][0]

In the end, I want to obtain the same table with an additional "value_rel" column, which would show the values relative to the first value in that channel (slice); that is:


channel, value, value_rel
A, 10, 0
A, 11, 1
A, 12, 2
A, 13, 3
B, 20, 0
B, 22, 2
B, 24, 4
B, 26, 6
B, 28, 8
C,100, 0
C,105, 5
...

If I just use this_ch_data["value_rel"] = this_ch_data["value"] - this_ch_data["value"][0] inside the for loop, I get the SettingWithCopyWarning "A value is trying to be set on a copy of a slice from a DataFrame", which makes sense.
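A minimal sketch of why that direct assignment never reaches df, using a small hypothetical four-row frame rather than the full test data. Boolean indexing returns a new DataFrame, so writing a column into it changes only that slice. Whether a SettingWithCopyWarning is actually emitted depends on the pandas version, so the sketch silences warnings; it also uses .iloc[0] so it is not tripped up by the separate KeyError shown in the traceback.

```python
import warnings

import pandas as pd

# Hypothetical small frame; not the question's full test data.
df = pd.DataFrame({"channel": ["A", "A", "B", "B"], "value": [10, 11, 20, 22]})
df["value_rel"] = 0

# Boolean indexing returns a new DataFrame, not a view into df.
this_ch_data = df[df["channel"] == "B"]

with warnings.catch_warnings():
    # Older pandas may emit SettingWithCopyWarning here; silence it for the demo.
    warnings.simplefilter("ignore")
    this_ch_data["value_rel"] = this_ch_data["value"] - this_ch_data["value"].iloc[0]

print(this_ch_data["value_rel"].tolist())  # [0, 2]: the slice was updated
print(df["value_rel"].tolist())  # [0, 0, 0, 0]: the original df was not
```

Writing back through df.loc[this_ch_data.index, "value_rel"], as the question's loop does, is what actually updates df.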

However, when I run the code above, I get:

$ python3 test1.py
Starting:
  channel  value  value_rel
0       A     10          0
1       A     11          1
2       A     12          2
3       A     13          3
4       B     20         10
Traceback (most recent call last):
  File "C:/msys64/mingw64/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\msys64\tmp\test1.py", line 38, in <module>
    df.loc[this_ch_data.index, "value_rel"] = this_ch_data["value"] - this_ch_data["value"][0]
  File "C:/msys64/mingw64/lib/python3.9/site-packages/pandas/core/series.py", line 942, in __getitem__
    return self._get_value(key)
  File "C:/msys64/mingw64/lib/python3.9/site-packages/pandas/core/series.py", line 1051, in _get_value
    loc = self.index.get_loc(label)
  File "C:/msys64/mingw64/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 0

So, how can I update this DataFrame, based on calculations done on a (copied) slice of the same DataFrame?

Solution:

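You need to use .iloc[0] instead of [0]. Filtering with df[df["channel"]==tchan] keeps the original row labels, and on an integer index this_ch_data["value"][0] looks up the label 0, not the first position. Label 0 only exists in channel A's slice; channel B's rows are labelled 4 to 8, hence the KeyError: 0. Since .iloc[0] is positional, it always returns the first row of each slice: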

for tchan in mychans:
  this_ch_data = df[df["channel"]==tchan]
  df.loc[this_ch_data.index, "value_rel"] = \
         this_ch_data["value"] - this_ch_data["value"].iloc[0]
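To see the label-versus-position difference directly, here is a small sketch on a hypothetical five-row frame (not the question's full data). It shows that the filtered slice keeps its original labels, that .iloc[0] picks the first row by position, and that a label lookup of 0 raises the same KeyError as the question.

```python
import pandas as pd

# Hypothetical small frame: channel A occupies labels 0-1, channel B labels 2-4.
df = pd.DataFrame({
    "channel": ["A", "A", "B", "B", "B"],
    "value": [10, 11, 20, 22, 24],
})

this_ch_data = df[df["channel"] == "B"]

# Filtering keeps the original row labels: channel B starts at label 2, not 0.
print(this_ch_data.index.tolist())  # [2, 3, 4]

# .iloc is positional, so .iloc[0] always means "first row of this slice".
first_b = this_ch_data["value"].iloc[0]
print(first_b)  # 20

# An explicit label lookup of 0 fails, matching the KeyError in the question.
try:
    this_ch_data["value"].loc[0]
except KeyError as err:
    print("KeyError:", err)
```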

That said, this is a good use case for groupby with transform('first'), so no loop is required; you can do:

df['value_rel'] = df['value'] - df.groupby('channel')['value'].transform('first')
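A self-contained sketch that runs the groupby one-liner on the question's test data and checks it against the .iloc loop. The loop here iterates over df["channel"].unique() instead of the hard-coded mychans list, which is an assumption made to keep the example short. One caveat: GroupBy "first" returns the first non-null value per group, so if value could contain NaN, the result may differ from the loop's .iloc[0].

```python
from io import StringIO

import pandas as pd

TESTDATA = StringIO("""channel,value
A,10
A,11
A,12
A,13
B,20
B,22
B,24
B,26
B,28
C,100
C,105
C,110
C,115
C,120
C,125
C,130
""")
df = pd.read_csv(TESTDATA)

# transform('first') broadcasts each channel's first value onto every row of that
# channel, so the result shares df's index and can be subtracted directly.
first_per_channel = df.groupby("channel")["value"].transform("first")
df["value_rel"] = df["value"] - first_per_channel

# Loop version from the answer, kept for comparison.
looped = df["value"].copy()
for tchan in df["channel"].unique():
    this_ch_data = df[df["channel"] == tchan]
    looped.loc[this_ch_data.index] = this_ch_data["value"] - this_ch_data["value"].iloc[0]

print(df)
print(df["value_rel"].tolist() == looped.tolist())  # True
```

The groupby version avoids repeated boolean filtering and index-based write-backs, which is why it is generally preferred over the loop.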