KeyError when using set_index on Dataframe created from read_sql_query

With the following code, I can see the following table output

query = f''' SELECT timestamp, base FROM pricing WHERE {timestamp_range} LIMIT {max_records}'''
df: DataFrame = read_sql_query(text(query), db)
print(df.head())
print(df.columns)
df.set_index('timestamp', inplace=True)  # Error here


#  Output
#                    timestamp  base
# 0 2023-02-17 10:25:54.099542    21
# 1 2023-02-17 10:27:54.060627    21
# 2 2023-02-17 10:29:53.581384    22
# 3 2023-02-17 10:31:54.110646    20
# 4 2023-02-17 10:33:53.827830    20
# Index(['timestamp', 'base'], dtype='object')

So it looks like I do have the timestamp column, but when using set_index() I get a KeyError: 'timestamp'. Why is this? Using df.columns[0] didn’t help either.


For reference, full stack trace log

[2023-02-17 13:50:32,388] ERROR in app: Exception on /api/data/query [GET]
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'timestamp'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/ubuntu/Code/grozd/main.py", line 18, in query
    result = getQuery(request.args)
  File "/home/ubuntu/Code/grozd/data.py", line 27, in getQuery
    df['timestamp'] = df['timestamp'].astype(int).floordiv(1000000).astype(int)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'timestamp'

>Solution :

From the traceback it looks like the error does not actually occur in the line you highlighted, but later on when you do this:

df['timestamp'] = df['timestamp'].astype(int).floordiv(1000000).astype(int)

The reason this fails is that once you’ve assigned the timestamp column to the index, it is no longer found among the columns. To fix this you can either perform your transformation first and then set the index, or transform the index directly:

df.index = df.index.astype(int).floordiv(1000000).astype(int)

Leave a Reply