Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I index a Date column that doesn't reference itself by index value or column name?

I’m trying to index date values from the date column in a panda dataframe.

                                 Open        High         Low       Close   Volume  Dividends  Stock Splits
Date
2013-01-07 00:00:00-05:00   71.951356   72.440875   71.877924   72.285858  1859400        0.0           0.0
2013-01-08 00:00:00-05:00   72.090030   72.514278   71.151782   71.535240  2173700        0.0           0.0
2013-01-09 00:00:00-05:00   71.926850   72.848777   71.779993   72.661125  2569200        0.0           0.0
2013-01-10 00:00:00-05:00   72.971141   73.370914   72.448986   73.354599  2364800        0.0           0.0
2013-01-11 00:00:00-05:00   73.313856   73.950232   73.036458   73.118050  3284900        0.0           0.0
...                               ...         ...         ...         ...      ...        ...           ...
2022-12-30 00:00:00-05:00  426.880005  429.739990  424.820007  428.760010   759900        0.0           0.0
2023-01-03 00:00:00-05:00  428.000000  428.160004  420.200012  424.290009  1166300        0.0           0.0
2023-01-04 00:00:00-05:00  423.929993  426.359985  420.209991  423.480011  1050800        0.0           0.0
2023-01-05 00:00:00-05:00  422.570007  423.489990  412.609985  414.730011  1485000        0.0           0.0
2023-01-06 00:00:00-05:00  416.200012  428.540009  416.010010  426.470001  1042200        0.0           0.0

When I look up the column names, the first column ‘Date’ doesn’t exist, so am unable to reference it when setting an index.

>>> for col in stock_df.columns:
...     print(col)
...
Open
High
Low
Close
Volume
Dividends
Stock Splits

When I reference index[0] in my data frame, it shows the first column being ‘Open’.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>>> stock_current_price = stock_df["Open"].loc[stock_df.index[0]]
>>> print(stock_current_price)
71.95135572116979

If I reference iloc[:, 0] to my dataframe (or any integer), I get the ‘Open’ column value with the date included. But I can’t separate the date column to be used as an index.

>>> test = stock_df.iloc[:, 0]
>>> print(test)
Date
2013-01-07 00:00:00-05:00     71.951356
2013-01-08 00:00:00-05:00     72.090030
2013-01-09 00:00:00-05:00     71.926850
2013-01-10 00:00:00-05:00     72.971141
2013-01-11 00:00:00-05:00     73.313856
                            ...
2022-12-30 00:00:00-05:00    426.880005
2023-01-03 00:00:00-05:00    428.000000
2023-01-04 00:00:00-05:00    423.929993
2023-01-05 00:00:00-05:00    422.570007
2023-01-06 00:00:00-05:00    416.200012

This includes any index integer referenced, so other column iloc’s will also have the date shown.

>>> test = stock_df.iloc[:, 4]
>>> print(test)
Date
2013-01-07 00:00:00-05:00    1859400
2013-01-08 00:00:00-05:00    2173700
2013-01-09 00:00:00-05:00    2569200
2013-01-10 00:00:00-05:00    2364800
2013-01-11 00:00:00-05:00    3284900
                          ...
2022-12-30 00:00:00-05:00     759900
2023-01-03 00:00:00-05:00    1166300
2023-01-04 00:00:00-05:00    1050800
2023-01-05 00:00:00-05:00    1485000
2023-01-06 00:00:00-05:00    1042200

I would like to take a large dataframe and specify a column value (Open) for example based on a specific "%Y-%m-%d" date. I’m worn out from coding tonight and hoping that a second set of eyes can take a look and help point me in the right direction.

>Solution :

A DataFrame is surrounded by an index and columns; that’s its primary difference from, say, a NumPy array. The "Date" values are in the index part, not in its own column, so, e.g., df["Date"] will fail and df.iloc[:, N] wouldn’t find it for any N; because it’s df.index. If you must, you can move it to become a column with df.reset_index().

However! Index is a crucial part of a DataFrame (or a Series). Almost all operations care about it and you should too 🙂 Arguably, the "Date" is better kept as an index in your case, because

specify a column value (Open) for example based on a specific "%Y-%m-%d" date

is conveniently doable as it stands with

df.loc["1997-04-27", "Open"]

as opposed to

df.loc[df.Date.eq("1997-04-27"), "Open"]

if it ("Date") was a column instead. And it turns out it only gets worse if you want to perform more datetime-specific operations, e.g., resampling, subsetting etc.

(Noting that in df.loc, part before "," addresses index, and after "," addresses columns.)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading