Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

reindexing data to DateTimeIndex results in all price data becoming NaN

gold = pd.read_csv('gold.csv')


gold['Date'] = gold['Date'].apply(pd.to_datetime, utc=True)


gold
                         Date   Price
0   1950-01-01 00:00:00+00:00   34.730
1   1950-02-01 00:00:00+00:00   34.730
2   1950-03-01 00:00:00+00:00   34.730
3   1950-04-01 00:00:00+00:00   34.730
4   1950-05-01 00:00:00+00:00   34.730
... ...                         ...
842 2020-03-01 00:00:00+00:00   1593.764
843 2020-04-01 00:00:00+00:00   1680.030
844 2020-05-01 00:00:00+00:00   1715.697
845 2020-06-01 00:00:00+00:00   1734.032
846 2020-07-01 00:00:00+00:00   1840.807


gold.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 847 entries, 0 to 846
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype              
---  ------  --------------  -----              
 0   Date    847 non-null    datetime64[ns, UTC]
 1   Price   847 non-null    float64            
dtypes: datetime64[ns, UTC](1), float64(1)
memory usage: 13.4 KB


gold = gold.set_index('Date')


gold.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 847 entries, 1950-01-01 00:00:00+00:00 to 2020-07-01 00:00:00+00:00
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Price   847 non-null    float64
dtypes: float64(1)
memory usage: 13.2 KB


gold        
Date                        Price
1950-01-01 00:00:00+00:00   34.730
1950-02-01 00:00:00+00:00   34.730
1950-03-01 00:00:00+00:00   34.730
1950-04-01 00:00:00+00:00   34.730
1950-05-01 00:00:00+00:00   34.730
...                         ...
2020-03-01 00:00:00+00:00   1593.764
2020-04-01 00:00:00+00:00   1680.030
2020-05-01 00:00:00+00:00   1715.697
2020-06-01 00:00:00+00:00   1734.032
2020-07-01 00:00:00+00:00   1840.807


years = pd.date_range('01/01/1950', '01/01/2020', freq='YS')


years
DatetimeIndex(['1950-01-01', '1951-01-01', '1952-01-01', '1953-01-01',
               '1954-01-01', '1955-01-01', '1956-01-01', '1957-01-01',
               '1958-01-01', '1959-01-01', '1960-01-01', '1961-01-01',
               '1962-01-01', '1963-01-01', '1964-01-01', '1965-01-01',
               '1966-01-01', '1967-01-01', '1968-01-01', '1969-01-01',
               '1970-01-01', '1971-01-01', '1972-01-01', '1973-01-01',
               '1974-01-01', '1975-01-01', '1976-01-01', '1977-01-01',
               '1978-01-01', '1979-01-01', '1980-01-01', '1981-01-01',
               '1982-01-01', '1983-01-01', '1984-01-01', '1985-01-01',
               '1986-01-01', '1987-01-01', '1988-01-01', '1989-01-01',
               '1990-01-01', '1991-01-01', '1992-01-01', '1993-01-01',
               '1994-01-01', '1995-01-01', '1996-01-01', '1997-01-01',
               '1998-01-01', '1999-01-01', '2000-01-01', '2001-01-01',
               '2002-01-01', '2003-01-01', '2004-01-01', '2005-01-01',
               '2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01',
               '2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
               '2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01',
               '2018-01-01', '2019-01-01', '2020-01-01'],
              dtype='datetime64[ns]', freq='AS-JAN')


reindexed = gold.reindex(years)


reindexed
            Price
1950-01-01  NaN
1951-01-01  NaN
1952-01-01  NaN
1953-01-01  NaN
1954-01-01  NaN
...         ...
2016-01-01  NaN
2017-01-01  NaN
2018-01-01  NaN
2019-01-01  NaN
2020-01-01  NaN

When I plot reindexed there are no values for the price after reindexing. I am trying to get gold reindexed by the first day of each year from 1950 to 2020, and then plot them on a pandas generated line graph. I have looked at various posts from Stack Overflow with similar issues, but I am still not sure what I am doing wrong.

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

You need to use an UTC localized index, convert with tz_localize:

reindexed = gold.reindex(years.tz_localize('utc'))

Output:

                           Price
1950-01-01 00:00:00+00:00  34.73
1951-01-01 00:00:00+00:00    NaN  # NaN as the data is
1952-01-01 00:00:00+00:00    NaN  # not in your example
1953-01-01 00:00:00+00:00    NaN
1954-01-01 00:00:00+00:00    NaN
...                          ...
2016-01-01 00:00:00+00:00    NaN
2017-01-01 00:00:00+00:00    NaN
2018-01-01 00:00:00+00:00    NaN
2019-01-01 00:00:00+00:00    NaN
2020-01-01 00:00:00+00:00    NaN

[71 rows x 1 columns]

You might also try to resample getting the first value per year:

resampled = gold.resample('YS').first()

Output:

                              Price
Date                               
1950-01-01 00:00:00+00:00    34.730
1951-01-01 00:00:00+00:00       NaN
1952-01-01 00:00:00+00:00       NaN
1953-01-01 00:00:00+00:00       NaN
1954-01-01 00:00:00+00:00       NaN
...                             ...
2016-01-01 00:00:00+00:00       NaN
2017-01-01 00:00:00+00:00       NaN
2018-01-01 00:00:00+00:00       NaN
2019-01-01 00:00:00+00:00       NaN
2020-01-01 00:00:00+00:00  1593.764

[71 rows x 1 columns]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading