Keep consistent dtype and timezone when concatenating with NaT in pandas

I have two pandas DataFrames containing time series that must be concatenated for further processing. One DataFrame contains localized timestamps, while the other contains NaT in the time column. After concatenation, the column's dtype changes from datetime64[ns, America/New_York] to object, which hinders further analysis.

My goal: keep a localized time column, even after concatenation with NaT.

Example code

import pandas as pd

a = pd.DataFrame(
    {
        'DateTime': pd.date_range(
            start='2022-10-10',
            periods=7,
            freq='1D',
            tz='America/New_York'
        ),
        'Value': range(7)
    }
)
b = pd.DataFrame(
    {
        'DateTime': pd.NaT,
        'Value': range(10,20),
    }
)
c = pd.concat([a, b], axis=0, ignore_index=True)

The dtypes of a and b are different:

>>> print(a.dtypes)
DateTime    datetime64[ns, America/New_York]
Value                                  int64
dtype: object

>>> print(b.dtypes)
DateTime    datetime64[ns]
Value                int64
dtype: object

Since the DateTime column of a is localized but the DateTime column of b is not, pandas cannot find a common datetime dtype, and the concatenated column falls back to object.
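To check which side of the mismatch a column is on before concatenating, you can test its dtype. The sketch below uses a hypothetical helper, describe_tz, that relies only on the fact that tz-aware columns carry a pd.DatetimeTZDtype while naive ones are plain datetime64:

```python
import pandas as pd

# Same frames as in the question.
a = pd.DataFrame(
    {
        'DateTime': pd.date_range(start='2022-10-10', periods=7, freq='1D', tz='America/New_York'),
        'Value': range(7),
    }
)
b = pd.DataFrame({'DateTime': pd.NaT, 'Value': range(10, 20)})


def describe_tz(col: pd.Series) -> str:
    """Hypothetical helper: report whether a datetime column carries a time zone."""
    if isinstance(col.dtype, pd.DatetimeTZDtype):
        # tz-aware columns use the DatetimeTZDtype extension dtype
        return f"tz-aware ({col.dtype.tz})"
    # plain datetime64 columns have no time zone attached
    return "tz-naive"


print(describe_tz(a['DateTime']))
print(describe_tz(b['DateTime']))
```

For the frames above, a is reported as tz-aware and b as tz-naive, which is exactly the combination that makes the concatenated column lose its datetime dtype.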

>>> print(c.dtypes)
DateTime    object
Value        int64
dtype: object
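If you control how b is built, a sketch of another option (assuming the NaT rows are meant to live in the same zone as a) is to create the NaT column with a's tz-aware dtype from the start, so no localization step is needed at all:

```python
import pandas as pd

a = pd.DataFrame(
    {
        'DateTime': pd.date_range(start='2022-10-10', periods=7, freq='1D', tz='America/New_York'),
        'Value': range(7),
    }
)

# Build the NaT column with a's tz-aware dtype up front, so both frames
# already agree on the DateTime dtype before concatenation.
b = pd.DataFrame(
    {
        'DateTime': pd.Series([pd.NaT] * 10, dtype=a['DateTime'].dtype),
        'Value': range(10, 20),
    }
)

c = pd.concat([a, b], axis=0, ignore_index=True)
print(c.dtypes)
```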

When trying to localize b, I get a TypeError:

>>> b['DateTime'] = b['DateTime'].tz_localize('America/New_York')
Traceback (most recent call last):
  File "/tmp/so-pandas-nat.py", line 27, in <module>
    b['DateTime'] = b['DateTime'].tz_localize('America/New_York')
  File ".venv/lib/python3.10/site-packages/pandas/core/generic.py", line 9977, in tz_localize
    ax = _tz_localize(ax, tz, ambiguous, nonexistent)
  File ".venv/lib/python3.10/site-packages/pandas/core/generic.py", line 9959, in _tz_localize
    raise TypeError(
TypeError: index is not a valid DatetimeIndex or PeriodIndex
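The error comes from which object gets localized. A small sketch contrasting the two methods: Series.tz_localize acts on the Series index, while Series.dt.tz_localize acts on the datetime values themselves:

```python
import pandas as pd

# Series.tz_localize localizes the *index*, so it needs a DatetimeIndex.
s = pd.Series([1, 2, 3], index=pd.date_range('2022-10-10', periods=3, freq='D'))
s_local = s.tz_localize('America/New_York')
print(s_local.index.tz)

# A column of NaT has a RangeIndex, so the same call raises TypeError.
col = pd.Series([pd.NaT] * 3)
try:
    col.tz_localize('America/New_York')
    index_call_failed = False
except TypeError as exc:
    index_call_failed = True
    print('TypeError:', exc)

# Series.dt.tz_localize localizes the *values* instead, whatever the index is.
col_local = col.dt.tz_localize('America/New_York')
print(col_local.dtype)
```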

Solution:

Use Series.dt.tz_localize to localize the values in the column. Series.tz_localize localizes the index instead, so it expects a DatetimeIndex; b has a RangeIndex, which is why it raises the TypeError:

b['DateTime'] = b['DateTime'].dt.tz_localize('America/New_York')
c = pd.concat([a, b], axis=0, ignore_index=True)

print(c.dtypes)
DateTime    datetime64[ns, America/New_York]
Value                                  int64
dtype: object
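When several frames are concatenated, the same idea can be wrapped in a small function. The sketch below is a hypothetical helper, concat_keep_tz, not a pandas API; it assumes every frame's column is datetime-like and that naive values (here only NaT) represent wall-clock times in the zone of the first tz-aware frame:

```python
import pandas as pd


def concat_keep_tz(frames, column):
    """Hypothetical helper: localize tz-naive versions of `column` to the time
    zone of the first tz-aware frame, then concatenate.

    Assumes every frame's `column` is datetime-like and that naive values are
    wall-clock times in that zone. Frames that already have a zone (even a
    different one) are left untouched.
    """
    frames = list(frames)
    # Time zone of the first frame whose column is tz-aware, or None.
    tz = next(
        (f[column].dt.tz for f in frames if isinstance(f[column].dtype, pd.DatetimeTZDtype)),
        None,
    )
    aligned = []
    for f in frames:
        if tz is not None and not isinstance(f[column].dtype, pd.DatetimeTZDtype):
            # Same approach as the solution: .dt.tz_localize works on the values.
            f = f.assign(**{column: f[column].dt.tz_localize(tz)})
        aligned.append(f)
    return pd.concat(aligned, axis=0, ignore_index=True)


a = pd.DataFrame(
    {
        'DateTime': pd.date_range(start='2022-10-10', periods=7, freq='1D', tz='America/New_York'),
        'Value': range(7),
    }
)
b = pd.DataFrame({'DateTime': pd.NaT, 'Value': range(10, 20)})

c = concat_keep_tz([a, b], 'DateTime')
print(c.dtypes)
```

Using DataFrame.assign returns a new frame, so the input frames are not modified in place.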