pandas isn't recognising my datetime column

I exported this from a Postgres table as a tab-separated CSV, like so:

\copy (select * from mytable) to 'labels.csv' csv DELIMITER E'\t' header

The head of the file looks like this:

user_id  session_id   start_time           mode
  2       715      2016-04-01 01:07:49+01   car
  2       716      2016-04-01 03:09:53+01   car
  2      1082      2016-04-02 13:05:16+01   car
  2      1090      2016-04-02 15:16:32+01   car

I read this into pandas and wanted to remove the timezone info, like this:

import pandas as pd

df = pd.read_csv('labels.csv', sep='\t', parse_dates=['start_time'])
df['start_time'] = df['start_time'].dt.tz_localize(None)

But this gives the error:

AttributeError: Can only use .dt accessor with datetimelike values

EDIT

df.head() gives:


```
   user_id  session_id     start_time              mode
0    2         715  2016-04-01 01:07:49+01:00     car
1    2         716  2016-04-01 03:09:53+01:00     car
2    2        1082  2016-04-02 13:05:16+01:00     car
3    2        1090  2016-04-02 15:16:32+01:00     car
4    2        1601  2016-04-04 13:56:13+01:00     foot
```

However,

df.info()
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   user_id              5374 non-null   int64 
 1   session_id           5374 non-null   int64 
 2   start_time           5374 non-null   object
 3   transportation_mode  5374 non-null   object
dtypes: int64(3), object(2)

Solution:

See the docs for pd.read_csv:

parse_dates : bool or list of int or names or list of lists or dict, default False

If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pd.to_datetime with utc=True. See Parsing a CSV with mixed timezones for more.
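The mixed-timezone case mentioned in the docs can be sketched as follows. This is only an illustration under assumptions: the sample rows are hypothetical (same layout as labels.csv, with offsets written as in the df.head() output), and whether the real file actually contains more than one UTC offset is not known from the question.

```python
import io
import pandas as pd

# Hypothetical sample: same columns as labels.csv, but two different UTC offsets
raw = (
    "user_id\tsession_id\tstart_time\tmode\n"
    "2\t715\t2016-03-26 01:07:49+00:00\tcar\n"
    "2\t716\t2016-04-01 03:09:53+01:00\tcar\n"
)

df = pd.read_csv(io.StringIO(raw), sep="\t")

# utc=True converts every value to UTC, so the column gets a single tz-aware dtype
df["start_time"] = pd.to_datetime(df["start_time"], utc=True)

# With a datetime dtype in place, the .dt accessor works and the timezone can be dropped
df["start_time_naive"] = df["start_time"].dt.tz_localize(None)
```

Note that dropping the timezone after utc=True leaves UTC wall-clock times. If local times are wanted instead, the column could first be converted with .dt.tz_convert to the relevant zone.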

You likely have an unparseable date in your data. Try converting the column with pandas.to_datetime after reading the file. By default it raises an error on a bad value, which tells you what the offending value is:

df["start_time"] = pd.to_datetime(df["start_time"])
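If the error message alone does not make the bad rows easy to find, one possible way to list them is to parse with errors="coerce" and look at what became NaT. This is a minimal sketch, not necessarily what the data in the question needs: the sample rows and the "--" placeholder are hypothetical, and utc=True is added only so that differing offsets do not get in the way.

```python
import io
import pandas as pd

# Hypothetical sample with one placeholder value ("--") in start_time
raw = (
    "user_id\tsession_id\tstart_time\tmode\n"
    "2\t715\t2016-04-01 01:07:49+01:00\tcar\n"
    "2\t716\t--\tcar\n"
    "2\t1082\t2016-04-02 13:05:16+01:00\tcar\n"
)

df = pd.read_csv(io.StringIO(raw), sep="\t")

# errors="coerce" turns anything unparseable into NaT instead of raising
parsed = pd.to_datetime(df["start_time"], errors="coerce", utc=True)

# Rows that had a value in the file but failed to parse
bad_rows = df[parsed.isna() & df["start_time"].notna()]
print(bad_rows)
```

The `bad_rows` frame then shows the raw strings that need special handling.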

Once you identify the issue, you can then handle the value in your code. Something like:

# explicitly handle known invalid values
df["start_time"] = df["start_time"].replace({"--": pd.NaT})
df["start_time"] = pd.to_datetime(df["start_time"])