Python Pandas – Timeseries – Timestamp missing time exact at 00:00

March 27, 2024

I am trying to read in a csv with a time series of measured values. I have the problem that the time stamp omits the time at 00:00 at the extraction from the measurement system.

Here is an excerpt from a csv file:

Time    NS          Status  u12 u23 u31 p1  q1
27.12.2023 23:30:00 0   0   20854,6 20482,8 20706,1 7599130 -2050710
27.12.2023 23:40:00 0   0   20882,8 20510,9 20728,6 7494070 -2078320
27.12.2023 23:50:00 0   0   20819,2 20448,5 20674,7 7672400 -1929610
28.12.2023          0   0   20792,9 20413,2 20645,9 7565910 -1942710
28.12.2023 00:10:00 0   0   20768,6 20368,6 20613,4 7174330 -2002890

And I used:

df = pd.read_csv('C:/Python/Input/measured_values.csv',
                 sep = '\t', decimal=',',
                 skiprows=1, encoding='unicode_escape',
                 parse_dates=['Time'], dayfirst=True)

I have tried different parameters from the parse_dates function to solve the problem
I also tried to create a new timestamp with incrementing the time between the start and end of the time series – with isn’t simple because i have to pay attention to the time change

is there a simple and efficient, robust solution to read in this sort of csv and declare the column time as datetime64[ns] ?

>Solution :

You could try to pass date_format='mixed', and should probably remove skiprows=1:

df = pd.read_csv(filename, sep='\t', decimal=',', encoding='unicode_escape',
                 parse_dates=['Time'], dayfirst=True, date_format='mixed')

print(df)
                 Time  NS          Status      u12      u23      u31       p1       q1
0 2023-12-27 23:30:00   0               0  20854.6  20482.8  20706.1  7599130 -2050710
1 2023-12-27 23:40:00   0               0  20882.8  20510.9  20728.6  7494070 -2078320
2 2023-12-27 23:50:00   0               0  20819.2  20448.5  20674.7  7672400 -1929610
3 2023-12-28 00:00:00   0               0  20792.9  20413.2  20645.9  7565910 -1942710
4 2023-12-28 00:10:00   0               0  20768.6  20368.6  20613.4  7174330 -2002890

print(df.dtypes)
Time              datetime64[ns]
NS                         int64
        Status             int64
u12                      float64
u23                      float64
u31                      float64
p1                         int64
q1                         int64
dtype: object