I am trying to read in a csv with a time series of measured values. I have the problem that the time stamp omits the time at 00:00 at the extraction from the measurement system.
Here is an excerpt from a csv file:
Time NS Status u12 u23 u31 p1 q1
27.12.2023 23:30:00 0 0 20854,6 20482,8 20706,1 7599130 -2050710
27.12.2023 23:40:00 0 0 20882,8 20510,9 20728,6 7494070 -2078320
27.12.2023 23:50:00 0 0 20819,2 20448,5 20674,7 7672400 -1929610
28.12.2023 0 0 20792,9 20413,2 20645,9 7565910 -1942710
28.12.2023 00:10:00 0 0 20768,6 20368,6 20613,4 7174330 -2002890
And I used:
df = pd.read_csv('C:/Python/Input/measured_values.csv',
sep = '\t', decimal=',',
skiprows=1, encoding='unicode_escape',
parse_dates=['Time'], dayfirst=True)
- I have tried different parameters from the parse_dates function to solve the problem
- I also tried to create a new timestamp with incrementing the time between the start and end of the time series – with isn’t simple because i have to pay attention to the time change
is there a simple and efficient, robust solution to read in this sort of csv and declare the column time as datetime64[ns] ?
>Solution :
You could try to pass date_format='mixed', and should probably remove skiprows=1:
df = pd.read_csv(filename, sep='\t', decimal=',', encoding='unicode_escape',
parse_dates=['Time'], dayfirst=True, date_format='mixed')
print(df)
Time NS Status u12 u23 u31 p1 q1
0 2023-12-27 23:30:00 0 0 20854.6 20482.8 20706.1 7599130 -2050710
1 2023-12-27 23:40:00 0 0 20882.8 20510.9 20728.6 7494070 -2078320
2 2023-12-27 23:50:00 0 0 20819.2 20448.5 20674.7 7672400 -1929610
3 2023-12-28 00:00:00 0 0 20792.9 20413.2 20645.9 7565910 -1942710
4 2023-12-28 00:10:00 0 0 20768.6 20368.6 20613.4 7174330 -2002890
print(df.dtypes)
Time datetime64[ns]
NS int64
Status int64
u12 float64
u23 float64
u31 float64
p1 int64
q1 int64
dtype: object