I have timestamps in my data in this format: 2021-12-01 19:00:00+00:00.
I am applying Isolation Forest to label the data. I tried the following code but got this error: ValueError: could not convert string to float: '2018-12-01 17:00:00+00:00'
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.ensemble import IsolationForest
import matplotlib.pyplot as plt
import scipy.stats as stats
df = pd.read_csv('C:/Users/Desktop/labeling/fCCC.csv')
#df = df.fillna(df.median())
model=IsolationForest(n_estimators=50, max_samples='auto', contamination=float(0.1),max_features=1.0)
model.fit(df[['timestamp','A','B','C','D']])
df['scores']=model.decision_function(df[['timestamp','A','B','C','D']])
df['anomaly']=model.predict(df[['timestamp','A','B','C','D']])
df.to_csv('C:/Users/Desktop/labeling/anoscore.csv', index=False, header=True)
anomaly=df.loc[df['anomaly']==-1]
anomaly_index=list(anomaly.index)
print(anomaly_index)
anomaly_index.sort()
print(anomaly_index)
df = pd.DataFrame(anomaly_index)
>Solution :
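The error occurs because IsolationForest only accepts numeric features, and the timestamp column is read from the CSV as plain strings. Passing parse_dates=[<columns>] to pd.read_csv makes pandas parse the listed columns into datetime objects, which can then be converted to integers before fitting: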
df = pd.read_csv('C:/Users/Desktop/labeling/fCCC.csv', parse_dates=['timestamp'])
df['timestamp'] = df['timestamp'].astype('int64')  # 'int64' explicitly; plain 'int' can resolve to a 32-bit integer on some platforms and fail
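
For reference, here is a minimal end-to-end sketch of the same idea. It makes a few assumptions that are not in the original post: the CSV is replaced by a small made-up in-memory sample, the numeric column is named timestamp_s (a hypothetical name), and the conversion uses explicit epoch-second arithmetic instead of astype, so the resulting unit does not depend on how pandas stores the datetimes internally. random_state is set only to make the run repeatable.

```python
import io

import pandas as pd
from sklearn.ensemble import IsolationForest

# Made-up sample standing in for the CSV file from the question.
csv_text = """timestamp,A,B,C,D
2021-12-01 17:00:00+00:00,1.0,2.0,3.0,4.0
2021-12-01 18:00:00+00:00,1.1,2.1,3.1,4.1
2021-12-01 19:00:00+00:00,0.9,1.9,2.9,3.9
2021-12-01 20:00:00+00:00,1.0,2.0,3.0,4.0
2021-12-01 21:00:00+00:00,50.0,60.0,70.0,80.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings into timezone-aware datetimes (UTC).
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Convert to whole seconds since the Unix epoch so the model sees a number.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
df["timestamp_s"] = (df["timestamp"] - epoch) // pd.Timedelta(seconds=1)

# Fit on numeric columns only; the datetime column itself is left out.
features = ["timestamp_s", "A", "B", "C", "D"]
model = IsolationForest(n_estimators=50, contamination=0.1, random_state=0)
model.fit(df[features])

df["scores"] = model.decision_function(df[features])
df["anomaly"] = model.predict(df[features])  # -1 = anomaly, 1 = normal

print(df[["timestamp", "timestamp_s", "scores", "anomaly"]])
```

Keeping the original datetime column alongside the numeric one means the output file still has human-readable timestamps. Whether a raw, ever-increasing timestamp is a useful feature for anomaly detection depends on the data; derived features such as hour of day may be worth trying as well.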
