I need to import a .txt file with some weather statistics. The values, however, are separated by a comma followed by three spaces. When I try to handle this by passing sep=" " or sep=", ", I get an error.
import pandas as pd
# Import dataset
df = pd.read_csv("etmgeg_235.txt")
# Count null values per column, then drop any rows that contain them
print(df.isnull().sum())
df = df.dropna()
# Show correlations
cr = df.corr()
print(cr)
The program "works" when importing the .txt file, but then I get one correlation with NaN and one with a value of 1.0.
The dataset looks like this: "235,19060101, 113, 67, 67, 87, 12, 51, 1, , , -28, etc.", with a few more spaces between the values. How do I import this dataset correctly?
>Solution :
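Pass a regular expression as the separator to pd.read_csv and set engine='python', since regex separators like this one are only handled by the Python parsing engine. Something like: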
df = pd.read_csv('data.csv', sep=r',\s*', engine='python')
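To check the idea without the real file, here is a minimal sketch that parses a small in-memory sample. The column names and values below are made up for illustration and are not taken from etmgeg_235.txt. It also shows skipinitialspace=True as a possible alternative that keeps the default parser; treat that as an option to try, not as part of the original answer.

```python
import io

import pandas as pd

# Hypothetical sample: comma followed by a varying number of spaces,
# with one empty field, similar to the layout described in the question.
raw = (
    "STN,YYYYMMDD,  DDVEC,   FHVEC,   FG\n"
    "235,19060101,  113,   67,   67\n"
    "235,19060102,  120,     ,   70\n"
)

# Regex separator: a comma followed by any amount of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r",\s*", engine="python")

# Alternative: keep the plain comma separator and skip spaces after it.
df_alt = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(df.dtypes)
print(df)
```

In both cases the empty field should come through as NaN and the remaining columns as numbers, so df.corr() has numeric columns to work with.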