Index mistake when using pandas read_csv

November 20, 2021

Whilst trying to import the popular UCI bank marketing dataset from a GitHub endpoint, I ran into some issues. The read statement is not parsing the dataset's 17 columns correctly. I checked the separator and the header, but I am not sure how to correct the index.

import pandas as pd

# URL endpoint
url = 'https://raw.githubusercontent.com/ThamuMnyulwa/bankMarketing/main/bank-additional-full.csv'

column_names = ["age", "job", "marital", "education", "default", "balance",
                "housing", "loan", "contact", "day", "month", "duration",
                "campaign", "pdays", "previous", "poutcome", "y"]

raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', sep=';',
                          skipinitialspace=False, index_col=None)

Instead of a clean 17-column table, the values come out shifted relative to the column names.

How can I import the dataset correctly from the URL using pandas read_csv?

>Solution :

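You don’t need to set the headers: the CSV already comes with a header row, so read_csv can take the column names from the file. Yours looks weird because your headers list is missing 3 values, which is why the data is offset by 3. When names has fewer entries than the file has columns, pandas uses the extra leading columns as the index. Drop the names argument and the columns line up.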
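As a minimal sketch of that fix, the snippet below reads a semicolon-separated file without passing names, and also counts how far a hand-written list falls short of the file's own header. The sample data and the helper names load_with_file_header and header_mismatch are made up for illustration; the inline sample stands in for the real file so the snippet runs offline.

```python
import io

import pandas as pd

# Hypothetical miniature of the remote file: ';'-separated with a header row.
SAMPLE_CSV = (
    "age;job;marital;education;y\n"
    "56;housemaid;married;basic.4y;no\n"
    "57;services;married;?;no\n"
)


def load_with_file_header(source):
    """Read a ';'-separated CSV, taking column names from its first row."""
    # No names= argument: header row 0 is inferred and supplies the names.
    return pd.read_csv(source, sep=";", na_values="?")


def header_mismatch(source, expected_names):
    """Return how many more columns the file has than expected_names."""
    # nrows=0 parses only the header row and returns an empty frame.
    file_columns = pd.read_csv(source, sep=";", nrows=0).columns
    return len(file_columns) - len(expected_names)


df = load_with_file_header(io.StringIO(SAMPLE_CSV))
print(df.columns.tolist())

# A deliberately short list, to mimic the kind of mismatch described above.
short_names = ["age", "job", "marital"]
print(header_mismatch(io.StringIO(SAMPLE_CSV), short_names))
```

With the question's data, the same call should work on the URL directly, since read_csv accepts a URL as its first argument: raw_dataset = load_with_file_header(url). A non-zero result from header_mismatch(url, column_names) would confirm that the hand-written list does not match the file.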