Whilst trying to import the popular UCI bank marketing dataset from a GitHub endpoint, I ran into some issues. The read_csv call is not parsing the dataset into the 17 columns correctly. I checked the separator and the header, but I am not sure how to correct the index.
import pandas as pd

# URL endpoint
url = 'https://raw.githubusercontent.com/ThamuMnyulwa/bankMarketing/main/bank-additional-full.csv'
column_names = ["age", "job", "marital", "education", "default", "balance", "housing", "loan", "contact", "day", "month",
                "duration", "campaign", "pdays", "previous", "poutcome", "y"]
raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', sep=';',
                          skipinitialspace=False, index_col=None)
Instead, it is giving me something like this:
How can I import the dataset correctly from the URL using pandas read_csv?
>Solution :
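You don’t need to set the headers yourself. The CSV already comes with a header row, so pandas can take the column names from it. The reason yours looks weird is that your column_names list is missing 3 values, which is why everything is offset by 3. Also, because names is passed without header=0, the file’s own header row is read in as a row of data.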
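As a minimal sketch of that fix: the snippet below uses a small made-up, semicolon-separated sample held in memory (the `sample` string and its three columns are assumptions for illustration, not the real file) so it runs without network access. With the real data you would pass `url` in place of the `io.StringIO(...)` object.

```python
import io
import pandas as pd

# Hypothetical three-column sample standing in for the real CSV at `url`
sample = "age;job;y\n56;housemaid;no\n57;services;no\n"

# Drop `names` and let pandas take the column names from the header row
df = pd.read_csv(io.StringIO(sample), sep=';', na_values='?')
print(df.columns.tolist())

# If you do want your own names, pass header=0 so the file's header row
# is replaced, and supply exactly one name per column in the file
renamed = pd.read_csv(io.StringIO(sample), sep=';', na_values='?',
                      header=0, names=["age", "job", "target"])
print(renamed.columns.tolist())
```

Comparing `len(column_names)` against `df.shape[1]` after the first read is a quick way to see how many names a hand-written list is missing.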
