Index mistake when using pandas read_csv

While trying to import the popular UCI Bank Marketing dataset from a GitHub endpoint, I ran into some issues. The read_csv call is not reading the data into the 17 columns correctly. I checked the separator and the header, but I am not sure how to correct the index.

import pandas as pd

# URL endpoint
url = 'https://raw.githubusercontent.com/ThamuMnyulwa/bankMarketing/main/bank-additional-full.csv'

# 17 column names supplied by hand
column_names = ["age", "job", "marital", "education", "default", "balance",
                "housing", "loan", "contact", "day", "month", "duration",
                "campaign", "pdays", "previous", "poutcome", "y"]

raw_dataset = pd.read_csv(url, names=column_names, na_values='?', sep=';',
                          skipinitialspace=False, index_col=None)

Instead, the resulting DataFrame looks wrong: the values do not line up with the column names (the original screenshot is not reproduced here).

How can I import the dataset correctly from the URL using pandas read_csv?

>Solution :

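You don’t need to set the headers; the CSV already comes with a header row. Yours looks weird because your headers list is missing 3 values, which is why everything is offset by 3. When the names list has fewer entries than the file has columns, pandas uses the leftover leading columns as the index. And because names is passed without header=0, the file’s own header row is read as a row of data.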
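As a minimal sketch of that point, the snippet below reads a small in-memory sample instead of the real URL so it runs offline. The sample text and its five column names are made up for illustration and are not the actual contents of the file. It shows the short names list pushing leading columns into the index, then the fix of simply dropping names.

```python
import io

import pandas as pd

# Hypothetical stand-in for the remote file: semicolon-separated, with its own header row.
sample_text = (
    "age;job;marital;housing;y\n"
    "56;housemaid;married;no;no\n"
    "37;services;married;yes;no\n"
    "40;admin.;?;no;yes\n"
)

# Problem: only 3 names for 5 columns. The 2 leading columns become the index,
# and the header row is read as an ordinary data row.
short_names = ["marital", "housing", "y"]
broken = pd.read_csv(io.StringIO(sample_text), names=short_names, sep=";", na_values="?")

# Fix: omit names so pandas takes the column names from the first row.
df = pd.read_csv(io.StringIO(sample_text), sep=";", na_values="?")

print(broken)
print(df)
```

For the real dataset, the same call should work with the url variable in place of the StringIO object. If you do want your own column names, pass header=0 together with a names list that has exactly one entry per column in the file.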
