Ignoring delimiter while reading CSV files from URLs – Python

I have some URLs for downloading CSV files.

import pandas as pd
import io
import requests

url1 = 'https://www.ons.gov.uk/generator?format=csv&uri=/economy/economicoutputandproductivity/output/timeseries/' + 'k22a' + '/diop'

url2 = 'https://www.ons.gov.uk/generator?format=csv&uri=/economy/economicoutputandproductivity/output/timeseries/' + 'k24c' + '/diop'

s = requests.get(url1).content
c = pd.read_csv(io.StringIO(s.decode('utf-8')))

When I use url1, there is a ',' inside the 4th record, but some URLs (such as url2) don't have this unexpected separator. This causes

ParserError: Error tokenizing data. C error: Expected 1 fields in line
5, saw 2
when I try to merge the CSV files into a single dataframe. How do I ignore these unexpected separators? The first seven records are to be deleted anyway, but I still get this error.

This solution suggests pre-parsing each line before converting to CSV. Since I have many such URLs and don't know which unexpected delimiters might be encountered in the future, I'm not sure how to debug this. Can pre-parsing before converting to CSV work? How do I implement it so that it also covers other separators encountered in the future?

>Solution :

Since you don’t need the metadata, just skip it using the skiprows parameter of read_csv. As a nice side effect, you’ll also have the correct dtypes automatically:

url = url1
N = 7

s = requests.get(url).content
c = pd.read_csv(io.StringIO(s.decode('utf-8')), header=0, skiprows=range(1, N+1))

Output:

  Title  IOP: C:MANUFACTURING: CVMSA
0  1948                         25.2
1  1949                         27.0
2  1950                         29.0
3  1951                         29.9
4  1952                         28.4
...

If you don’t even need headers:

url = url1
N = 8

s = requests.get(url).content
c = pd.read_csv(io.StringIO(s.decode('utf-8')), header=None, skiprows=N)

Output:

      0     1
0  1948  25.2
1  1949  27.0
2  1950  29.0
3  1951  29.9
4  1952  28.4
...
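Since the original goal was to merge several such files into a single dataframe, the same skiprows approach can be wrapped in a small helper and the results joined on the year column. A minimal sketch, with in-memory CSV text standing in for the decoded response bodies of url1 and url2 (the `load_series` helper, the column names, and the sample data are illustrative, not part of the original answer):

```python
import io
import pandas as pd

N = 7  # number of metadata rows between the header and the data

def load_series(csv_text, name):
    # Keep the header row (row 0), skip the N metadata rows after it
    df = pd.read_csv(io.StringIO(csv_text), header=0, skiprows=range(1, N + 1))
    df.columns = ['Year', name]
    return df

# Stand-ins for s.decode('utf-8') from each URL
sample1 = ("Title,IOP: C:MANUFACTURING: CVMSA\n" + "meta,x\n" * 7
           + "1948,25.2\n1949,27.0\n")
sample2 = ("Title,Other series\n" + "meta,x\n" * 7
           + "1948,10.1\n1949,11.5\n")

# Merge the two series into one dataframe keyed on Year
merged = load_series(sample1, 'k22a').merge(load_series(sample2, 'k24c'), on='Year')
print(merged)
```

In the real case, `csv_text` would be `requests.get(url).content.decode('utf-8')` for each URL; because the metadata rows (where the stray ',' appears) are skipped before parsing, the tokenizing error never triggers.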