pandas.read_csv(): How to exclude specific separator combinations

I have a csv like:

file:

1;a;3;4
1;2;b;4
1;[a;b];3;4

Loading it with pd.read_csv(file, sep=';')

returns the error:

ParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5

because the ; inside [a;b] is treated as a separator. Is there a way to ignore ; when it appears inside [ ]?

Thanks

P.S. Changing the file is not possible.

>Solution :

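You can use ;(?![^\[]*\]) as a regex separator so that only semicolons outside brackets are matched. The negative lookahead (?![^\[]*\]) rejects any semicolon that is followed by a ] with no [ in between. Regex separators are only supported by the Python parser engine, hence engine='python':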

pd.read_csv(filename, sep=r';(?![^\[]*\])', engine='python')
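To see what the separator does on its own, here is a minimal sketch that applies the same pattern with the standard re module, outside pandas. The helper name split_fields is hypothetical and only used for illustration. The pattern assumes brackets are not nested and are always closed on the same line.

```python
import re

# Same pattern as the pandas separator: a semicolon that is NOT
# followed by a "]" unless a "[" comes first.
SEP = re.compile(r';(?![^\[]*\])')

def split_fields(line):
    """Hypothetical helper: split one line on semicolons outside [ ]."""
    return SEP.split(line)

print(split_fields('1;a;3;4'))
print(split_fields('1;[a;b];3;4'))
```

The semicolon inside [a;b] is left alone because the lookahead finds the closing ] before any opening [.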

demo:

text = '''1;a;3;4
1;2;b;4
1;[a;b];3;4
'''

import io
import pandas as pd

pd.read_csv(io.StringIO(text), sep=r';(?![^\[]*\])', engine='python')

output:

   1      a  3  4
0  1      2  b  4
1  1  [a;b]  3  4
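Note that in the output above the first data line was consumed as the column header. If the file has no header row (an assumption, since the question does not say), a variant with header=None keeps all three lines as data:

```python
import io
import pandas as pd

text = '1;a;3;4\n1;2;b;4\n1;[a;b];3;4\n'

# header=None: treat every line as data and number the columns 0..3.
df = pd.read_csv(
    io.StringIO(text),
    sep=r';(?![^\[]*\])',  # semicolons outside square brackets only
    engine='python',       # regex separators need the Python engine
    header=None,
)
print(df)
```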
