
pandas.read_csv is ignoring quoting of strings

I am having trouble reading a CSV file into a pandas DataFrame. read_csv splits on the commas inside quoted fields instead of treating each quoted string as a single field.

I have tried different values for quotechar, but none made any difference.

import csv
import pandas

df = pandas.read_csv('test_quote.csv', header=None, sep=',', quotechar='"', quoting=csv.QUOTE_MINIMAL, encoding='ascii', engine='python')
print(df)

Actual output:
$ python3 test_quote.py 
        0     1              2       3                            4       5       6
0  201571  2080    "December 2   2022"    "November 1 - November 30   2022"  487.29
1  345741  5377    "December 3   2022"    "November 1 - November 30   2022"  729.35
2  995349  3672   "December 2    2022"   "November 1 - November 30    2022"  937.33
3  475601  3672   "December 2    2022"   "November 1 - November 30    2022"  790.17
4  228548  3672    "December 7   2022"    "November 1 - November 30   2022"  682.38

Expected output:
$ python3 test_quote.py 
        0     1                     2                                   3       4
0  201571  2080    "December 2, 2022"    "November 1 - November 30, 2022"  487.29
1  345741  5377    "December 3, 2022"    "November 1 - November 30, 2022"  729.35
2  995349  3672   "December 2 , 2022"   "November 1 - November 30 , 2022"  937.33
3  475601  3672   "December 2 , 2022"   "November 1 - November 30 , 2022"  790.17
4  228548  3672    "December 7, 2022"    "November 1 - November 30, 2022"  682.38

Input file (test_quote.csv):
201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29
345741, 5377, "December 3, 2022", "November 1 - November 30, 2022", 729.35
995349, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 937.33
475601, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 790.17
228548, 3672, "December 7, 2022", "November 1 - November 30, 2022", 682.38

Solution:

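The spaces after the commas are causing the issue. Each quoted field starts with a space rather than with the quote character, so the parser does not recognise the quote as opening a quoted field and splits on the comma inside it. Pass skipinitialspace=True so the spaces after each delimiter are skipped. Note that most of your other parameters are already the defaults, so they can be dropped. Use the following: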

import pandas

df = pandas.read_csv( 'test_quote.csv', header=None, skipinitialspace=True)
print(df)

Output:

        0     1                  2                                3       4
0  201571  2080   December 2, 2022   November 1 - November 30, 2022  487.29
1  345741  5377   December 3, 2022   November 1 - November 30, 2022  729.35
2  995349  3672  December 2 , 2022  November 1 - November 30 , 2022  937.33
3  475601  3672  December 2 , 2022  November 1 - November 30 , 2022  790.17
4  228548  3672   December 7, 2022   November 1 - November 30, 2022  682.38
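
The mechanism can also be seen without pandas, using the standard library's csv module, which accepts the same skipinitialspace option. This is only a minimal sketch: the SAMPLE string and the parse helper are made up for illustration, and it assumes pandas handles the leading space the same way, which matches the output shown in the question.

```python
import csv
import io

# Two rows copied from test_quote.csv; note the space after every comma.
SAMPLE = (
    '201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29\n'
    '995349, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 937.33\n'
)


def parse(text, skipinitialspace):
    """Hypothetical helper: parse CSV text into a list of rows of strings."""
    return list(csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace))


# Without skipinitialspace, a field that begins with a space is treated as
# unquoted, so the quote is kept as a literal character and the comma
# inside the date still splits the field.
naive_rows = parse(SAMPLE, skipinitialspace=False)

# With skipinitialspace, the space after each delimiter is dropped first,
# so the quote is the first character of the field and opens a quoted field.
fixed_rows = parse(SAMPLE, skipinitialspace=True)

print(len(naive_rows[0]), naive_rows[0][2:4])
print(len(fixed_rows[0]), fixed_rows[0][2])
```

Only the space directly after a delimiter is skipped. The space before the comma inside "December 2 , 2022" sits within the quotes and is kept, as in the output above.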