Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to prevent Pandas to_csv double quoting empty fields in output csv

I currently have a sample python script that reads a csv with double quotes as a text qualifier and removes ascii characters and line feeds in the fields via a dataframe. It then outputs the dataframe to a csv. However the output csv is double quoting empty fields. see below:

import pandas as pd
import re
import csv

# Load the CSV file into a DataFrame
file_path = "\\\\Mylocation\\Original_facility_udfs.csv"  # Replace with your CSV file path
df = pd.read_csv(file_path, quotechar='"')

# Define a function to clean a single cell
def clean_field(value):
    if isinstance(value, str):
        # Remove line feeds (\n, \r) and non-ASCII characters
        value = re.sub(r'[\n\r]', ' ', value)  # Replace line feeds with a space
        value = re.sub(r'[^\x00-\x7F]', '', value)  # Remove non-ASCII characters
    return value

# Apply the cleaning function to all DataFrame fields
df_cleaned = df.map(clean_field)

# Save the cleaned DataFrame back to a new CSV file
cleaned_file_path = "\\\\Mylocation\\facility_udfs.csv" # Output file path
df_cleaned.to_csv(
    cleaned_file_path,
    index=False,
    quotechar ='"',
    quoting=csv.QUOTE_ALL, # Ensure double quotes remain as text qualifiers
    lineterminator='\n'  # Set line feed (\n) as the row terminator
)

print(f"Cleaned CSV saved to: {cleaned_file_path}")

My current output is as follows

"date_key","facility_key","udf_type","udf_area_indic","udf_area"
"20240830","251","GL Unit Code","Facility for Type","",""
"20240830","251","Cost Center","Facility for Type","",""

the desired output should be

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

"date_key","facility_key","udf_type","udf_area_indic","udf_area"
"20240830","251","GL Unit Code","Facility for Type",,
"20240830","251","Cost Center","Facility for Type",,

>Solution :

There is currently no direct way to do this. Python 3.12 added a csv.QUOTE_NOTNULL option, but it is currently not supported in pandas.

If you don’t have tricky strings that need proper escaping, you could "escape" the values yourself with:

(df.map(lambda x: '' if pd.isna(x) else f'"{x}"')
   .rename(columns=lambda x: f'"{x}"')
   .to_csv(cleaned_file_path,
           index=False,
           quotechar ='"', 
           quoting=csv.QUOTE_NONE,
           lineterminator='\n' 
)

Output:

"date_key","facility_key","udf_type","udf_area_indic","udf_area"
"251","GL Unit Code","Facility for Type",,
"251","Cost Center","Facility for Type",,

NB. your input was unclear, in my case "20240830" was the index, and thus absent from the output.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading