Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to make a CSV file with arrays of varchar containing double quotes for PostgreSQL COPY

I am preparing a CSV for loading onto a table using the SQL command:

COPY table FROM filename WITH (FORMAT csv, HEADER)

This particular table has a varchar[] (array of strings) column, and some of the items in these arrays contain special characters such as double quotes (") and backslashes (\), some of which can break the CSV format or the array notation. Examples:

"quote"
x1"x2
C:\Users

I’ve made a few attempts so far (attempt is the actual content of the column in the CSV file):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

{"""quote""","x1""x2","C:\Users"}
ERROR:  extra data after last expected column
CONTEXT:  COPY table, line 1: "{"""quote""","x1""x2","C:\Users"}"

"{"""quote""","x1""x2","C:\Users"}"
Inserted value is missing characters:
{quote,x1x2,C:Users}

"{"\""quote\""","x1\""x2","C:\\Users"}"
Inserted value is missing characters:
{"quote\"",x1x2,"C:\\Users"}

What is the correct syntax to insert values with double quotes and backslashes from a CSV?

PostgreSQL version is 13.

>Solution :

The varchar[] column must be formatted using a CSV-formatted version of PostgreSQL array notation. Unfortunately, as my example shows, this can get quite hairy.

Here are some examples how the column should be quoted:

copy (select
    'other-column'::text, 
    '{foo,bar,baz,Lucky''s Bar,"dbl\"quote",literal-backslash\\,brackets\[\]\{\}}'::varchar[],
    'another-column'::text
) to stdout with (format csv);
other-column,"{foo,bar,baz,Lucky's Bar,""dbl\""quote"",""literal-backslash\\"",""brackets[]{}""}",another-column

PostgreSQL has this to say on quoting array elements:

[W]hen writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type’s delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.

As with all CSV documents, if your column data contains quotes, spaces, commas, or other special characters, the column must be CSV-quoted. Or just always quote them to be safe. Normal CSV output libraries will auto-quote columns in your data.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading