JQ oneliner to add headers to CSV with funky spreadsheet row/column nested arrays?

My student information system outputs custom lists via a spreadsheet row/column JSON with column headers as the value of the "name" key and data as the value of the "value" key. The format is:

{"count":804,"page":1,"results":{"rows":[{"columns":[{"name":"Student Host ID","value":"1234567890     "},{"name":"Student ID","value":"12345"},{"name":"Student Created","value":"04/27/2017 16:29:40"}]},{"columns":[{"name":"Student Host ID","value":"2345678901     "},{"name":"Student ID","value":"23456"},{"name":"Student Created","value":"04/27/2017 16:29:50"}]}]}}

I would like it to output as CSV like:

Student Host ID,Student ID,Student Created
"1234567890","12345","04/27/2017 16:29:40"
"2345678901","23456","04/27/2017 16:29:50"

(but I’m OK with quoted header field names if JQ is quote-happy.) I know I can prepend actual field names in the JQ command, but I have a bunch of exports with different fields, and the actual exports can contain more than 50 fields, so I’d rather it if JQ worked out the headers automagically so I can use the same command with different files.

Thus far, I have the following to pull the data as CSV (the gsub gets rid of extra whitespace in some exports that seems to get added on some fields but not others for some bizarre reason):

jq -r '.results.rows[] | [.columns[].value] | map(if type=="string" then gsub("^\\s+|\\s+$";"") else . end) | @csv'

This results in the CSV data:

"1234567890","12345","04/27/2017 16:29:40"
"2345678901","23456","04/27/2017 16:29:50"

I currently grab the field names for the header with:

jq -r '.results.rows[0].columns[] | .name' | sed 's/^\|$/"/g'|paste -sd, - | > export.csv

and then run the previous JQ command to >> (append) the data to the same export.csv file.

I’ve seen commands like https://gist.github.com/gigkokman/bf3b305f1073c176403958628419bb2b that use mapping to add fieldnames automatically, but I can’t wrap my mind around how to decipher the command and apply it to my nested row/column JSON.

I’d really appreciate any suggestions you might have – thank you!

>Solution :

Putting together your snippets like so (fetching the header names from the first row):

jq -r '
  [.results.rows[0].columns[].name], (
    .results.rows[] | [.columns[].value]
    | map(if type=="string" then gsub("^\\s+|\\s+$";"") else . end)
  )
  | @csv
'
"Student Host ID","Student ID","Student Created"
"1234567890","12345","04/27/2017 16:29:40"
"2345678901","23456","04/27/2017 16:29:50"

Leave a Reply