Sorting user-defined array with strings gives wrong order, even when file content is fully available on disk

I am querying Elasticsearch and sorting the documents locally in Bash with jq, because sorting in ES is too slow for me.

The end goal is to create a CSV file.

But the sorting does not work properly; the sort step seems to do nothing.

Since I am issuing cURL requests, I thought the wrong order was caused by the response being chunked, so I saved some results into a local test.json file and tried again, but it still does not work.

test.json:

{
    "took": 680,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my-index",
                "_type": "_doc",
                "_id": "111111113584925",
                "_score": 1.0,
                "fields": {
                    "field2": [
                        "FOO"
                    ],
                    "field1": [
                        "111111113584925"
                    ]
                }
            },
            {
                "_index": "my-index",
                "_type": "_doc",
                "_id": "111111121254059",
                "_score": 1.0,
                "fields": {
                    "field2": [
                        "FOO"
                    ],
                    "field1": [
                        "111111121254059"
                    ]
                }
            }
        ]
    }
}

(There are many more records – edited for brevity.)

Command that I use:

jq '.hits.hits[].fields | [.field1[0] + "," + .field2[0]] | sort | .[0]' -r test.json

The result:

111111113584925,FOO
111111121254059,FOO
111111116879444,FOO

etc.

Why?

Should I rely on jq sorting? Am I using it correctly? I want a string comparison in alphabetical order. All field1 values are unique, so there will never be a tie that falls through to comparing field2 (which can have various values, but I only want to sort by field1).

Should I use sort -k 1 from the shell instead? Which is faster for 100K rows?

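Solution:

Your filter iterates with .hits.hits[] first, so [...] | sort runs once per hit on a one-element array and has nothing to reorder. Collect all rows into a single array before sorting. You’re looking for something like this: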

.hits.hits | map(.fields | .field1[0] + "," + .field2[0]) | sort[]
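To check the difference on a small input, here is a minimal sketch. The file name sample.json and its trimmed-down contents are assumptions made for illustration (the IDs are the three shown in the question's output, deliberately out of order); the last command is one possible way to sort outside jq, not a benchmarked recommendation.

```shell
# Hypothetical sample data: three hits, intentionally not in sorted order.
cat > sample.json <<'EOF'
{"hits":{"hits":[
  {"fields":{"field2":["FOO"],"field1":["111111113584925"]}},
  {"fields":{"field2":["FOO"],"field1":["111111121254059"]}},
  {"fields":{"field2":["FOO"],"field1":["111111116879444"]}}
]}}
EOF

# Original filter: `.hits.hits[]` emits one hit at a time, so each `[...]`
# holds a single string and `sort` leaves the input order unchanged.
jq -r '.hits.hits[].fields | [.field1[0] + "," + .field2[0]] | sort | .[0]' sample.json

# Fixed filter: map builds ONE array of rows, sort orders it, [] streams it.
jq -r '.hits.hits | map(.fields | .field1[0] + "," + .field2[0]) | sort[]' sample.json

# Alternative: jq only formats the rows; sort(1) orders them by the first
# comma-separated field, using byte-wise comparison via LC_ALL=C.
jq -r '.hits.hits[].fields | .field1[0] + "," + .field2[0]' sample.json |
  LC_ALL=C sort -t, -k1,1
```

The first command prints the rows in file order, while the other two print them in ascending order of field1. Which of the two sorted variants is faster at 100K rows depends on your data and machine, so it is worth timing both on a real export.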
