Elasticsearch cardinality aggregation exception

I am upgrading our ES clusters from version 6 to 7. Version 7 introduces a breaking change: reading a missing document value in a script now throws an error.
My goal is to alter this query so that it only selects documents where those values exist, which should take care of the problem. How can I add a "must contain" or "must not contain" condition to this query to achieve that?

   {
       "query":{
          "bool":{
             "must":[
                {
                   "terms":{
                      "state":[
                         "pending",
                         "queued",
                         "deferred"
                      ]
                   }
                },
                {
                   "terms":{
                      "tenant_tag":[
                         "prod"
                      ]
                   }
                }
             ]
          }
       },
       "aggs":{
          "count":{
             "cardinality":{
                "script":"doc['user_id'].value + '_' + doc['campaign_id'].value"
             }
          }
       }
    }

Solution:

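I would rewrite your query like this, adding an exists clause for each field the script reads and moving all clauses into filter context, since no scoring is needed: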

{
       "query":{
          "bool":{
             "filter":[
                {
                  "exists": { "field": "user_id" }
                },
                {
                  "exists": { "field": "campaign_id" }
                },
                {
                   "terms":{
                      "state":[
                         "pending",
                         "queued",
                         "deferred"
                      ]
                   }
                },
                {
                   "terms":{
                      "tenant_tag":[
                         "prod"
                      ]
                   }
                }
             ]
          }
       },
       "aggs":{
          "count":{
             "cardinality":{
                "script":"doc['user_id'].value + '_' + doc['campaign_id'].value"
             }
          }
       }
    }

Ideally, you should pre-compute a userid_campaignid field in your documents at index time so that you don't need a scripted aggregation. Scripted aggregations perform poorly, and cardinality is already an expensive aggregation on its own.
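
As a rough sketch of what the query could look like once such a field exists: this assumes userid_campaignid is mapped as a keyword field and is filled at index time with user_id and campaign_id joined by an underscore (the field name comes from the suggestion above; the mapping and the way it gets populated are assumptions, not something stated in the question).

```json
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "state": ["pending", "queued", "deferred"] } },
        { "terms": { "tenant_tag": ["prod"] } }
      ]
    }
  },
  "aggs": {
    "count": {
      "cardinality": { "field": "userid_campaignid" }
    }
  }
}
```

A field-based cardinality aggregation ignores documents that have no value for the field by default, so the exists clauses should no longer be needed in this variant.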
