Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Further aggregate date histogram in Elasticsearch query

I’m trying to extract, for each event_type, in how many days the event occurred and the last occurrence. Currently I wrote the following query:

{
    "size": 0,
    "query": { ... },
    "_source": false,
    "aggregations": {
        "events": { 
            "terms": { "field": "event_type" },
            "aggs": {
                "event_timestamp": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "calendar_interval": "day",
                        "min_doc_count": 1,
                        "order": { "_key": "desc" }
                    }
                }
            }
        }
    }
}

that extracts data to be further transformed. In my case, I just need, for each event_type, the key of the first event_timestamp bucket and the number of event_timestamp buckets. This can be done easily in a second step after this extraction, but it would be cleaner to let Elastic do it.

Max bucket (or metric) aggregation does not seems helpful, as I don’t see a suitable buckets_path in the date_histogram aggregation.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Good start!!

Here is a query that returns you the latest timestamp per event_type as well as the number of daily buckets for each event_type (for this you can use the bucket_script pipeline aggregation which simply returns the number of buckets in the date_histogram aggregation).

Also note that the filter_path query string parameter allows you to only return what matters to you, i.e. you can get rid of the date histogram that you’re not interested in:

POST test/_search?filter_path=**.latest_timestamp.hits.hits._source,**.bucket_count,**.events.buckets.key
{
  "size": 0,
  "_source": false,
  "query": {...},
  "aggregations": {
    "events": {
      "terms": {
        "field": "event_type"
      },
      "aggs": {
        "latest_timestamp": {
          "top_hits": {
            "size": 1,
            "sort": {"@timestamp": "desc"}, 
            "_source": ["@timestamp"]
          }
        },
        "event_timestamp": {
          "date_histogram": {
            "field": "@timestamp",
            "calendar_interval": "day",
            "min_doc_count": 1,
            "order": {
              "_key": "desc"
            }
          }
        },
        "bucket_count": {
          "bucket_script": {
            "buckets_path": {
              "count": "event_timestamp._bucket_count"
            },
            "script": "params.count"
          }
        }
      }
    }
  }
}

You’ll get for each event type something that looks like this:

    {
      "key": "2",                            <--- event type
      "latest": {
        "hits": {
          "hits": [
            {
              "_source": {
                "@timestamp": "2023-01-05"   <--- latest timestamp
              }
            }
          ]
        }
      },
      "bucket_count": {
        "value": 4                           <--- bucket count
      }
    },
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading