So, this is our use case:
We have an ingest process where we load around 1,000 elements daily into an ES index. One of the fields of that index is the city name, and we would like to build some maps with elements by city, but we don’t have the geolocation of those cities.
Usually our cities will come from a very limited collection, BUT (and this is a big but) from time to time we can get new cities from totally unexpected places. So we don’t exactly need an index with every city in the world and its geolocation (like the one you can get from geonames), but we will surely have to look up the geolocation of new cities from time to time.
Given this, my approach is the following:
I’d like to add to our Logstash ETL process a query that looks for the city in an ES index. If the city is there, it can get its geolocation from this city index; if not, I want to query the geonames API for the geolocation of the city and store it in our ES city index. This way we’ll only have to query the API occasionally for new cities, and once a city is ingested into our cities index we won’t have to query for it again.
Is this a good approach for ELK? Or is there a better approach I’m not seeing? Keep in mind that I’m pretty new to the ELK stack.
>Solution :
Your approach makes sense for your use case, and it’s a reasonable way to handle geolocation data in Elasticsearch and the ELK (Elasticsearch, Logstash, and Kibana) stack. Here are some considerations and steps to help you implement it:
1. Create a Cities Index:
Start by creating an index in Elasticsearch to store your city geolocation data. This index can contain fields such as city_name, latitude, longitude, etc.
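For example, the cities index could be created with an explicit mapping. A geo_point field is what Kibana maps expect; the index and field names below (cities, city_name, geo_location) are placeholders you can rename:

```
PUT cities
{
  "mappings": {
    "properties": {
      "city_name":    { "type": "keyword" },
      "geo_location": { "type": "geo_point" }
    }
  }
}
```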
2. Ingest Cities Data:
Populate this index with the geolocation data for cities that you already know. You can input this data manually or use an automated process.
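The initial load can be done with the bulk API. A small illustrative example, using the city name as the document ID (matching the lookup strategy later in the pipeline):

```
POST cities/_bulk
{ "index": { "_id": "Madrid" } }
{ "city_name": "Madrid", "geo_location": { "lat": 40.4168, "lon": -3.7038 } }
{ "index": { "_id": "Paris" } }
{ "city_name": "Paris", "geo_location": { "lat": 48.8566, "lon": 2.3522 } }
```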
3. Logstash ETL Process:
In your Logstash ETL process, when you encounter a new element with a city name, perform a lookup in your Cities Index. If the city is found, use its geolocation. If not, query the geonames API to get the geolocation and then store it in your Cities Index.

Example Logstash configuration (note that the geonames call belongs in an http filter, not an output, since output plugins cannot write data back into the event):

filter {
  # Only look up cities that don't already carry a geolocation
  if ![geo_location] {
    elasticsearch {
      hosts  => ["your_elasticsearch_host"]
      index  => "cities"
      query  => "city_name:%{[city_field]}"
      fields => { "geo_location" => "geo_location" }
      add_field => { "city_found" => "true" }
    }
  }

  # Fall back to the geonames API when the local lookup found nothing
  if [city_found] != "true" {
    http {
      url  => "https://api.geonames.org/searchJSON?q=%{[city_field]}&username=your_geonames_username"
      verb => "GET"
      target_body => "[geonames_response]"
    }
    # Parse latitude/longitude out of [geonames_response] into
    # [geo_location] here, e.g. with a ruby filter.
  }
}

output {
  # Store newly resolved cities so future events hit the local index
  if [city_found] != "true" {
    elasticsearch {
      hosts => ["your_elasticsearch_host"]
      index => "cities"
      document_id => "%{[city_field]}"
    }
  }
  # ...plus your main output for the ingested elements
}

Adjust the configuration according to your actual field names, URLs, and API keys.
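The http filter stores the raw geonames response body as a string, so you still need to parse the latitude and longitude out of it. A minimal sketch with a ruby filter, assuming the body was stored in a field named [geonames_response] and that the searchJSON endpoint returns its usual "geonames" array with "lat"/"lng" entries (field names here are assumptions to adapt to your config):

```
filter {
  if [geonames_response] {
    ruby {
      code => '
        require "json"
        begin
          resp = JSON.parse(event.get("geonames_response"))
          hit  = resp["geonames"] && resp["geonames"].first
          if hit
            # geonames returns lat/lng as strings; convert to floats
            event.set("[geo_location][lat]", hit["lat"].to_f)
            event.set("[geo_location][lon]", hit["lng"].to_f)
          end
        rescue StandardError
          event.tag("_geonames_parse_failure")
        end
      '
    }
  }
}
```

Tagging parse failures lets you route bad events to a dead-letter output instead of silently indexing them without coordinates.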
4. Periodic Updates:
Set up a periodic job to check for new cities and update your Cities Index. This can be done through a scheduled task using tools like cron or a job scheduler.
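With cron, that could look like the entry below, where refresh_cities.sh is a hypothetical script of yours that re-checks unresolved cities against geonames and updates the index:

```
# Run the (hypothetical) city refresh script every night at 02:00
0 2 * * * /usr/local/bin/refresh_cities.sh >> /var/log/refresh_cities.log 2>&1
```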
5. Index Template:
Consider defining an index template for your Cities Index to ensure consistent mappings for future documents.
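A sketch of such a template using the composable index template API (available in Elasticsearch 7.8+; the template and field names are placeholders):

```
PUT _index_template/cities_template
{
  "index_patterns": ["cities*"],
  "template": {
    "mappings": {
      "properties": {
        "city_name":    { "type": "keyword" },
        "geo_location": { "type": "geo_point" }
      }
    }
  }
}
```

With this in place, any future cities index matching the pattern gets the same mapping automatically.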
Keep in mind that the geonames API may have rate limits, so be mindful of how often you query it. Additionally, ensure that you handle errors gracefully in case the API is unreachable or returns unexpected responses.
This approach allows you to maintain a local index of known cities’ geolocations while dynamically adding new cities as needed. It’s a practical solution for handling both expected and unexpected city data.