
Fargate task-level SNS topic subscriptions

Currently, I have an AWS Batch environment configured to run ML training jobs. At the end of each job, the model artifacts are stored in an S3 bucket. I also have an Application Load Balanced Fargate service, which scales between 1 and 10 tasks and hosts a FastAPI application. The API exposes an /update_model/ endpoint which, given a model key, retrieves the most current model from S3.

Naively, I’d like the Batch job to send an HTTP request to my Fargate service’s /update_model/ endpoint before it terminates. However, the ALB will route that request to only one task, so the remaining tasks keep serving the old model.

ElastiCache + Polling

The first mitigation I’m considering is for Batch to publish the model key to ElastiCache and for each task to asynchronously poll Redis for model key updates; if a new key exists, the task retrieves the artifact from S3.
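A minimal sketch of what the per-task polling loop might look like. The class name `ModelPoller`, the two injected callables, and the 30-second interval are assumptions made for illustration; they are not part of the original setup. The key lookup is injected so the sketch stays independent of any particular Redis client.

```python
import asyncio
from typing import Callable, Optional


class ModelPoller:
    """Remembers the last model key seen and reloads when the published key changes."""

    def __init__(
        self,
        fetch_key: Callable[[], Optional[str]],
        load_model: Callable[[str], None],
    ) -> None:
        # fetch_key is hypothetical: for example, a function that reads a key
        # such as "model:current_key" from Redis and returns it as a str.
        self._fetch_key = fetch_key
        # load_model is hypothetical: for example, a function that downloads
        # the artifact for the given key from S3 and swaps it into the app.
        self._load_model = load_model
        self.current_key: Optional[str] = None

    def check_once(self) -> bool:
        """Return True if a new key was found and the model was reloaded."""
        latest = self._fetch_key()
        if latest is None or latest == self.current_key:
            return False
        self._load_model(latest)
        self.current_key = latest
        return True

    async def run(self, interval_seconds: float = 30.0) -> None:
        """Poll forever; meant to run as a background task inside each Fargate task."""
        while True:
            # Run the blocking check in a worker thread so the event loop stays free.
            await asyncio.to_thread(self.check_once)
            await asyncio.sleep(interval_seconds)
```

Each task would start `run()` once at application startup. Because every task polls on its own, all tasks converge on the new model within roughly one polling interval, regardless of how the ALB routes requests.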

SNS Task-level Subscription

My second mitigation, which I’d prefer if possible, is for Batch to send a notification via SNS and for each task to subscribe to the topic independently. From my research, the ALB endpoint can be subscribed to the topic, but then I’d hit the same problem as the naive solution: n-1 tasks serve outdated models.

My question is: Is it possible for each Fargate Task to independently subscribe to an SNS topic (enabling fan-out)? Or am I better off using the cache polling strategy?

>Solution :

> My question is: Is it possible for each Fargate Task to independently subscribe to an SNS topic (enabling fan-out)?

Only if each task is directly reachable at a public IP address, in addition to being reachable through the load balancer. Each task could then subscribe to the SNS topic with its own IP address on startup.

I would worry about the security implications of this approach, however, as it exposes your Fargate containers directly to the Internet.
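For completeness, a rough sketch of what the startup subscription might look like, assuming a boto3 SNS client is passed in and the task already knows its own public IP. The function name, the port, and the use of /update_model/ as the delivery path are illustrative assumptions. The endpoint would also have to handle the SubscriptionConfirmation request that SNS sends before it delivers any notifications.

```python
def subscribe_task_to_topic(sns_client, topic_arn: str, public_ip: str,
                            port: int = 80, path: str = "/update_model/") -> str:
    """Subscribe this task's own HTTP endpoint to the SNS topic.

    sns_client is expected to behave like boto3.client("sns").
    public_ip, port, and path are assumptions about how the task is exposed.
    """
    endpoint = f"http://{public_ip}:{port}{path}"
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="http",
        Endpoint=endpoint,
        # Ask for the ARN even while the subscription is pending confirmation,
        # so the task can unsubscribe again on shutdown.
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]
```

A task that subscribes this way should also unsubscribe when it stops; otherwise the topic accumulates dead endpoints as the service scales in and out.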


> My first mitigation in consideration is for Batch to publish the model key to ElastiCache and then each task asynchronously poll Redis for model key updates; if one exists, retrieve the artifact from S3.

They wouldn’t even need to poll: they could use Redis pub/sub to have Redis push new messages to them as they arrive. This would work similarly to SNS, but the network traffic would stay entirely within your VPC.
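A minimal sketch of the subscriber side, assuming the redis-py client. The channel name `model-updates` and the `load_model` callable are hypothetical. The pub/sub object is passed in so the message handling can be exercised without a live Redis.

```python
from typing import Any, Callable, Mapping

CHANNEL = "model-updates"  # hypothetical channel name


def handle_message(message: Mapping[str, Any], load_model: Callable[[str], None]) -> bool:
    """Reload the model if this is a published message; ignore everything else."""
    # redis-py also yields subscribe/unsubscribe confirmations; skip those.
    if message.get("type") != "message":
        return False
    data = message["data"]
    # Payloads arrive as bytes unless the client was created with decode_responses=True.
    key = data.decode("utf-8") if isinstance(data, bytes) else str(data)
    load_model(key)
    return True


def listen_for_updates(pubsub: Any, load_model: Callable[[str], None]) -> None:
    """Block on the channel and reload whenever Batch publishes a new model key.

    pubsub is expected to behave like the object returned by redis.Redis(...).pubsub().
    """
    pubsub.subscribe(CHANNEL)
    for message in pubsub.listen():
        handle_message(message, load_model)
```

Each Fargate task would run `listen_for_updates(redis.Redis(host=...).pubsub(), load_model)` in a background thread, and the Batch job would finish with `redis.Redis(host=...).publish(CHANNEL, model_key)`. One caveat: Redis pub/sub is fire-and-forget, so a task that starts or reconnects after the publish will not see that message. Having Batch also store the key under a regular Redis key, and having each task read it once on startup, would cover that gap.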
