Scale up and scale down CloudWatch alarms

I have an ECS service that I want to scale up and down depending on how many items are in an SQS queue.

resource "aws_cloudwatch_metric_alarm" "sqs_scale_up" {
  alarm_name = "scale-up"

  comparison_operator       = "GreaterThanOrEqualToThreshold"
  evaluation_periods        = "1"
  metric_name               = "ApproximateNumberOfMessagesVisible"
  namespace                 = "AWS/SQS"
  period                    = "60"
  threshold                 = "1"
  statistic                 = "Sum"
  alarm_description         = "Increase task count"
  insufficient_data_actions = []
  alarm_actions             = [aws_appautoscaling_policy.scale_up.arn]

  dimensions = {
    QueueName = aws_sqs_queue.this.name
  }
}

resource "aws_cloudwatch_metric_alarm" "sqs_scale_down" {
  alarm_name          = "scale-down"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = "60"
  threshold           = "1"
  statistic           = "Sum"
  alarm_description   = "Decrease task count"
  alarm_actions       = [aws_appautoscaling_policy.scale_down.arn]

  dimensions = {
    QueueName = aws_sqs_queue.this.name
  }
}

Does the fact that I have one alarm for count > 0 and one alarm for count < 1 mean that one of these alarms will always be in the ALARM state?

Is this normal?

>Solution :

Don’t panic over the word ‘ALARM’. Instead, think of it as saying that the condition is TRUE.

If there are any messages in the queue, you presumably want to scale out from a "nothing is running" state. Therefore, you want the scale-out alarm to be TRUE. However, you need to set a limit so that it doesn’t keep scaling out for as long as the alarm stays TRUE; the service might need only one task.
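
One way to set that limit is on the Application Auto Scaling target that the scaling policies attach to. The following is a minimal sketch, not the asker’s actual configuration: the resource names aws_ecs_cluster.this and aws_ecs_service.this and the capacity values are assumptions for illustration.

```hcl
# Hypothetical scalable target for the ECS service.
# The cluster/service references and the numbers are assumptions.
resource "aws_appautoscaling_target" "ecs" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.this.name}"

  # Scaling policies can never move the desired count outside this range,
  # so a scale-out alarm that stays TRUE stops having an effect at max_capacity.
  min_capacity = 0
  max_capacity = 1
}
```

With max_capacity = 1 the service stays at a single task however long the scale-out alarm is TRUE; raise it if a deep queue should be allowed to start more tasks.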

When the queue is empty, you want to scale in. However, you don’t want to flip-flop between the two states. The general rule is "scale out quickly, but scale in slowly". Therefore, the scale-in alarm should evaluate over a longer time window (e.g. 10 minutes) before deciding to scale in.
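
As a sketch of that idea, the scale-down alarm from the question could sum the metric over one 10-minute period instead of one minute. The 600-second value is an illustrative assumption taken from the "10 minutes" example, not a recommendation.

```hcl
resource "aws_cloudwatch_metric_alarm" "sqs_scale_down" {
  alarm_name          = "scale-down"
  alarm_description   = "Decrease task count"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  statistic           = "Sum"
  comparison_operator = "LessThanThreshold"
  threshold           = 1

  # Assumed window: one 600-second period, so the alarm is TRUE only when
  # the sum over the last 10 minutes is below 1 (the queue stayed empty).
  period             = 600
  evaluation_periods = 1

  alarm_actions = [aws_appautoscaling_policy.scale_down.arn]

  dimensions = {
    QueueName = aws_sqs_queue.this.name
  }
}
```

An alternative with similar intent is to keep period = 60 and set evaluation_periods = 10, which by default requires ten consecutive one-minute datapoints below the threshold.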

Thus, there might not always be an alarm in the TRUE (ALARM) state. If there are no messages in the queue, then the scale-out alarm will be FALSE. Plus, if the sum of ApproximateNumberOfMessagesVisible over the previous 10 minutes is not zero, then the scale-in alarm won’t be TRUE either. Instead, both alarms will be FALSE so nothing will be changing at that time. This is good.
