I have an ECS service that I want to scale up and down depending on how many items are in an SQS queue.
resource "aws_cloudwatch_metric_alarm" "sqs_scale_up" {
  alarm_name                = "scale-up"
  comparison_operator       = "GreaterThanOrEqualToThreshold"
  evaluation_periods        = 1
  metric_name               = "ApproximateNumberOfMessagesVisible"
  namespace                 = "AWS/SQS"
  period                    = 60
  threshold                 = 1
  statistic                 = "Sum"
  alarm_description         = "Increase task count"
  insufficient_data_actions = []
  alarm_actions             = [aws_appautoscaling_policy.scale_up.arn]
  dimensions = {
    QueueName = aws_sqs_queue.this.name
  }
}
resource "aws_cloudwatch_metric_alarm" "sqs_scale_down" {
  alarm_name          = "scale-down"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 1
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = 60
  threshold           = 1
  statistic           = "Sum"
  alarm_description   = "Decrease task count"
  alarm_actions       = [aws_appautoscaling_policy.scale_down.arn]
  dimensions = {
    QueueName = aws_sqs_queue.this.name
  }
}
Doesn't having one alarm for count > 0 and one alarm for count < 1 mean that one of these alarms will always be in the alarm state?
Is this normal?
>Solution :
Don’t panic over the word ‘ALARM’. Instead, think of it as saying that the condition is TRUE.
If there are any messages in the queue, you presumably want to scale out from a "nothing is running" state. Therefore, you want the scale-out alarm to be TRUE. However, you need to set an upper limit on the task count so that it doesn’t keep scaling out for as long as the alarm stays TRUE; the service might need just one task.
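As a rough sketch of such a limit, the scaling target can cap the task count, so the scale-out policy has nothing left to do once the cap is reached. The resource names `aws_ecs_cluster.this` and `aws_ecs_service.this`, the `max_capacity` of 1, and the cooldown are assumptions for illustration; they are not taken from the question.

```hcl
# Hypothetical scaling target: aws_ecs_cluster.this and aws_ecs_service.this
# are assumed names, not resources shown in the question.
resource "aws_appautoscaling_target" "ecs" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.this.name}"
  min_capacity       = 0 # allow the "nothing is running" state
  max_capacity       = 1 # assumed cap; scale-out stops here even while the alarm is TRUE
}

# Step scaling policy referenced by the scale-up alarm's alarm_actions.
resource "aws_appautoscaling_policy" "scale_up" {
  name               = "scale-up"
  policy_type        = "StepScaling"
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  resource_id        = aws_appautoscaling_target.ecs.resource_id

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 60 # assumed value, in seconds
    metric_aggregation_type = "Maximum"

    # Add one task whenever the alarm threshold is breached.
    step_adjustment {
      metric_interval_lower_bound = 0
      scaling_adjustment          = 1
    }
  }
}
```

With a cap like this, the scale-out alarm can sit in the ALARM state for as long as messages remain without the service growing past `max_capacity`.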
When the queue is empty, you want to scale in. However, you don’t want to flip-flop between the two states. The general rule is "scale out quickly, but scale in slowly". Therefore, the scale-in alarm should use a longer evaluation window before deciding to scale in (e.g. 10 minutes).
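One way to express "scale in slowly" is to keep the 60-second period but require ten consecutive breaching periods. This is only a sketch of the question's scale-down alarm with `evaluation_periods` raised; the value 10 is an example and should be tuned to the workload.

```hcl
# Variant of the question's scale-down alarm: the queue must look empty
# for 10 consecutive 60-second periods before the alarm becomes TRUE.
resource "aws_cloudwatch_metric_alarm" "sqs_scale_down" {
  alarm_name          = "scale-down"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 10 # 10 x 60 s = 10 minutes (example value)
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = 60
  threshold           = 1
  statistic           = "Sum"
  alarm_description   = "Decrease task count"
  alarm_actions       = [aws_appautoscaling_policy.scale_down.arn]
  dimensions = {
    QueueName = aws_sqs_queue.this.name
  }
}
```

A single message arriving during those ten minutes resets the condition, which is what prevents the flip-flopping.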
Thus, there might not always be an alarm in the TRUE (ALARM) state. If there are no messages in the queue, then the scale-out alarm will be FALSE. Likewise, if the sum of ApproximateNumberOfMessagesVisible over the previous 10 minutes is not zero, then the scale-in alarm won’t be TRUE either. In that case both alarms will be FALSE, so nothing will change at that time. This is good.