Johannes Konings for AWS Community Builders

Originally published at johanneskonings.dev

Monitor multiple resources using a single CloudWatch Alarm (with CDK)

Introduction

CloudWatch Metrics Insights query alarms (also called “multi-metric alarms”) let a single alarm evaluate many individual resources. You write a Metrics Insights SQL query with a GROUP BY clause, and CloudWatch keeps the alarm’s set of contributors up to date as resources are created and deleted.

Before this announcement, you had two imperfect choices:

  • Create one alarm per resource (granular, but high maintenance as resources change).
  • Use aggregated metrics (low maintenance, but you lose per-resource visibility and actions).

The September 2025 feature (https://aws.amazon.com/about-aws/whats-new/2025/09/amazon-cloudwatch-alarm-multiple-metrics/) adds query alarms that evaluate many individual metrics with one alarm, preserving per-resource visibility and actions while removing the per-resource alarm sprawl.
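
To make "one alarm, many contributors" concrete, here is a minimal TypeScript sketch of the idea. It is a toy model under stated assumptions, not CloudWatch's actual evaluation logic: the function name `evaluateContributors`, the queue names, and the datapoints are all hypothetical, and the model only looks at the latest datapoint of each series.

```typescript
// Toy model of a multi-metric alarm: one threshold, many contributors.
// Illustration only; CloudWatch's real evaluation is more involved.

// Contributor name (for example a queue name) -> recent datapoints, oldest first.
type Datapoints = Record<string, number[]>;

interface AlarmResult {
  state: "ALARM" | "OK";
  breaching: string[]; // contributors whose latest datapoint is >= threshold
}

function evaluateContributors(series: Datapoints, threshold: number): AlarmResult {
  const breaching = Object.entries(series)
    .filter(([, values]) => {
      const latest = values[values.length - 1];
      return latest !== undefined && latest >= threshold;
    })
    .map(([name]) => name)
    .sort();

  // Assumption: the alarm is in ALARM as soon as one contributor breaches.
  return { state: breaching.length > 0 ? "ALARM" : "OK", breaching };
}

// Hypothetical queues and values.
const result = evaluateContributors(
  { "orders-dlq": [0, 0, 2], "payments-dlq": [0, 0, 0], "emails-dlq": [1, 0, 0] },
  1,
);
console.log(result);
```

The point of the sketch is that the result names the breaching resource instead of collapsing everything into one aggregated number, which is what distinguishes a query alarm from an alarm on an aggregated metric.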

Pubudu Jayawardana has already described how to use the feature: https://pubudu.dev/posts/cloudwatch-multi-metric-alarms/. This post focuses on implementing the pattern in AWS CDK with concise, working examples: SQS DLQs selected by a tag-based query, and Step Functions across all state machines.

Account Configuration

If you want WHERE tag.X = 'Y' filters in Metrics Insights queries, enable resource tags on telemetry: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/EnableResourceTagsOnTelemetry.html

[Screenshot: CloudWatch settings, resource tags on telemetry]

As of 2025-12, this setting isn’t exposed as a CloudFormation resource. CloudWatch may take up to a few hours to fully enrich all metrics after enabling.

CLI equivalent:

aws observabilityadmin start-telemetry-enrichment

If you want this in CDK, use a Lambda-backed custom resource that calls the same API:

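import { Effect, PolicyStatement } from "aws-cdk-lib/aws-iam";
import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from "aws-cdk-lib/custom-resources";

// Inside a Stack or Construct, so that `this` is the construct scope.
// https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/observabilityadmin/command/StartTelemetryEnrichmentCommand/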
// https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/EnableResourceTagsOnTelemetry.html
new AwsCustomResource(this, "StartTelemetryEnrichment", {
  onCreate: {
    service: "@aws-sdk/client-observabilityadmin",
    action: "StartTelemetryEnrichment",
    physicalResourceId: PhysicalResourceId.of("StartTelemetryEnrichment"),
  },
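  // Install the latest AWS SDK at deploy time, in case the SDK bundled with the Lambda runtime lacks this client.
  installLatestAwsSdk: true,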
  // policy: AwsCustomResourcePolicy.fromSdkCalls({ resources: AwsCustomResourcePolicy.ANY_RESOURCE }),
  policy: AwsCustomResourcePolicy.fromStatements([
    new PolicyStatement({
      effect: Effect.ALLOW,
      actions: [
        "observabilityadmin:StartTelemetryEnrichment",
        "iam:CreateServiceLinkedRole",
        "resource-explorer-2:CreateIndex",
        "resource-explorer-2:CreateManagedView",
        "resource-explorer-2:CreateStreamingAccessForService",
      ],
      resources: ["*"],
    }),
  ]),
});

[Screenshot: CloudWatch settings, resource tags on telemetry activated]

Example 1: one alarm for all DLQs (tag filtered)

Tag only the DLQs you want included (opt-in):

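import { Duration, Tags } from "aws-cdk-lib";
import { Alarm, ComparisonOperator, MathExpression } from "aws-cdk-lib/aws-cloudwatch";
import { Queue } from "aws-cdk-lib/aws-sqs";

// Inside a Stack or Construct, so that `this` is the construct scope.
const dlq1 = new Queue(this, "DLQ1");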
Tags.of(dlq1).add("isDlq", "true");
Tags.of(dlq1).add("monitor", "true");
new Queue(this, "Queue1", {
  deadLetterQueue: {
    queue: dlq1,
    maxReceiveCount: 3,
  },
});

const dlq2 = new Queue(this, "DLQ2");
Tags.of(dlq2).add("isDlq", "true");
Tags.of(dlq2).add("monitor", "true");
new Queue(this, "Queue2", {
  deadLetterQueue: {
    queue: dlq2,
    maxReceiveCount: 5,
  },
});

const dlq3 = new Queue(this, "DLQ3");
Tags.of(dlq3).add("isDlq", "true");
Tags.of(dlq3).add("monitor", "false");
new Queue(this, "Queue3", {
  deadLetterQueue: {
    queue: dlq3,
    maxReceiveCount: 5,
  },
});


const expressionDlq = new MathExpression({
  expression: `SELECT MAX(ApproximateNumberOfMessagesVisible)
            FROM SCHEMA("AWS/SQS", QueueName)
            WHERE tag.monitor = 'true'
            AND tag.isDlq = 'true'
            GROUP BY QueueName
            ORDER BY COUNT() DESC`,
  usingMetrics: {},
  period: Duration.minutes(1),
  label: "DLQ messages",
});

new Alarm(this, "DlqAlarm", {
  metric: expressionDlq,
  threshold: 1,
  evaluationPeriods: 1,
  comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
  alarmDescription: "Alarm if any tagged DLQ has visible messages",
  alarmName: "DLQ-Alarm",
});

[Screenshot: CloudWatch DLQ alarm in ALARM state]

[Screenshot: CloudWatch DLQ alarm in OK state]
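
If you end up with several alarms of this kind, the query string can be assembled from a few parameters instead of being hand-written each time. The following is a minimal sketch, not part of CDK: `buildQuery` and `QueryOptions` are hypothetical names, the helper only accepts simple tag keys and values (an assumption made to avoid quoting and escaping rules), and it reproduces the `ORDER BY COUNT() DESC` clause from the query above.

```typescript
// Hypothetical helper that assembles a Metrics Insights query string
// in the same shape as the DLQ query above.
interface QueryOptions {
  namespace: string; // for example "AWS/SQS"
  dimension: string; // dimension used in SCHEMA(...) and GROUP BY
  metricName: string;
  stat: "AVG" | "COUNT" | "MAX" | "MIN" | "SUM";
  tags?: Record<string, string>; // optional tag filters, combined with AND
}

function buildQuery(opts: QueryOptions): string {
  const filters = Object.entries(opts.tags ?? {}).map(([key, value]) => {
    // Assumption: restrict to simple keys and values so no escaping is needed.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) {
      throw new Error(`Unsupported tag key: ${key}`);
    }
    if (value.includes("'")) {
      throw new Error(`Unsupported tag value: ${value}`);
    }
    return `tag.${key} = '${value}'`;
  });

  const lines = [
    `SELECT ${opts.stat}(${opts.metricName})`,
    `FROM SCHEMA("${opts.namespace}", ${opts.dimension})`,
    filters.length > 0 ? `WHERE ${filters.join(" AND ")}` : "",
    `GROUP BY ${opts.dimension}`,
    "ORDER BY COUNT() DESC",
  ];
  // Drop the WHERE line when there are no tag filters.
  return lines.filter((line) => line !== "").join("\n");
}

const dlqQuery = buildQuery({
  namespace: "AWS/SQS",
  dimension: "QueueName",
  metricName: "ApproximateNumberOfMessagesVisible",
  stat: "MAX",
  tags: { monitor: "true", isDlq: "true" },
});
console.log(dlqQuery);
```

The resulting string can be passed as the `expression` of a `MathExpression`, exactly like the inline query in the example.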

Example 2: one alarm for failed Step Functions executions

Tag filtering only works for resource types supported by resource tags for telemetry (SQS is supported; Step Functions isn’t listed): https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingResourceTagsForTelemetry.html#SupportedAWSInfrastructureMetrics

For Step Functions, the alarm therefore covers all state machines in the account and Region.

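import { Duration } from "aws-cdk-lib";
import { Alarm, ComparisonOperator, MathExpression } from "aws-cdk-lib/aws-cloudwatch";
import { DefinitionBody, Fail, StateMachine } from "aws-cdk-lib/aws-stepfunctions";

// Inside a Stack or Construct, so that `this` is the construct scope.
const sfn1 = new StateMachine(this, "StateMachine1", {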
  definitionBody: DefinitionBody.fromChainable(new Fail(this, "FailState1")),
});

const sfn2 = new StateMachine(this, "StateMachine2", {
  definitionBody: DefinitionBody.fromChainable(new Fail(this, "FailState2")),
});


const expressionSfn = new MathExpression({
  expression: `SELECT MAX(ExecutionsFailed)
            FROM SCHEMA("AWS/States", StateMachineName)
            GROUP BY StateMachineName
            ORDER BY COUNT() DESC`,
  usingMetrics: {},
  period: Duration.minutes(1),
  label: "Sfn failed executions",
});

new Alarm(this, "SfnAlarm", {
  metric: expressionSfn,
  threshold: 1,
  evaluationPeriods: 1,
  comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
  alarmDescription: "Alarm if any state machine has failed executions",
  alarmName: "SFN-Alarm",
});

[Screenshot: CloudWatch Step Functions alarm in ALARM state]

[Screenshot: CloudWatch Step Functions alarm in OK state]
