[AWS] Introduce initial alert rule templates #15346
base: main
Changes from 18 commits
db8282f
bbb5db6
94ffdb6
6ccc000
14419f7
74fd7dc
665711c
0b5ec5a
d28dd85
a0a49eb
84f231e
978d967
4cc98a2
1efc93b
6f3758b
84f98b5
5a38d36
9fefffa
ca349a8
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| { | ||
| "id": "ec2-high-cpu-utilization", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS EC2] CPU Usage High", | ||
| "tags": ["AWS EC2"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// The recommended threshold value for CPU usage is >= 80, and the alerting rule is grouped by cloud account id, cloud region and instance id. You can adjust the threshold value by modifying the cpuutilization value in the WHERE clause.\nFROM metrics-aws.ec2_metrics-default\n| STATS cpuutilization=avg(host.cpu.usage*100) by cloud.account.id, cloud.region, aws.dimensions.InstanceId\n| WHERE cpuutilization >= 80" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| { | ||
| "id": "ec2-status-check-failed", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS EC2] Status Check Failed", | ||
| "tags": ["AWS EC2"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// The recommended threshold value for status check failed is > 0, and the alerting rule is grouped by cloud account id, cloud region and instance id. You can adjust the threshold value by modifying the statusfailed value in the WHERE clause.\nFROM metrics-aws.ec2_metrics-default\n| STATS statusfailed=max(aws.ec2.metrics.StatusCheckFailed.avg) by cloud.account.id, cloud.region, aws.dimensions.InstanceId\n| WHERE statusfailed > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| { | ||
| "id": "lambda-errors", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Lambda] Errors High", | ||
| "tags": ["AWS Lambda"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// The recommended threshold value for errors is > 0, and the alerting rule is grouped by cloud account id, cloud region and function name. You can adjust the threshold value by modifying the errors value in the WHERE clause.\nFROM metrics-aws.lambda-default\n| STATS errors=sum(aws.lambda.Errors.avg) by cloud.account.id, cloud.region, aws.dimensions.FunctionName\n| WHERE errors > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| { | ||
| "id": "lambda-throttles", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Lambda] Throttles High", | ||
| "tags": ["AWS Lambda"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// The recommended threshold value for throttles is > 0, and the alerting rule is grouped by cloud account id, cloud region and function name. You can adjust the threshold value by modifying the throttles value in the WHERE clause.\nFROM metrics-aws.lambda-default\n| STATS throttles=sum(aws.lambda.Throttles.avg) by cloud.account.id, cloud.region, aws.dimensions.FunctionName\n| WHERE throttles > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| { | ||
| "id": "sns-notifications-failed", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS SNS] Notifications Failed", | ||
| "tags": ["AWS SNS"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// The recommended threshold value for notifications failed is > 0, and the alerting rule is grouped by cloud account id, cloud region and topic name. You can adjust the threshold value by modifying the notificationsfailed value in the WHERE clause.\nFROM metrics-aws.sns-default\n| STATS notificationsfailed=avg(aws.sns.NumberOfNotificationsFailed.sum) by cloud.account.id, cloud.region, aws.dimensions.TopicName\n| WHERE notificationsfailed > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| { | ||
| "id": "sns-notifications-filtered-out", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS SNS] Notifications Filtered Out High", | ||
| "tags": ["AWS SNS"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// The recommended threshold value for notifications filtered out is > 0, and the alerting rule is grouped by cloud account id, cloud region and topic name. You can adjust the threshold value by modifying the notificationsfilteredout value in the WHERE clause.\nFROM metrics-aws.sns-default\n| STATS notificationsfilteredout=avg(aws.sns.NumberOfNotificationsFilteredOut-InvalidAttributes.sum) by cloud.account.id, cloud.region, aws.dimensions.TopicName\n| WHERE notificationsfilteredout > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| { | ||
| "id": "sqs-messages-visible", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS SQS] Messages Visible High", | ||
| "tags": ["AWS SQS"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// The recommended threshold value for messages visible is >= 1000, and the alerting rule is grouped by cloud account id, cloud region and queue name. You can adjust the threshold value by modifying the msgsvisible value in the WHERE clause.\nFROM metrics-aws.sqs-default\n| STATS msgsvisible=max(aws.sqs.messages.visible) by cloud.account.id, cloud.region, aws.dimensions.QueueName\n| WHERE msgsvisible >= 1000" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| { | ||
| "id": "sqs-oldest-message", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS SQS] Oldest Message Age High", | ||
| "tags": ["AWS SQS"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// The recommended threshold value for oldest message age is >= 300, and the alerting rule is grouped by cloud account id, cloud region and queue name. You can adjust the threshold value by modifying the oldestmsgage value in the WHERE clause.\nFROM metrics-aws.sqs-default\n| STATS oldestmsgage=max(aws.sqs.oldest_message_age.sec) by cloud.account.id, cloud.region, aws.dimensions.QueueName\n| WHERE oldestmsgage >= 300" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } |
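All eight templates above share one shape: aggregate a metric per group with `STATS ... BY ...`, then keep only the groups that cross the threshold with `WHERE`. The following is a rough Python sketch of that logic, not the rule engine's implementation. The `evaluate` helper and the sample documents are hypothetical, and the sketch leaves out the 5-minute time window on `event.ingested` that the real rule applies.

```python
from collections import defaultdict

# Hypothetical sample documents; field names follow the sqs-oldest-message template.
docs = [
    {"cloud.account.id": "111", "cloud.region": "us-east-1",
     "aws.dimensions.QueueName": "orders", "aws.sqs.oldest_message_age.sec": 120},
    {"cloud.account.id": "111", "cloud.region": "us-east-1",
     "aws.dimensions.QueueName": "orders", "aws.sqs.oldest_message_age.sec": 420},
    {"cloud.account.id": "111", "cloud.region": "us-east-1",
     "aws.dimensions.QueueName": "billing", "aws.sqs.oldest_message_age.sec": 30},
]

GROUP_FIELDS = ("cloud.account.id", "cloud.region", "aws.dimensions.QueueName")


def evaluate(docs, group_fields, metric_field, aggregate, threshold_met):
    """Group documents, aggregate one metric per group, keep groups over the threshold."""
    groups = defaultdict(list)
    for doc in docs:
        value = doc.get(metric_field)
        if value is None:
            # Documents without the metric contribute nothing to the aggregate.
            continue
        key = tuple(doc.get(field) for field in group_fields)
        groups[key].append(value)
    # Roughly "STATS agg(metric) BY group_fields".
    aggregated = {key: aggregate(values) for key, values in groups.items()}
    # Roughly "WHERE metric >= threshold"; each surviving key would raise one alert
    # because the templates set "groupBy": "row".
    return {key: value for key, value in aggregated.items() if threshold_met(value)}


alerts = evaluate(
    docs,
    GROUP_FIELDS,
    "aws.sqs.oldest_message_age.sec",
    aggregate=max,
    threshold_met=lambda value: value >= 300,
)
```

With the sample data, only the `orders` queue crosses the 300-second threshold, so it is the only group that would produce an alert.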
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,7 @@ | ||
| - format_version: 3.3.2 | ||
| + format_version: 3.5.0 | ||
| name: aws | ||
| title: AWS | ||
| - version: "4.4.0" | ||
| + version: 4.5.0 | ||
| description: Collect logs and metrics from Amazon Web Services (AWS) with Elastic Agent. | ||
| type: integration | ||
| categories: | ||
| @@ -15,7 +15,7 @@ conditions: | ||
| elastic: | ||
| subscription: basic | ||
| kibana: | ||
| - version: "^8.19.0 || ^9.1.0" | ||
| + version: "^9.2.1" | ||
|
Contributor
@elastic/security-service-integrations team, this feature is supported starting from the 9.2.1 release, so the minimum stack version is raised to 9.2.1. Since the AWS integration is co-owned, could you confirm that the stack version upgrade is fine for the integrations managed by the security team? |
||
| screenshots: | ||
| - src: /img/metricbeat-aws-overview.png | ||
| title: metricbeat aws overview | ||
|
|
||
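To make the effect of the `conditions.kibana.version` change concrete, here is a simplified Python sketch of caret-range matching. It assumes standard semver caret semantics for major versions above 0 (`^9.2.1` means at least 9.2.1 and below 10.0.0) and plain `MAJOR.MINOR.PATCH` versions. The helper names are hypothetical and this is not the resolver Kibana or Fleet uses.

```python
import re


def parse_version(text):
    """Parse a plain MAJOR.MINOR.PATCH string into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported version: {text!r}")
    return tuple(int(part) for part in match.groups())


def satisfies_caret(version, constraint):
    """Check one caret range such as ^9.2.1 (major > 0 only in this sketch)."""
    part = constraint.strip()
    if not part.startswith("^"):
        raise ValueError(f"only caret ranges are handled here: {part!r}")
    base = parse_version(part[1:])
    if base[0] == 0:
        raise ValueError("0.x caret ranges are out of scope for this sketch")
    upper = (base[0] + 1, 0, 0)
    return base <= parse_version(version) < upper


def satisfies(version, expression):
    """An expression is a list of caret ranges joined by ||; any match is enough."""
    return any(satisfies_caret(version, part) for part in expression.split("||"))


OLD_CONSTRAINT = "^8.19.0 || ^9.1.0"
NEW_CONSTRAINT = "^9.2.1"
```

Under these assumptions, an 8.19.x or 9.1.x stack satisfies the old constraint but not the new one, which is why the comment above asks the co-owning team to confirm the upgrade.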
Where do we declare which service (entity) this alert template applies to? Something like resource: aws.ec2?
I have included the service name in the name of the alert rule template. I suppose Kibana should allow us to filter by tags or by partial matches on the title of the alert rule template.
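As an illustration of the filtering described in this reply, the sketch below selects templates by exact tag or by a case-insensitive partial match on the name. It works on a hypothetical in-memory list built from three of the templates in this PR. `filter_templates` is an invented helper, not a Kibana API, and whether Kibana exposes such filtering is still an open assumption here.

```python
# A few of the templates from this PR, reduced to the fields used for filtering.
templates = [
    {"id": "ec2-high-cpu-utilization", "name": "[AWS EC2] CPU Usage High", "tags": ["AWS EC2"]},
    {"id": "lambda-errors", "name": "[AWS Lambda] Errors High", "tags": ["AWS Lambda"]},
    {"id": "sqs-oldest-message", "name": "[AWS SQS] Oldest Message Age High", "tags": ["AWS SQS"]},
]


def filter_templates(templates, tag=None, name_contains=None):
    """Return templates matching an exact tag and/or a partial, case-insensitive name."""
    selected = []
    for template in templates:
        if tag is not None and tag not in template["tags"]:
            continue
        if name_contains is not None and name_contains.lower() not in template["name"].lower():
            continue
        selected.append(template)
    return selected


ec2_ids = [t["id"] for t in filter_templates(templates, tag="AWS EC2")]
sqs_ids = [t["id"] for t in filter_templates(templates, name_contains="sqs")]
```

Both lookups depend on every template following the `[AWS <service>]` naming and tagging convention, so a template that breaks the convention would be missed.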