[AWS] Introduce initial alert rule templates #15346
💚 CLA has been signed
Force-pushed from ef16f46 to bbb5db6
@gpop63: The template will be usable from 9.2 onwards. Can you please share a screenshot of what a particular alert looks like? Also, are we not adding any information about alert support in the READMEs?
  "type": "alerting_rule_template",
  "attributes": {
    "name": "EC2 High CPU Utilization",
    "tags": [],
Can we add tags?
What tags were you thinking of, Muthu?
The tags can include the service name and the alert metric name, similar to what I have added here in Azure AI Foundry.
E.g., [AWS EC2, AWS EC2 CPU Utilization].
@@ -0,0 +1,37 @@
{
  "id": "b6513de4-6c36-499a-8f0a-98431cd4dbee",
Should the id match the file name of the rule_template?
Error: defines non-matching ID
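The check behind this error can be approximated as follows. This is a minimal sketch, assuming the convention is that the file name without its extension equals the `id` field. The helper name `id_matches_filename` and the example file names are hypothetical, and the real validation in the package tooling may differ (for example, by allowing a package-name prefix).

```python
import json
from pathlib import PurePosixPath


def id_matches_filename(path: str, raw_json: str) -> bool:
    """Return True when the template's "id" equals the file name without ".json".

    Assumption: the convention is <id>.json; the real validator may differ.
    """
    doc = json.loads(raw_json)
    return PurePosixPath(path).stem == doc.get("id")


template = json.dumps({
    "id": "b6513de4-6c36-499a-8f0a-98431cd4dbee",
    "type": "alerting_rule_template",
})

# Hypothetical file names, for illustration only.
ok = id_matches_filename("b6513de4-6c36-499a-8f0a-98431cd4dbee.json", template)
bad = id_matches_filename("ec2-high-cpu.json", template)
print(ok, bad)
```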
@ishleenk17 Right now the support is not fully there; we only see them under assets and in saved objects.
  "groupBy": "all",
  "termSize": 5,
  "sourceFields": [],
  "timeField": "event.ingested",
Can the time field be @timestamp? Is there a reason for choosing event.ingested instead of @timestamp?
I tried using the @timestamp field, but it wasn't generating alerts. For some AWS data streams, @timestamp is the time the metric was recorded in AWS, not the time the document was ingested.
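One way to see why a rule keyed on @timestamp can stay silent is to simulate a late-arriving document. The sketch below is a simplified model, not Kibana's rule executor: it assumes a single document that is indexed 7 minutes after it was measured (a made-up lag), a 5-minute look-back window, and a rule that runs every minute. The helper `matched` is hypothetical.

```python
from datetime import datetime, timedelta


def matched(docs, field, run_at, window):
    """Return docs whose `field` falls in the window (run_at - window, run_at]."""
    start = run_at - window
    return [d for d in docs if start < d[field] <= run_at]


# One hypothetical document: measured at 12:00, indexed 7 minutes later.
measured = datetime(2025, 1, 1, 12, 0)
doc = {"@timestamp": measured, "event.ingested": measured + timedelta(minutes=7)}

window = timedelta(minutes=5)
hits = {"@timestamp": 0, "event.ingested": 0}

# Run the rule once a minute for 20 minutes. A document is only searchable
# once it has been ingested, so earlier runs cannot see it at all.
for minute in range(21):
    run_at = measured + timedelta(minutes=minute)
    visible = [d for d in [doc] if d["event.ingested"] <= run_at]
    for field in hits:
        hits[field] += len(matched(visible, field, run_at, window))

print(hits)
```

Under these assumptions the document never falls inside a window based on @timestamp, because by the time it is searchable its @timestamp is already older than the look-back, while a window based on event.ingested picks it up on several consecutive runs.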
    "esql": "FROM metrics-aws.ec2_metrics-default\n| STATS cpuutilization=avg(aws.ec2.metrics.CPUUtilization.avg) by cloud.account.id, cloud.region, aws.dimensions.InstanceId\n| WHERE cpuutilization >= 80"
  },
  "aggType": "count",
  "groupBy": "all",
Is this groupBy not applicable when using an ESQL query?
The grouping of the actual data happens in the ESQL query itself; this groupBy is a required property of the alert.
  "thresholdComparator": ">",
  "size": 100,
  "esqlQuery": {
    "esql": "FROM metrics-aws.ec2_metrics-default\n| STATS cpuutilization=avg(aws.ec2.metrics.CPUUtilization.avg) by cloud.account.id, cloud.region, aws.dimensions.InstanceId\n| WHERE cpuutilization >= 80"
Applying a dataset filter would help fetch only the data specific to the alerting metrics. WDYT?
How do we do that? Also, this ESQL query already targets documents from a specific data stream/index (metrics-aws.ec2_metrics-default).
We can ignore this, since the query directly targets a specific data stream.
  "searchType": "esqlQuery",
  "timeWindowSize": 5,
  "timeWindowUnit": "m",
  "threshold": [
Similar to groupBy: check whether the threshold value is applied directly from the ESQL query and not from here.
The threshold is set in the ESQL query; this is a different property of the alert.
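The split between the query-level condition and the rule-level properties can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not Kibana's evaluation code: `run_query` stands in for the STATS ... BY and WHERE steps, `rule_fires` stands in for the rule-level comparator, and the rule-level threshold of 0 is an assumption because the value is not visible in the diff. Instance IDs and CPU values are made up.

```python
import operator
from collections import defaultdict
from statistics import mean

COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt}


def run_query(docs, cpu_threshold=80):
    """Mimic the STATS ... BY ... | WHERE steps of the query in memory."""
    groups = defaultdict(list)
    for d in docs:
        groups[d["instance"]].append(d["cpu"])
    # Grouping and the CPU threshold both live inside the query.
    return {k: mean(v) for k, v in groups.items() if mean(v) >= cpu_threshold}


def rule_fires(rows, comparator=">", threshold=0):
    """Rule-level check on the number of returned rows (assumed threshold of 0)."""
    return COMPARATORS[comparator](len(rows), threshold)


docs = [
    {"instance": "i-a", "cpu": 95.0},
    {"instance": "i-a", "cpu": 85.0},
    {"instance": "i-b", "cpu": 20.0},
]
rows = run_query(docs)
print(rows, rule_fires(rows))
```

In this model the 80% cutoff and the per-instance grouping never touch the rule-level `threshold` or `groupBy`; the rule only asks whether the query returned anything.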
Co-authored-by: Dan Kortschak <[email protected]>
Where do we declare which service (entity) this alert template applies to? Something like resource : aws.ec2
I have included the service name in the name of the alert rule template. I suppose Kibana should allow us to filter by tags or by partial matches on the title of the alert rule template.
  "thresholdComparator": ">",
  "size": 100,
  "esqlQuery": {
    "esql": "FROM metrics-aws.ec2_metrics-default\n| STATS cpuutilization=avg(aws.ec2.metrics.CPUUtilization.avg) by cloud.account.id, cloud.region, aws.dimensions.InstanceId\n| WHERE cpuutilization >= 80"
The field aws.ec2.metrics.CPUUtilization.avg is renamed here. Do you think the query should use the host.cpu.usage field instead?
Both metrics are present but switching to the ECS field seems like the better option. Switched to host.cpu.usage in d28dd85. I had to multiply it by 100 to use percentages.
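The contents of d28dd85 are not shown in this thread, so the following is only a guess at the shape of such a query, written as a small Python builder. It assumes host.cpu.usage is stored as a 0 to 1 fraction (as ECS describes it) and converts it to a percentage in a separate EVAL step; the function name `build_cpu_query` and the intermediate column name `cpu` are hypothetical.

```python
def build_cpu_query(threshold_pct: int = 80) -> str:
    """Build a query that alerts on average CPU, expressed as a percentage."""
    return "\n".join([
        "FROM metrics-aws.ec2_metrics-default",
        "| STATS cpu = AVG(host.cpu.usage) "
        "BY cloud.account.id, cloud.region, aws.dimensions.InstanceId",
        # host.cpu.usage is assumed to be a 0-1 fraction, so scale to percent.
        "| EVAL cpuutilization = cpu * 100",
        f"| WHERE cpuutilization >= {threshold_pct}",
    ])


query = build_cpu_query()
print(query)
```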
MichelLosier left a comment
Configuration looks good to me!
  "schedule": {
    "interval": "1m"
  },
This applies to all the configurations.
Should the rule run this frequently? I suggest making the interval equal to the default period for metrics ingestion. That helps avoid no-data alerts if the user later extends the configuration.
Should we set timeWindowSize to match the integration period? That way, for example, every 5 minutes we’d check for alerts in documents from the past 5 minutes.
I think that's a reasonable thing to do. I assume the impact is that instead of an alert being notified at the period + 1m interval, it will be notified at a 2 x period interval. Here the period is 5m for most AWS services.
@tommyers-elastic, what would be your recommendation?
i don't think we have any way to couple configs in agent policy templates with these rule configurations, so whatever we choose will have to be always added by hand.
my only thinking here is that it doesn't make sense to run a rule more frequently than the integration collection period. matching the rule frequency with the collection period seems sensible to me.
it's a shame there's no way to put hints in the form such that we could have something that shows up and says "should match the integration collection period" or something. if we think it's worthwhile we could suggest this as a feature.
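Since the alignment has to be done by hand, a small helper can at least make it repeatable. This is a minimal sketch, assuming the template keeps `schedule` directly under `attributes` and the time window fields under a `params` object; the `params` nesting and the function name `align_to_period` are assumptions, not something the diff confirms.

```python
import copy
import re


def align_to_period(template: dict, period: str) -> dict:
    """Return a copy whose schedule and time window both equal `period` (e.g. "5m")."""
    match = re.fullmatch(r"(\d+)([smhd])", period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    size, unit = int(match.group(1)), match.group(2)

    aligned = copy.deepcopy(template)  # leave the caller's template untouched
    attrs = aligned["attributes"]
    attrs["schedule"]["interval"] = period
    attrs["params"]["timeWindowSize"] = size
    attrs["params"]["timeWindowUnit"] = unit
    return aligned


template = {
    "attributes": {
        "schedule": {"interval": "1m"},
        "params": {"timeWindowSize": 5, "timeWindowUnit": "m"},
    }
}
aligned = align_to_period(template, "5m")
print(aligned["attributes"]["schedule"], aligned["attributes"]["params"])
```

With a 5m collection period, the rule then runs every 5 minutes over the previous 5 minutes of documents, which is the behavior proposed earlier in the thread.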
     subscription: basic
   kibana:
-    version: "^8.19.0 || ^9.1.0"
+    version: "^9.2.1"
@elastic/security-service-integrations team, this feature is supported starting from the 9.2.1 release, so the minimum stack version is raised to 9.2.1. Since AWS integrations are co-owned, could you confirm that the stack version upgrade is fine for the integrations managed by the security team?
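To make the impact of the constraint change concrete, the sketch below evaluates a few stack versions against the old and new constraints. It is a deliberately small caret-range evaluator, not a full semver implementation: it assumes plain X.Y.Z versions with a major version above 0 and ignores pre-release tags. The function names are hypothetical.

```python
def parse(version: str) -> tuple:
    """Split "X.Y.Z" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_caret(version: str, base: str) -> bool:
    """^X.Y.Z with X > 0 means >= X.Y.Z and < (X+1).0.0."""
    v, b = parse(version), parse(base)
    return b <= v < (b[0] + 1, 0, 0)


def satisfies(version: str, constraint: str) -> bool:
    """Evaluate a constraint made of caret ranges joined by "||"."""
    return any(
        satisfies_caret(version, term.strip().lstrip("^"))
        for term in constraint.split("||")
    )


OLD = "^8.19.0 || ^9.1.0"
NEW = "^9.2.1"

for stack in ("8.19.0", "9.1.0", "9.2.0", "9.2.1"):
    print(stack, satisfies(stack, OLD), satisfies(stack, NEW))
```

Of the four versions checked, only 9.2.1 satisfies the new constraint, so users on 8.19.x, 9.1.x, or 9.2.0 would no longer receive this package version.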
/test
💔 Build Failed
cc @gpop63
Overview
This PR introduces the first set of alert rule templates for key AWS data streams. For each stream, we selected the two most critical metrics to monitor.
ec2_metrics
lambda
sqs
sns
Checklist
changelog.yml file.
Author's Checklist
How to test this PR locally
Related issues
Screenshots