-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add error metrics tracking for workflow incidents #6622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop/3.5.0
Are you sure you want to change the base?
Conversation
Introduce WorkflowErrorSpanHandler interface and context for improved error span handling in workflows. Separate activity and workflow error handling using dedicated abstractions and context models. Update error handling logic in tracing middleware and adjust DI configuration accordingly.
This commit introduces a new `ErrorMetrics` class to track and record workflow errors, leveraging OpenTelemetry. It replaces `DefaultErrorSpanHandler` with `DefaultExceptionHandler` and adds support for custom metric handling through the `IErrorMetricHandler` interface. Additionally, the middleware and service configuration have been updated to integrate error metrics tracking seamlessly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces error metrics tracking into workflow incident handling by replacing the old error span handler with a new exception handler and adding support for custom error metric handlers using OpenTelemetry. Key changes include:
- Adding the new ErrorMetrics class and IErrorMetricHandler interface for tracking workflow errors.
- Updating middleware and DI registrations to integrate error metric tracking.
- Renaming and consolidating error span handlers to align with the new behavior.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/modules/Elsa.OpenTelemetry/Models/ErrorMetricContext.cs | Introduces a new context model for error metrics using a modern constructor syntax. |
| src/modules/Elsa.OpenTelemetry/Middleware/OpenTelemetryTracingActivityExecutionMiddleware.cs | Updates the middleware constructor and tag assignments, and integrates error tracking. |
| src/modules/Elsa.OpenTelemetry/Metrics/ErrorMetrics.cs | Implements the error metrics tracking mechanism with OpenTelemetry counters. |
| src/modules/Elsa.OpenTelemetry/Handlers/FaultExceptionHandler.cs | Renames and extends the fault exception handler to process error metric contexts. |
| src/modules/Elsa.OpenTelemetry/Handlers/DefaultExceptionHandler.cs | Introduces a unified exception handler covering span and metric contexts. |
| src/modules/Elsa.OpenTelemetry/Features/OpenTelemetryFeature.cs | Updates DI registrations to incorporate the new error metric tracking components. |
| src/modules/Elsa.OpenTelemetry/Contracts/IErrorMetricHandler.cs | Adds an interface to support customizable error metric handling. |
| src/apps/Elsa.Server.Web/Program.cs | Adjusts metrics configuration to register the new error metrics and updates exporter settings. |
Comments suppressed due to low confidence (2)
src/modules/Elsa.OpenTelemetry/Middleware/OpenTelemetryTracingActivityExecutionMiddleware.cs:33
- Ensure that using activity.Id instead of activity.NodeId for the 'activity.id' tag is consistent with the identifier usage elsewhere in the system.
span.SetTag("activity.id", activity.Id);
src/apps/Elsa.Server.Web/Program.cs:157
- [nitpick] Verify that the instrumentation lines commented out are intentionally disabled; if so, consider adding a comment to clarify their purpose or remove them if no longer necessary.
.AddMeter(ErrorMetrics.MeterName)
| .AddScoped<IWorkflowErrorSpanHandler, DefaultExceptionHandler>() | ||
| .AddScoped<IWorkflowErrorSpanHandler, FaultExceptionHandler>() | ||
| .AddScoped<IErrorMetricHandler, DefaultExceptionHandler>() | ||
| .AddScoped<IErrorMetricHandler, FaultExceptionHandler>() |
Copilot
AI
Apr 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Include a brief comment explaining the purpose of the ErrorMetrics registration to aid future maintainability.
| .AddScoped<IErrorMetricHandler, FaultExceptionHandler>() | |
| .AddScoped<IErrorMetricHandler, FaultExceptionHandler>() | |
| // Registers the ErrorMetrics service to track and report error-related metrics for OpenTelemetry. |
Renamed `FaultExceptionActivityErrorSpanHandler` to `FaultExceptionErrorSpanHandler` for consistency and clarity. Updated error attribute tagging to align with Datadog's well-known attributes. Adjusted incident selection logic to use the first incident instead of the last.
Added well-known Datadog attributes to ensure proper handling of fault exception tags. Introduced a separate object for unknown attributes to prevent them from being ignored by Datadog.
Previously, the middleware selected the last incident in the list, which could lead to incorrect behavior. This change ensures the middleware selects the first incident, aligning with the expected logic.
Standardized comments in the .gitignore file by adding missing spaces and capitalizing as needed. Updated the `/docker/data/` entry to `/docker/azurite-data/` for clarity.
Replaced `LastOrDefault` with `FirstOrDefault` to ensure the correct activity execution context is retrieved when handling errors. This change resolves potential inaccuracies in identifying the faulted activity node.
This commit introduces a new
ErrorMetricsclass to track and record workflow errors, leveraging OpenTelemetry. It replacesDefaultErrorSpanHandlerwithDefaultExceptionHandlerand adds support for custom metric handling through theIErrorMetricHandlerinterface. Additionally, the middleware and service configuration have been updated to integrate error metrics tracking seamlessly.This change is