Description
Problem Statement
Telemetry is leaving idle database connections open, causing resource leakage across all Evergreen customers.
The issue started after the deployment of dotcms/dotcms:26.01.02-01_b8b3476 and impacts all customers running Evergreen environments.
This behavior increases the risk of connection pool exhaustion and overall platform instability.
As an immediate workaround, telemetry has been disabled for affected environments, which reduces observability and is not a sustainable long-term solution.
Initial investigation points to telemetry collectors related to experiments, specifically CountVariantsInAllArchivedExperimentsMetricType.
More context and operational impact details are available in Slack under #eng-critical-incidents.
Steps to Reproduce
- Deploy or run an Evergreen environment using image dotcms/dotcms:26.01.02-01_b8b3476 or newer.
- Ensure telemetry is enabled (default configuration).
- Let the Telemetry job run. The schedule can be changed with TELEMETRY_SAVE_SCHEDULE.
- Observe database connection pool metrics:
  SELECT pid,
         datname AS database,
         usename AS user,
         state,
         backend_start AS connection_established,
         state_change AS last_activity,
         now() - state_change AS idle_duration,
         query AS last_query
  FROM pg_stat_activity
  WHERE state IN ('idle', 'idle in transaction')
  ORDER BY idle_duration DESC;
- Notice idle connections remaining open and not being released over time.
- Disable telemetry and observe that the idle connection behavior stops.
Acceptance Criteria
• Telemetry collectors do not leave idle database connections open.
• Connection lifecycle is properly managed and connections are released after telemetry execution.
• The collector CountVariantsInAllArchivedExperimentsMetricType is reviewed and corrected if necessary.
• Telemetry can remain enabled in Evergreen environments without causing connection leaks.
• Fix is validated in Evergreen and confirmed not to regress other telemetry metrics.
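The connection lifecycle criteria above could be checked with something like the following minimal Java sketch. It is illustrative only: the class `TelemetryConnectionSketch`, the methods `borrow`, `leakyCollect`, `safeCollect`, and `leakedBy`, and the fake proxy-based connection are all hypothetical and are not taken from the dotCMS codebase. It contrasts a collector that never closes its connection with one that uses try-with-resources, and counts how many connections each leaves open.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from the dotCMS codebase.
class TelemetryConnectionSketch {

    // Number of borrowed connections that have not been closed yet.
    static final AtomicInteger openConnections = new AtomicInteger();

    // Minimal functional interface so collectors may throw checked exceptions.
    interface Collector { void run() throws Exception; }

    // Stand-in for a pool: returns a fake Connection that only tracks close().
    static Connection borrow() {
        openConnections.incrementAndGet();
        boolean[] closed = {false};
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "close":
                    if (!closed[0]) {
                        closed[0] = true;
                        openConnections.decrementAndGet();
                    }
                    return null;
                case "isClosed":
                    return closed[0];
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }

    // Leaky pattern: the connection is borrowed but never closed.
    static int leakyCollect() {
        Connection conn = borrow();
        return 42; // placeholder metric value; conn is never released
    }

    // Safe pattern: try-with-resources closes the connection even on failure.
    static int safeCollect(boolean fail) throws Exception {
        try (Connection conn = borrow()) {
            if (fail) {
                throw new IllegalStateException("simulated query failure");
            }
            return 42; // placeholder metric value
        }
    }

    // Returns how many connections the collector left open after running.
    static int leakedBy(Collector collector) {
        int before = openConnections.get();
        try {
            collector.run();
        } catch (Exception ignored) {
            // A failing collector must still release its connection.
        }
        return openConnections.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + leakedBy(TelemetryConnectionSketch::leakyCollect));
        System.out.println("safe: " + leakedBy(() -> safeCollect(false)));
        System.out.println("safe on failure: " + leakedBy(() -> safeCollect(true)));
    }
}
```

A regression test for the real collectors could follow the same shape: record the number of open or idle connections, run the metric collection (including a failure path), and assert that the count returns to its starting value.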
dotCMS Version
26.01.02-01_b8b3476 and later
Severity
High - Major functionality broken
Links
https://dotcms.slack.com/archives/C096E0QAM7G/p1770055843362409