317 Add Observability to PDP and OE #408

ginaxu1 · 2026-01-03T14:26:24Z

Summary

This PR adds observability (metrics collection and monitoring) to OE and PDP. In addition, OE, PDP, and CE all make observability optional through environment variables:

ENABLE_OBSERVABILITY=false - Disables observability
OTEL_METRICS_ENABLED=false - Disables OpenTelemetry metrics

When observability is disabled:

Services continue to function normally
/metrics endpoint returns "# Metrics disabled" or "# Metrics not initialized"
HTTP middleware passes through requests without recording metrics
Business event recording is skipped (no-op)
External call metrics are skipped (no-op)
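
The disabled-mode behavior listed above can be sketched with the standard library alone. This is a minimal illustration, not the PR's monitoring package: `metricsState`, `metricsResponse`, `passThroughOrRecord`, and `countRecorded` are hypothetical names, the 200 status for the disabled case is an assumption, and the 503 for the uninitialized case follows the troubleshooting notes in this description.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// metricsState is a hypothetical description of the monitoring setup.
type metricsState int

const (
	stateDisabled      metricsState = iota // observability switched off
	stateUninitialized                     // enabled, but initialization failed
	stateReady                             // enabled and initialized
)

// metricsResponse returns the status code and body served by /metrics.
// The 200 for the disabled case is an assumption made for this sketch.
func metricsResponse(s metricsState, render func() string) (int, string) {
	switch s {
	case stateDisabled:
		return http.StatusOK, "# Metrics disabled\n"
	case stateUninitialized:
		return http.StatusServiceUnavailable, "# Metrics not initialized\n"
	default:
		return http.StatusOK, render()
	}
}

// passThroughOrRecord returns next unchanged when metrics are disabled, so
// requests are served without any recording step.
func passThroughOrRecord(enabled bool, record func(method, path string), next http.Handler) http.Handler {
	if !enabled {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		record(r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

// countRecorded sends n requests through the middleware and reports how many
// of them were recorded.
func countRecorded(enabled bool, n int) int {
	recorded := 0
	health := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	})
	h := passThroughOrRecord(enabled, func(string, string) { recorded++ }, health)
	for i := 0; i < n; i++ {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/health", nil))
	}
	return recorded
}

func main() {
	code, body := metricsResponse(stateDisabled, nil)
	fmt.Printf("%d %q\n", code, body)
	fmt.Println("recorded while disabled:", countRecorded(false, 3))
	fmt.Println("recorded while enabled:", countRecorded(true, 3))
}
```

Returning the wrapped handler untouched when metrics are off keeps the disabled path identical to an uninstrumented service.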

Endpoints:

Prometheus Targets: http://localhost:9091/targets
Prometheus Graph: http://localhost:9091/graph
Grafana Dashboard: http://localhost:3002/d/go-services-dashboard/go-services-metrics (user: admin, pw: admin)

OE Metrics: http://localhost:4000/metrics
PDP Metrics: http://localhost:8082/metrics

Metrics Captured

HTTP Request Metrics: http_requests_total (counter), http_request_duration_seconds (histogram)
Business Events (OE): business_events_total - GraphQL outcomes
External Calls (OE): external_call_duration_seconds (histogram), external_calls_total (counter)
Go Runtime: GC, memory, goroutines
Process: CPU, memory, network I/O

Labels: method, route, status, service_name, provider_key, call_type, event_type, outcome
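
To make the label model concrete, the sketch below counts requests per (method, route, status) combination using an in-memory map as a stand-in for the real OpenTelemetry instruments. The names `requestLabels`, `registry`, `instrument`, and `simulate` are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// requestLabels mirrors the method, route, and status labels listed above.
type requestLabels struct {
	Method string
	Route  string
	Status int
}

// registry is an in-memory stand-in for a real metrics backend.
type registry struct {
	mu      sync.Mutex
	total   map[requestLabels]int     // plays the role of http_requests_total
	seconds map[requestLabels]float64 // summed request durations in seconds
}

func newRegistry() *registry {
	return &registry{
		total:   make(map[requestLabels]int),
		seconds: make(map[requestLabels]float64),
	}
}

// observe adds one request with duration d to the series identified by l.
func (r *registry) observe(l requestLabels, d time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.total[l]++
	r.seconds[l] += d.Seconds()
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// instrument records one observation per request, labeled with the route
// template rather than the raw URL path.
func instrument(reg *registry, route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		reg.observe(requestLabels{Method: r.Method, Route: route, Status: rec.status}, time.Since(start))
	})
}

// simulate sends n GET requests to a handler that always replies with status
// and returns the counter value for that label set.
func simulate(n, status int) int {
	reg := newRegistry()
	h := instrument(reg, "/health", http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
	for i := 0; i < n; i++ {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/health", nil))
	}
	reg.mu.Lock()
	defer reg.mu.Unlock()
	return reg.total[requestLabels{Method: http.MethodGet, Route: "/health", Status: status}]
}

func main() {
	fmt.Println("GET /health 200:", simulate(3, 200))
	fmt.Println("GET /health 404:", simulate(2, 404))
}
```

Labeling by route template instead of the raw path keeps the number of distinct series bounded when paths contain identifiers.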

Type of Change

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Other (please describe):

Testing with Observability Enabled (Default)

Start Services

cd exchange
export DB_PASSWORD=${DB_PASSWORD:-password}
docker compose up -d pdp-db ce-db
sleep 5  # Wait for databases
docker compose up -d orchestration-engine policy-decision-point consent-engine

Note: If consent-engine fails, rebuild: docker compose build consent-engine && docker compose up -d consent-engine

Verify Metrics

# Should show http_requests_total
curl http://localhost:4000/metrics | grep http_requests_total | head -2
curl http://localhost:8081/metrics | grep http_requests_total | head -2
curl http://localhost:8082/metrics | grep http_requests_total | head -2
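
The same check can be scripted in Go. The sketch below assumes the endpoint serves the Prometheus text exposition format; `hasMetric` is a hypothetical helper, and unlike a plain grep it ignores comment lines, so a body containing only "# Metrics disabled" does not count as a match.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasMetric reports whether a Prometheus text-format body contains at least
// one sample line for the named metric. Comment lines are skipped.
func hasMetric(body, name string) bool {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, name) {
			continue
		}
		// A sample line is the metric name followed by a label set or a value.
		rest := line[len(name):]
		if strings.HasPrefix(rest, "{") || strings.HasPrefix(rest, " ") {
			return true
		}
	}
	return false
}

func main() {
	enabled := "# HELP http_requests_total Total requests\n" +
		"http_requests_total{method=\"GET\",route=\"/health\"} 3\n"
	disabled := "# Metrics disabled\n"

	fmt.Println("enabled body has metric:", hasMetric(enabled, "http_requests_total"))
	fmt.Println("disabled body has metric:", hasMetric(disabled, "http_requests_total"))
}
```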

Verify Health

curl http://localhost:4000/health  # Should return 200 OK
curl http://localhost:8081/health  # Should return 200 OK
curl http://localhost:8082/health  # Should return 200 OK

Testing with Observability Disabled

Start Services with Observability Disabled

cd exchange
export ENABLE_OBSERVABILITY=false
export DB_PASSWORD=${DB_PASSWORD:-password}
docker compose down  # Important: recreate containers to pick up env vars
docker compose up -d pdp-db ce-db
sleep 5
docker compose up -d orchestration-engine policy-decision-point consent-engine

Verify Metrics Disabled

# Should NOT show http_requests_total (only Go runtime metrics)
curl http://localhost:4000/metrics | grep -E "(http_requests_total|Metrics disabled)" | head -2
curl http://localhost:8082/metrics | grep -E "(http_requests_total|Metrics disabled)" | head -2

Verify Services Still Work

curl http://localhost:4000/health  # Should still work
curl http://localhost:8082/health   # Should still work

Verify Logs

export DB_PASSWORD=${DB_PASSWORD:-password}
docker compose logs orchestration-engine | grep -i observability
docker compose logs policy-decision-point | grep -i observability
# Should show: "Observability disabled via environment variable"

Troubleshooting

Service Fails to Start

If a service fails to start when observability is disabled, check:

Service logs for initialization errors
Ensure ENABLE_OBSERVABILITY=false is set correctly
Verify monitoring initialization is wrapped in if monitoring.IsObservabilityEnabled()

Metrics Endpoint Returns 503

This indicates metrics were not initialized. Check:

Service logs for initialization errors
Ensure observability is enabled or explicitly disabled
Verify monitoring.Initialize() was called (if observability enabled)

Services Work But No Metrics

Check:

Prometheus is scraping the correct endpoints
Services are exposing /metrics endpoint
HTTPMetricsMiddleware is applied to routes
Observability is enabled (ENABLE_OBSERVABILITY=true)

Deployment

Environment Variables

OE: SERVICE_NAME, OTEL_METRICS_EXPORTER, CONFIG_PATH
PDP: SERVICE_NAME, OTEL_METRICS_EXPORTER, CHOREO_OPENDIF_DATABASE_* (with defaults)
Database: POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB (with defaults)

Verification

# Use verification script
cd observability && ./verification_steps.sh

Rollback Plan

To remove observability from the code entirely (setting ENABLE_OBSERVABILITY=false disables it without code changes):

Remove monitoring.Initialize() calls from main.go files
Remove /metrics endpoint registrations
Remove HTTPMetricsMiddleware wrappers
Remove RecordBusinessEvent() and RecordExternalCall() calls

Note: Services continue to function normally without metrics collection.

gemini-code-assist · 2026-01-03T14:26:56Z

Summary of Changes

Hello @ginaxu1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the observability of the orchestration-engine and policy-decision-point Go services by integrating a dedicated monitoring package. The changes introduce comprehensive metrics for HTTP request handling, external service interactions, and critical business events, providing a clearer picture of application health and performance. Complementary updates to Grafana dashboards and Prometheus configuration ensure these new metrics are effectively collected and visualized, while also streamlining the local development environment.

Highlights

Enhanced Observability: Integrated a shared monitoring package into the orchestration-engine and policy-decision-point Go services, enabling the collection of various metrics.
Detailed HTTP Metrics: Implemented middleware to automatically capture HTTP request duration, method, and route, providing granular insights into API performance.
External Call Tracking: Added instrumentation to the orchestration-engine to record the duration and outcome of external HTTP calls made to other services.
Business Event Monitoring: Introduced specific metrics for key business events, such as the success or failure of GraphQL requests, to track application logic outcomes.
Monitoring Stack Updates: Updated Grafana dashboards and Prometheus configuration to correctly ingest, visualize, and scrape the newly exposed metrics, facilitating local development.
Go Module Path Refactoring: Standardized Go module paths for orchestration-engine and policy-decision-point and managed new dependencies for the monitoring package.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces comprehensive monitoring capabilities to the orchestration-engine and policy-decision-point services by integrating a new shared pkg/monitoring module. This involves updating go.mod and go.sum files to include new dependencies, modifying main.go in both services to initialize monitoring, and adding HTTP metrics middleware and /metrics endpoints. Specifically, the orchestration-engine now records external call durations and GraphQL request outcomes, while the policy-decision-point sets a SERVICE_NAME environment variable for metric labeling. Import paths across several files in the orchestration-engine were also updated to reflect a module name change.

In the Grafana dashboard, changes were made to Prometheus query labels for HTTP requests and P95 latency, and a 'Service Health' panel was removed. Prometheus configuration was updated to use host.docker.internal for scraping several services and to rename the portal-backend job to api-server-go.

Review comments highlight concerns about the loss of per-route P95 latency data in the Grafana dashboard, the removal of the 'Service Health' panel, and an inconsistency in the Prometheus scrape configuration for api-server-go (where the target hostname doesn't match the job name). They also suggest removing a redundant logger initialization check in orchestration-engine/server/server.go.

observability/grafana/dashboards/go-services-metrics.json

observability/prometheus/prometheus.yml

exchange/orchestration-engine/server/server.go

ginaxu1 · 2026-01-05T05:08:42Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces comprehensive observability features, specifically metrics, to the orchestration-engine and policy-decision-point services. It integrates a new shared monitoring package by updating go.mod and go.sum files in both services, adding numerous OpenTelemetry-related dependencies. Both services now initialize monitoring at startup, apply HTTP metrics middleware to their routers, and expose a /metrics endpoint. The orchestration-engine further records durations for external provider calls and tracks the outcome (success/failure) of GraphQL requests as business events.

Review comments highlighted an incorrect label (http_method instead of http_request_method) in the Grafana dashboard's PromQL queries for HTTP metrics, an inconsistent log.Printf call in orchestration-engine/main.go that should use logger.Log.Warn, and a minor formatting issue with extra leading spaces in a log statement within orchestration-engine/server/server.go.

Additionally, Prometheus configuration was updated to use host.docker.internal for service targets and renamed the portal-backend job to api-server-go, while Grafana provisioning files were adjusted for dashboard editability and datasource configuration.

observability/grafana/dashboards/go-services-metrics.json

exchange/orchestration-engine/main.go

exchange/orchestration-engine/server/server.go

ginaxu1 · 2026-01-05T05:23:25Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces observability features to the Orchestration Engine and Policy Decision Point services by integrating OpenTelemetry for metrics collection. The changes are well-structured, including middleware for HTTP metrics, custom metrics for business events and external calls, and updated configurations for Prometheus and Grafana. My review focuses on improving security by removing hardcoded credentials and enhancing code clarity and maintainability.

exchange/docker-compose.yml

exchange/orchestration-engine/provider/provider.go

exchange/policy-decision-point/main.go

ginaxu1 · 2026-01-05T06:32:20Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces comprehensive observability features to the Orchestration Engine and Policy Decision Point services, which is a great step towards improving monitoring and reliability. The use of OpenTelemetry for vendor-agnostic metrics, non-blocking initialization, and the addition of a verification script are all excellent practices.

However, I've identified a few critical issues that should be addressed:

Security: The docker-compose.yml file contains hardcoded database credentials. These should be externalized using environment variables to avoid committing secrets to the repository.
Correctness: There's a bug in the provider.go file where the defer statement for metric recording doesn't work as intended due to variable shadowing and incorrect return patterns. This leads to inaccurate metrics for external calls, which undermines the goal of this PR.
Robustness: The new verification_steps.sh script uses a fixed sleep duration, which can be unreliable. I've suggested a more robust polling mechanism.

I've provided detailed comments and code suggestions for these points. Once they are addressed, this will be a solid contribution.
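
One common form of the defer problem described above is sketched below, together with a fix based on a named result and a deferred closure. This is an illustration of the general pattern and not the actual code in provider.go; `recordExternalCall`, `callBuggy`, `callFixed`, and `lastOutcome` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// recorded collects the outcomes passed to recordExternalCall (demo only).
var recorded []string

// recordExternalCall is a hypothetical stand-in for a metrics recorder.
func recordExternalCall(provider string, d time.Duration, err error) {
	outcome := "success"
	if err != nil {
		outcome = "failure"
	}
	recorded = append(recorded, provider+":"+outcome)
}

// callBuggy evaluates err and the duration when the defer statement runs,
// so it always records a nil error and a near-zero duration.
func callBuggy(provider string, fetch func() error) error {
	var err error
	start := time.Now()
	defer recordExternalCall(provider, time.Since(start), err)
	if err := fetch(); err != nil { // shadows the outer err
		return err
	}
	return nil
}

// callFixed uses a named result and a deferred closure, so the error and the
// duration are read when the function actually returns.
func callFixed(provider string, fetch func() error) (err error) {
	start := time.Now()
	defer func() {
		recordExternalCall(provider, time.Since(start), err)
	}()
	if err := fetch(); err != nil {
		return err // the return statement assigns to the named result
	}
	return nil
}

// lastOutcome runs call with a fetch that fails when fail is true and returns
// the outcome that was recorded.
func lastOutcome(call func(string, func() error) error, fail bool) string {
	recorded = nil
	call("provider-a", func() error {
		if fail {
			return errors.New("upstream timeout")
		}
		return nil
	})
	return recorded[len(recorded)-1]
}

func main() {
	fmt.Println("buggy, failing call:", lastOutcome(callBuggy, true))
	fmt.Println("fixed, failing call:", lastOutcome(callFixed, true))
}
```

Arguments to a deferred call are evaluated when the defer statement executes, which is why the closure form is needed to observe the final error.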

exchange/orchestration-engine/provider/provider.go

exchange/docker-compose.yml

observability/verification_steps.sh
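
The polling idea raised for the verification script can be sketched in Go as follows. This is not the suggestion posted on the script itself, whose content is not shown here; `waitUntilReady`, `healthCheck`, and `failingFirst` are hypothetical names, and the interval and timeout values are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// waitUntilReady calls check every interval until it succeeds or until the
// next attempt would start after the timeout has elapsed.
func waitUntilReady(check func() error, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := check()
		if err == nil {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return fmt.Errorf("not ready after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

// healthCheck returns a check that succeeds when url answers with 200 OK.
// Example: waitUntilReady(healthCheck("http://localhost:4000/health"), time.Second, 30*time.Second)
func healthCheck(url string) func() error {
	client := &http.Client{Timeout: 2 * time.Second}
	return func() error {
		resp, err := client.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		return nil
	}
}

// failingFirst returns a check that fails for the first n calls, then
// succeeds. It lets the demo run without any service listening.
func failingFirst(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("connection refused")
		}
		return nil
	}
}

func main() {
	err := waitUntilReady(failingFirst(2), 10*time.Millisecond, time.Second)
	fmt.Println("ready after retries, error:", err)

	err = waitUntilReady(failingFirst(1000), 10*time.Millisecond, 50*time.Millisecond)
	fmt.Println("gave up, error is nil:", err == nil)
}
```

Compared with a fixed sleep, polling returns as soon as the dependency is up and fails with a clear error when it never becomes ready.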

ginaxu1 · 2026-01-05T07:15:20Z

/gemini review

gemini-code-assist

Code Review

This pull request integrates observability features into the orchestration-engine and policy-decision-point services, and enhances the local development environment for observability. Key changes include adding a pdp-db PostgreSQL service to docker-compose.yml and configuring both services to connect to it, along with updating go.mod and go.sum files to include a new shared/monitoring dependency. Both services now initialize the monitoring library, expose a /metrics endpoint, and apply an HTTP metrics middleware; the orchestration-engine also records external provider call durations and GraphQL request outcomes.

Observability configurations for Grafana and Prometheus were updated, including changes to metric labels and Prometheus scrape targets to use host.docker.internal. A new verification_steps.sh script was added to automate the setup and verification of the observability stack and services.

Review comments highlighted security concerns regarding weak default database passwords ('password' and empty string) in docker-compose.yml, suggesting their removal or more secure defaults, and identified a redundant sleep 5 command in the new verification script due to existing depends_on and health check mechanisms.

exchange/docker-compose.yml

observability/verification_steps.sh

sthanikan2000

@ginaxu1 When I went through the implementation, I noticed that observability is added to the PDP and the OE in different ways. Is that intentional?

ginaxu1 · 2026-01-06T06:50:17Z

Updated: OE, PDP, and CE now all follow the same pattern:

if monitoring.IsObservabilityEnabled() {
    monitoringConfig := monitoring.DefaultConfig("service-name")
    if err := monitoring.Initialize(monitoringConfig); err != nil {
        slog.Warn("Failed to initialize monitoring (service will continue)", "error", err)
    }
} else {
    slog.Info("Observability disabled via environment variable")
}
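
For readers without the shared package at hand, here is a guess at what the toggle check might look like. It assumes observability stays on unless ENABLE_OBSERVABILITY or OTEL_METRICS_ENABLED is set to "false" (case-insensitive); the real monitoring.IsObservabilityEnabled may behave differently, and `observabilityEnabled` and `envOf` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// observabilityEnabled reports whether metrics should be initialized.
// Assumption: both variables default to enabled, and only the literal value
// "false" (in any letter case) switches observability off.
func observabilityEnabled(getenv func(string) string) bool {
	for _, key := range []string{"ENABLE_OBSERVABILITY", "OTEL_METRICS_ENABLED"} {
		if strings.EqualFold(strings.TrimSpace(getenv(key)), "false") {
			return false
		}
	}
	return true
}

// envOf builds a lookup function from key/value pairs, which makes the check
// easy to exercise without touching the process environment.
func envOf(pairs ...string) func(string) string {
	m := make(map[string]string)
	for i := 0; i+1 < len(pairs); i += 2 {
		m[pairs[i]] = pairs[i+1]
	}
	return func(key string) string { return m[key] }
}

func main() {
	fmt.Println("from process environment:", observabilityEnabled(os.Getenv))
	fmt.Println("with ENABLE_OBSERVABILITY=false:",
		observabilityEnabled(envOf("ENABLE_OBSERVABILITY", "false")))
}
```

Passing the lookup function as a parameter keeps the check pure, so each service can call it with os.Getenv while tests supply a fixed map.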

gemini-code-assist

Code Review

This PR introduces observability to the OE and PDP services using OpenTelemetry, which is a great addition. The implementation of the shared monitoring module is solid, with good practices like support for multiple exporters and graceful degradation. The new verification_steps.sh script is also a fantastic improvement for developer experience.

I've left a few comments with suggestions for improvement:

A high-severity issue in prometheus.yml regarding the use of host.docker.internal, which will likely break the setup for container-based development.
A few medium-severity suggestions in docker-compose.yml to improve the robustness and clarity of the local development setup for databases.
A minor suggestion to use slog.Warn instead of slog.Error for non-fatal monitoring initialization failures.

Overall, this is a well-executed feature. Addressing these points will make the observability setup more robust and easier to use for all developers.

observability/prometheus/prometheus.yml

exchange/docker-compose.yml

exchange/policy-decision-point/main.go

gemini-code-assist

Code Review

This pull request introduces a comprehensive and well-implemented observability solution for the PDP, OE, and CE services. The changes are robust, making observability optional via environment variables, and the inclusion of a verification script is a fantastic addition for improving the developer experience. The instrumentation for HTTP metrics, business events, and external calls is thorough. I've identified a couple of medium-severity issues: one concerning an inconsistency in the database setup for the orchestration-engine in Docker Compose, and another related to Grafana dashboard provisioning which could lead to duplicates. Addressing these will enhance the consistency and maintainability of the local development environment.

observability/grafana/dashboards/go-services-metrics.json

gemini-code-assist

Code Review

This pull request introduces a comprehensive observability solution for the PDP and OE services, leveraging OpenTelemetry for metrics collection. The changes are well-structured, using a shared monitoring package and allowing observability to be toggled via environment variables. The addition of Prometheus/Grafana configurations and a verification script for local development is a great touch. My review focuses on a minor logical issue in the status reporting of the monitoring package, but overall the implementation is solid.

exchange/shared/monitoring/metrics.go

ginaxu1 mentioned this pull request Jan 3, 2026

317 Connect Orchestration Engine and Policy Decision Point to Observability Stack #380

Closed

19 tasks

ginaxu1 marked this pull request as draft January 3, 2026 14:26

gemini-code-assist bot reviewed Jan 3, 2026

ginaxu1 force-pushed the 317-part3-observability-only branch from 233c951 to 208988a on January 5, 2026 05:07

gemini-code-assist bot reviewed Jan 5, 2026

observability/grafana/dashboards/go-services-metrics.json (outdated, resolved)

exchange/orchestration-engine/main.go (outdated, resolved)

exchange/orchestration-engine/server/server.go (outdated, resolved)

ginaxu1 force-pushed the 317-part3-observability-only branch from 3539a0c to cd9e5e1 on January 5, 2026 05:22

ginaxu1 changed the title ~~317 part3 observability only~~ 317 Integration Observability with PDP and OE Jan 5, 2026

ginaxu1 changed the title ~~317 Integration Observability with PDP and OE~~ 317 Add Observability to PDP and OE Jan 5, 2026

gemini-code-assist bot reviewed Jan 5, 2026

exchange/docker-compose.yml (outdated, resolved)

exchange/docker-compose.yml (outdated, resolved)

exchange/orchestration-engine/provider/provider.go (outdated, resolved)

exchange/policy-decision-point/main.go (outdated, resolved)

ginaxu1 force-pushed the 317-part3-observability-only branch from 7e73a36 to 7a1d12c on January 5, 2026 06:31

gemini-code-assist bot reviewed Jan 5, 2026

exchange/orchestration-engine/provider/provider.go (resolved)

exchange/docker-compose.yml (outdated, resolved)

exchange/docker-compose.yml (outdated, resolved)

observability/verification_steps.sh (outdated, resolved)

ginaxu1 mentioned this pull request Jan 5, 2026

Add Observability: Metrics for the Project #317

Open

3 tasks

gemini-code-assist bot reviewed Jan 5, 2026

exchange/docker-compose.yml (outdated, resolved)

exchange/docker-compose.yml (outdated, resolved)

observability/verification_steps.sh (outdated, resolved)

ginaxu1 marked this pull request as ready for review January 5, 2026 07:20

ginaxu1 requested review from mushrafmim and sthanikan2000 January 5, 2026 07:20

ginaxu1 force-pushed the 317-part3-observability-only branch 2 times, most recently from aad2ff8 to af8a49d on January 5, 2026 07:31

sthanikan2000 reviewed Jan 5, 2026

gemini-code-assist bot reviewed Jan 6, 2026

observability/prometheus/prometheus.yml (outdated, resolved)

exchange/docker-compose.yml (outdated, resolved)

exchange/docker-compose.yml (outdated, resolved)

exchange/policy-decision-point/main.go (outdated, resolved)

ginaxu1 force-pushed the 317-part3-observability-only branch from af8a49d to ce39049 on January 6, 2026 07:25

ginaxu1 requested a review from sthanikan2000 January 6, 2026 07:28

gemini-code-assist bot reviewed Jan 6, 2026

observability/grafana/dashboards/go-services-metrics.json (resolved)

ginaxu1 force-pushed the 317-part3-observability-only branch from ce39049 to ef523df on January 6, 2026 07:31

ginaxu1 mentioned this pull request Jan 7, 2026

#340 Part 3: Complete Audit Flow integration and create Audit Interface #413

Merged

ginaxu1 force-pushed the 317-part3-observability-only branch 2 times, most recently from 2736899 to c331f5b on January 13, 2026 05:06

gemini-code-assist bot reviewed Jan 13, 2026

exchange/shared/monitoring/metrics.go (resolved)

OpenDIF deleted a comment from gemini-code-assist bot Jan 14, 2026

ginaxu1 force-pushed the 317-part3-observability-only branch 2 times, most recently from eb9a9d9 to 71b3bb7 on January 18, 2026 10:56

Add observability to OE and PDP (8f3117f)

ginaxu1 force-pushed the 317-part3-observability-only branch from 71b3bb7 to 8f3117f on January 23, 2026 08:30
