
feat(database): add stop/start lifecycle operations #142

Open
poyrazK wants to merge 4 commits into main from feat/database-stop-start

Conversation

poyrazK (Owner) commented Apr 13, 2026

Summary

  • Add POST /databases/:id/stop and POST /databases/:id/start endpoints for managed database lifecycle
  • Implement service methods with container start/stop, sidecar management, and status transitions
  • Add handler unit tests for stop/start operations
  • Document new endpoints in api-reference.md and database.md

Test plan

  • Unit tests for handler (Stop/InvalidID/ServiceError, Start/InvalidID/ServiceError)
  • All existing database handler tests pass

Checklist

  • Handler tests added
  • API reference updated
  • Database guide updated
  • Changelog updated

Summary by CodeRabbit

  • New Features
    • Added stop and start endpoints for managed database instances, enabling pause and resume operations while preserving data volumes and storage.
    • Stopped databases can be restarted; certain database configurations cannot be stopped (replicas, databases in transition states).

- Add StopDatabase/StartDatabase to DatabaseService interface
- Implement StopDatabase: stops container, sidecars, sets STOPPED status
- Implement StartDatabase: starts container, waits for ready, starts sidecars
- Add Stop/Start HTTP handlers (POST /databases/:id/stop, /start)
- Register new routes in router
- Update metrics counters on stop/start
Copilot AI review requested due to automatic review settings April 13, 2026 14:15

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

This PR adds database stop/start lifecycle management, introducing two new API endpoints (POST /databases/:id/stop and POST /databases/:id/start) with corresponding service logic to pause/resume managed database instances while preserving data volumes. Changes span documentation, routing, interfaces, service implementation, handlers, and comprehensive test coverage.

Changes

Cohort / File(s) — Summary

  • API Documentation (CHANGELOG.md, docs/api-reference.md, docs/database.md): Added documentation for the new stop/start endpoints, including validation rules, status transitions, sidecar handling, readiness polling, and use cases/constraints for managed databases.
  • API Routing (internal/api/setup/router.go): Registered two new authenticated routes, POST /databases/:id/stop and POST /databases/:id/start, both RBAC-protected with PermissionDBUpdate.
  • Core Interface (internal/core/ports/database.go): Extended the DatabaseService interface with two new method signatures: StopDatabase and StartDatabase.
  • Service Implementation (internal/core/services/database.go): Implemented stop/start logic, including state validation, sidecar orchestration (exporter/pooler), a polling-based readiness wait (up to 30 attempts at 1s intervals), and error handling with event/audit logging and Prometheus metrics tracking.
  • HTTP Handler (internal/handlers/database_handler.go): Added Stop and Start handler methods that parse UUID path parameters, invoke the service methods, and return JSON responses on success.
  • Handler Tests (internal/handlers/database_handler_test.go): Added test cases for both handlers covering success, invalid-UUID, and service-error scenarios; added mock service methods.
  • Service Tests (internal/core/services/database_unit_test.go): Extended TestDatabaseServiceUnitExtended with subtests for successful stop/start operations and negative cases (invalid state, replica constraint, missing container ID, compute failures, readiness timeout).
  • Worker Tests (internal/workers/database_failover_worker_test.go): Added mock implementations of StopDatabase and StartDatabase for test compatibility.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler as DatabaseHandler
    participant Service as DatabaseService
    participant Repo as DatabaseRepo
    participant RBAC
    participant Compute as ComputeService
    participant Events as EventService
    participant Audit as AuditService

    Client->>Handler: POST /databases/:id/stop
    Handler->>Service: StopDatabase(ctx, id)
    Service->>RBAC: Enforce PermissionDBUpdate
    RBAC-->>Service: OK
    Service->>Repo: GetByID(id)
    Repo-->>Service: database
    Service->>Service: Validate RUNNING & !REPLICA
    Service->>Compute: StopInstance(exporter)
    Compute-->>Service: OK (or warn on failure)
    Service->>Compute: StopInstance(pooler)
    Compute-->>Service: OK (or warn on failure)
    Service->>Compute: StopInstance(main DB)
    Compute-->>Service: OK
    Service->>Repo: Update(db.Status=STOPPED)
    Repo-->>Service: OK
    Service->>Events: RecordEvent("DATABASE_STOP")
    Events-->>Service: OK
    Service->>Audit: Log("database.stop")
    Audit-->>Service: OK
    Service-->>Handler: nil
    Handler-->>Client: 200 {"message": "database stopped"}
sequenceDiagram
    participant Client
    participant Handler as DatabaseHandler
    participant Service as DatabaseService
    participant Repo as DatabaseRepo
    participant RBAC
    participant Compute as ComputeService
    participant WaitPoller as ReadinessPoller
    participant Events as EventService
    participant Audit as AuditService

    Client->>Handler: POST /databases/:id/start
    Handler->>Service: StartDatabase(ctx, id)
    Service->>RBAC: Enforce PermissionDBUpdate
    RBAC-->>Service: OK
    Service->>Repo: GetByID(id)
    Repo-->>Service: database
    Service->>Service: Validate STOPPED & ContainerID set
    Service->>Compute: StartInstance(main DB)
    Compute-->>Service: OK
    Service->>WaitPoller: waitForDatabaseReady (poll up to 30x)
    WaitPoller->>Compute: GetInstanceIP(container)
    Compute-->>WaitPoller: IP address
    WaitPoller-->>Service: OK
    Service->>Compute: StartInstance(pooler) [if PoolingEnabled]
    Compute-->>Service: OK (or warn on failure)
    Service->>Repo: Update(db.Status=RUNNING)
    Repo-->>Service: OK
    Service->>Events: RecordEvent("DATABASE_START")
    Events-->>Service: OK
    Service->>Audit: Log("database.start")
    Audit-->>Service: OK
    Service-->>Handler: nil
    Handler-->>Client: 200 {"message": "database started"}
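The validation steps in the two diagrams ("Validate RUNNING & !REPLICA" and "Validate STOPPED & ContainerID set") can be sketched as pure functions. The status/role constants and function names here are illustrative; the real constants live in the project's domain package.

```go
package main

import (
	"errors"
	"fmt"
)

type Status string
type Role string

const (
	StatusRunning Status = "RUNNING"
	StatusStopped Status = "STOPPED"
	RolePrimary   Role   = "PRIMARY"
	RoleReplica   Role   = "REPLICA"
)

// canStop mirrors the stop-side validation: only a RUNNING primary may be stopped.
func canStop(s Status, r Role) error {
	if r == RoleReplica {
		return errors.New("replicas cannot be stopped directly")
	}
	if s != StatusRunning {
		return fmt.Errorf("database must be RUNNING, got %s", s)
	}
	return nil
}

// canStart mirrors the start-side validation: STOPPED state and a known
// container ID are both required.
func canStart(s Status, containerID string) error {
	if s != StatusStopped {
		return fmt.Errorf("database must be STOPPED, got %s", s)
	}
	if containerID == "" {
		return errors.New("database container ID is missing")
	}
	return nil
}

func main() {
	fmt.Println(canStop(StatusRunning, RolePrimary))        // <nil>
	fmt.Println(canStop(StatusRunning, RoleReplica) != nil) // true
	fmt.Println(canStart(StatusStopped, "cid-1"))           // <nil>
	fmt.Println(canStart(StatusStopped, "") != nil)         // true
}
```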

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes


Poem

🐰 Hops of joy for pausing runs,
Database sleeps, no data loss—
Stop and start in morning suns,
Our RDS now takes a toss!

🚥 Pre-merge checks: 2 passed, 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: The title clearly and concisely describes the main change (adding stop/start lifecycle operations for managed databases) and aligns with all the substantial changes: new endpoints, service methods, handlers, and documentation.




github-actions bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Go Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2.

Benchmark suite (current: bbe3543, previous: 99aa6d9):

  • BenchmarkFunctionServiceInvoke: 53638 ns/op, 1720 B/op, 23 allocs/op (current) vs 26747 ns/op, 1720 B/op, 23 allocs/op (previous); ratio 2.01

This comment was automatically generated by workflow using github-action-benchmark.
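For context, numbers like the ns/op figures above come from Go's benchmark harness. A minimal, self-contained sketch follows; the `invoke` body is a stand-in, not the real FunctionService code, and the threshold check simply reproduces the 2.0 ratio gate described in the alert.

```go
package main

import (
	"fmt"
	"testing"
)

// invoke is a stand-in for the hot path being measured; the real
// BenchmarkFunctionServiceInvoke lives in the repository's _test.go files.
func invoke() int {
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += i
	}
	return sum
}

func main() {
	// testing.Benchmark runs the loop with an auto-tuned b.N, the same
	// mechanism `go test -bench` uses.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			invoke()
		}
	})
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())

	// A regression gate like the one in CI compares current ns/op against
	// the previous run and fails when the ratio exceeds a threshold (2.0).
	current, previous := 53638.0, 26747.0
	fmt.Println(current/previous > 2.0) // true: this PR tripped the alert
}
```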


Copilot AI left a comment


Pull request overview

Adds managed database lifecycle operations (stop/start) to the API and implements the underlying service logic, with accompanying handler tests and documentation updates.

Changes:

  • Add POST /databases/:id/stop and POST /databases/:id/start routes and handler methods.
  • Implement DatabaseService.StopDatabase / StartDatabase with container lifecycle management and status transitions.
  • Document new endpoints and update changelog.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Summary per file:

  • internal/handlers/database_handler.go: Adds Stop/Start HTTP handlers that call the database service and return success messages.
  • internal/handlers/database_handler_test.go: Adds unit tests for Stop/Start handler success, invalid-id, and service-error paths.
  • internal/core/services/database.go: Implements StopDatabase/StartDatabase and a readiness wait helper.
  • internal/core/ports/database.go: Extends the DatabaseService interface with StopDatabase/StartDatabase.
  • internal/api/setup/router.go: Registers the new stop/start database routes with RBAC middleware.
  • docs/database.md: Documents stop/start lifecycle behavior, constraints, and workflow.
  • docs/api-reference.md: Adds API reference entries for the new endpoints.
  • CHANGELOG.md: Notes the new stop/start lifecycle feature under “Added”.


Comment on lines +572 to +575
	// Stop database container
	if db.ContainerID != "" {
		if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
			s.logger.Warn("failed to stop database container", "container_id", db.ContainerID, "error", err)

Copilot AI Apr 13, 2026


StopDatabase logs and ignores failures from compute.StopInstance (including for the primary db.ContainerID) and still updates the DB status/metrics to STOPPED. This can leave the system reporting STOPPED while the container is still running (or the stop was canceled). Consider treating a failure to stop the main database container as a hard error (and only transitioning status/metrics after a successful stop), while keeping sidecar stops best-effort if desired.

Suggested change:

-// Stop database container
+// Stop database container; this must succeed before marking the database as stopped.
 if db.ContainerID != "" {
 	if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
 		s.logger.Warn("failed to stop database container", "container_id", db.ContainerID, "error", err)
+		return err

Comment on lines +608 to +612
	// Start database container
	if db.ContainerID != "" {
		if err := s.compute.StartInstance(ctx, db.ContainerID); err != nil {
			return errors.Wrap(errors.Internal, "failed to start database container", err)
		}

Copilot AI Apr 13, 2026


StartDatabase proceeds even when db.ContainerID is empty (it won’t start anything, yet later it can still set status to RUNNING). This should likely be treated as an inconsistent DB record: either reconstruct the container name (similar to InstanceService) or return an Internal error instead of continuing.

Suggested change:

-// Start database container
-if db.ContainerID != "" {
+if db.ContainerID == "" {
+	return errors.New(errors.Internal, "database container ID is missing")
+}
+// Start database container
 if err := s.compute.StartInstance(ctx, db.ContainerID); err != nil {
 	return errors.Wrap(errors.Internal, "failed to start database container", err)
 }

Comment on lines +615 to +633
	// Wait for database to be ready
	if err := s.waitForDatabaseReady(ctx, db); err != nil {
		s.logger.Warn("database may not be fully ready", "id", db.ID, "error", err)
	}

	// Start sidecars if enabled
	if db.ExporterContainerID != "" {
		if err := s.compute.StartInstance(ctx, db.ExporterContainerID); err != nil {
			s.logger.Warn("failed to start exporter container", "container_id", db.ExporterContainerID, "error", err)
		}
	}
	if db.PoolingEnabled && db.PoolerContainerID != "" {
		if err := s.compute.StartInstance(ctx, db.PoolerContainerID); err != nil {
			s.logger.Warn("failed to start pooler container", "container_id", db.PoolerContainerID, "error", err)
		}
	}

	db.Status = domain.DatabaseStatusRunning
	if err := s.repo.Update(ctx, db); err != nil {

Copilot AI Apr 13, 2026


StartDatabase ignores waitForDatabaseReady errors (warns and continues) and then sets status to RUNNING. If readiness times out or GetInstanceIP fails, the API can report RUNNING for an unreachable database. Consider returning an error (or keeping status STOPPED) when readiness fails, or introducing a transitional/ERROR state rather than unconditionally marking RUNNING.

Comment on lines +646 to +652
	for i := 0; i < 30; i++ {
		dbIP, _ := s.compute.GetInstanceIP(ctx, db.ContainerID)
		if dbIP != "" {
			return nil // Got IP, assume ready
		}
		time.Sleep(1 * time.Second)
	}

Copilot AI Apr 13, 2026


waitForDatabaseReady uses time.Sleep in a loop and ignores ctx.Done(). If the request is canceled, this can keep the goroutine alive for up to 30s and also treats context cancellation from GetInstanceIP as a transient error. Consider using a time.Ticker with select { case <-ctx.Done(): ... } and propagating/handling GetInstanceIP errors appropriately.

Suggested change:

-for i := 0; i < 30; i++ {
-	dbIP, _ := s.compute.GetInstanceIP(ctx, db.ContainerID)
-	if dbIP != "" {
-		return nil // Got IP, assume ready
-	}
-	time.Sleep(1 * time.Second)
-}
+ticker := time.NewTicker(1 * time.Second)
+defer ticker.Stop()
+for i := 0; i < 30; i++ {
+	dbIP, err := s.compute.GetInstanceIP(ctx, db.ContainerID)
+	if err != nil {
+		if ctx.Err() != nil {
+			return ctx.Err()
+		}
+	} else if dbIP != "" {
+		return nil // Got IP, assume ready
+	}
+	if i == 29 {
+		break
+	}
+	select {
+	case <-ctx.Done():
+		return ctx.Err()
+	case <-ticker.C:
+	}
+}

docs/database.md Outdated
Comment on lines +533 to +541
2. **Container Stop**: The database container and all sidecars (PgBouncer, exporter) are stopped via Docker.
3. **Status Update**: Database status transitions to `STOPPED`.
4. **Data Persistence**: The underlying volume is retained — no data is lost.

#### Start Operation
The `POST /databases/:id/start` endpoint restarts a stopped database:
1. **Validation**: Database must be in `STOPPED` state.
2. **Container Start**: The database container is started with the existing volume.
3. **Readiness Wait**: The service polls for the instance IP to confirm the database is reachable.

Copilot AI Apr 13, 2026


This section states sidecars are stopped/started “via Docker” and that readiness polling “confirms the database is reachable”. The implementation calls the abstract ComputeBackend (not necessarily Docker) and readiness currently only checks for a non-empty instance IP (not connectivity). Consider rewording to match the actual behavior/backend abstraction to avoid misleading operators/users.

Suggested change:

-2. **Container Stop**: The database container and all sidecars (PgBouncer, exporter) are stopped via Docker.
+2. **Compute Stop**: The database instance and any sidecars (PgBouncer, exporter) are stopped via the configured compute backend.
 3. **Status Update**: Database status transitions to `STOPPED`.
 4. **Data Persistence**: The underlying volume is retained — no data is lost.

 #### Start Operation
 The `POST /databases/:id/start` endpoint restarts a stopped database:
 1. **Validation**: Database must be in `STOPPED` state.
-2. **Container Start**: The database container is started with the existing volume.
-3. **Readiness Wait**: The service polls for the instance IP to confirm the database is reachable.
+2. **Compute Start**: The database instance is started with the existing volume.
+3. **Readiness Wait**: The service polls until the instance has a non-empty IP address assigned.

Comment on lines +541 to +545
func (s *DatabaseService) StopDatabase(ctx context.Context, id uuid.UUID) error {
	userID := appcontext.UserIDFromContext(ctx)
	tenantID := appcontext.TenantIDFromContext(ctx)
	if err := s.rbacSvc.Authorize(ctx, userID, tenantID, domain.PermissionDBUpdate, id.String()); err != nil {
		return err

Copilot AI Apr 13, 2026


StopDatabase/StartDatabase add substantial new service behavior (compute lifecycle calls, status/metrics transitions, readiness polling) but there are no corresponding DatabaseService unit tests for these paths. Since this package already has a database service unit test suite, adding tests for success + key failure cases (invalid state, compute errors, readiness timeout, repo update failure) would help prevent regressions.

- Add stop/start endpoints to api-reference.md
- Document lifecycle in database.md (flows, use cases, constraints)
- Add entry to CHANGELOG.md
- Add TestDatabaseHandlerStop and TestDatabaseHandlerStart with mock service methods
poyrazK force-pushed the feat/database-stop-start branch from bbe3543 to 0afc05d on April 13, 2026 14:27
- StopDatabase: treat failure to stop main container as hard error
- StartDatabase: check for missing container ID; propagate readiness
  timeout errors instead of continuing
- waitForDatabaseReady: use time.Ticker with select{ctx.Done} instead
  of time.Sleep loop; respect context cancellation
- docs: clarify compute backend vs Docker, readiness check behavior
- unit tests: add StopDatabase/StartDatabase success + failure cases
  (invalid state, compute errors, readiness timeout, repo update failure)
Copilot AI review requested due to automatic review settings April 13, 2026 16:06

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.



Comment on lines +572 to +581
	// Stop database container; must succeed before marking as stopped.
	if db.ContainerID != "" {
		if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
			return errors.Wrap(errors.Internal, "failed to stop database container", err)
		}
	}

	db.Status = domain.DatabaseStatusStopped
	if err := s.repo.Update(ctx, db); err != nil {
		return err

Copilot AI Apr 13, 2026


StopDatabase can mark the DB as STOPPED even when db.ContainerID is empty (because the StopInstance call is skipped). This can leave persisted state inconsistent with the actual running container and will also block future StartDatabase calls (which require ContainerID). Consider treating a missing ContainerID as an error (similar to StartDatabase) before updating status.

Comment on lines +560 to +577
	// Stop sidecars first
	if db.ExporterContainerID != "" {
		if err := s.compute.StopInstance(ctx, db.ExporterContainerID); err != nil {
			s.logger.Warn("failed to stop exporter container", "container_id", db.ExporterContainerID, "error", err)
		}
	}
	if db.PoolerContainerID != "" {
		if err := s.compute.StopInstance(ctx, db.PoolerContainerID); err != nil {
			s.logger.Warn("failed to stop pooler container", "container_id", db.PoolerContainerID, "error", err)
		}
	}

	// Stop database container; must succeed before marking as stopped.
	if db.ContainerID != "" {
		if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
			return errors.Wrap(errors.Internal, "failed to stop database container", err)
		}
	}

Copilot AI Apr 13, 2026


StopDatabase stops sidecars before attempting to stop the primary database container. If stopping the main container fails, the method returns an error but sidecars may already be stopped, leaving the database still running but without pooler/exporter (partial side effects on failure). Consider stopping the main container first, or adding rollback/restart logic for sidecars when the main stop fails.

Comment on lines +631 to +651
	t.Run("StartDatabase_ReadinessTimeout", func(t *testing.T) {
		dbID := uuid.New()
		db := &domain.Database{
			ID:          dbID,
			UserID:      userID,
			Status:      domain.DatabaseStatusStopped,
			Role:        domain.RolePrimary,
			ContainerID: "cid-timeout",
		}
		mockRepo.On("GetByID", mock.Anything, dbID).Return(db, nil).Once()
		mockCompute.On("StartInstance", mock.Anything, "cid-timeout").Return(nil).Once()
		mockCompute.On("GetInstanceIP", mock.Anything, "cid-timeout").Return("", nil).Once()
		mockCompute.On("GetInstanceIP", mock.Anything, "cid-timeout").Return("", nil).Maybe()
		mockRepo.On("Update", mock.Anything, mock.Anything).Return(nil).Maybe()
		mockEventSvc.On("RecordEvent", mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(nil).Maybe()
		mockAuditSvc.On("Log", mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(nil).Maybe()

		err := svc.StartDatabase(ctx, dbID)
		require.Error(t, err)
		assert.Contains(t, err.Error(), "did not become ready")
	})

Copilot AI Apr 13, 2026


StartDatabase_ReadinessTimeout will take ~30 seconds because waitForDatabaseReady polls for up to 30 iterations with a 1s ticker and the test uses context.Background(). To keep unit tests fast and deterministic, consider using a context with a short timeout/deadline for this test (and assert on the resulting wrapped error), or make the readiness wait configurable for tests.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (2)
internal/handlers/database_handler_test.go (1)

624-690: Refactor stop/start endpoint tests into table-driven cases.

The new tests are clear, but this segment duplicates the same structure across scenarios and endpoints. Converting to table-driven cases would align with repository test conventions and reduce maintenance overhead.

As per coding guidelines "Use table-driven tests in test files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/handlers/database_handler_test.go` around lines 624 - 690, Refactor
the duplicated tests in TestDatabaseHandlerStop and TestDatabaseHandlerStart
into table-driven tests: create a table of cases (e.g., name, endpoint
"/databases/:id/stop" or "/databases/:id/start", mockSetup function,
requestPath, expectedCode, expectedBodyContains) and iterate over them, invoking
r.POST with handler.Stop or handler.Start as appropriate; within each case call
svc.On("StopDatabase" or "StartDatabase", mock.Anything, id).Return(...) inside
mockSetup to configure Success, InvalidID, and ServiceError scenarios (use the
same uuid id for valid cases and "invalid-uuid" for InvalidID), perform the HTTP
request and assert expected status and body; consolidate duplicated setup logic
from setupDatabaseHandlerTest, httptest.NewRecorder, and r.ServeHTTP into the
loop to remove duplicated blocks in TestDatabaseHandlerStop and
TestDatabaseHandlerStart.
internal/core/services/database_unit_test.go (1)

502-651: Table-drive these lifecycle cases with fresh mocks per case.

The repeated t.Run blocks make the timeout path fall back to .Maybe() expectations, so this suite would still pass if Update, RecordEvent, or Log were called on a failure path. A table-driven loop that creates a new service/mocks per case would keep the coverage and let you assert the negative paths cleanly.

As per coding guidelines, "Use table-driven tests in test files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database_unit_test.go` around lines 502 - 651, The
tests for StopDatabase and StartDatabase reuse mocks across multiple t.Run
blocks which lets expectations fall back to .Maybe() and masks negative-path
calls; refactor these into a table-driven test that iterates cases (e.g., cases
for "Success", "NotRunning", "Replica", "ComputeError", "MissingContainerID",
"ReadinessTimeout") and for each case create fresh mocks and a fresh service
instance before calling svc.StopDatabase or svc.StartDatabase, set up per-case
mock expectations on mockRepo, mockCompute, mockEventSvc, mockAuditSvc (use
mockRepo.On("GetByID", ...), mockCompute.On("StartInstance"/"StopInstance",
...), mockCompute.On("GetInstanceIP", ...), mockRepo.On("Update", ...),
mockEventSvc.On("RecordEvent", ...), mockAuditSvc.On("Log", ...)), then assert
the expected error/no-error and that only the expected methods were called (no
.Maybe() usages that allow stray calls).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/api-reference.md`:
- Around line 736-739: The POST /databases/:id/start endpoint docs currently
only mention the STOPPED-state constraint but omit the additional precondition
that the database record must have a non-empty container_id; update the API
description for the Start endpoint (POST /databases/:id/start) to include a
bullet or sentence stating that the database must have a valid container_id (not
null/empty) or the start will fail, and ensure this is placed alongside the
existing "Constraints" section so consumers know both requirements (STOPPED
state and presence of container_id).

In `@docs/database.md`:
- Around line 533-534: Update the lifecycle docs to clarify that
stopping/starting sidecars is best-effort: change the "Compute Stop" and
corresponding "Compute Start" wording that currently implies guaranteed sidecar
stop/start to explicitly state sidecars (e.g., PgBouncer, exporter) are handled
on a best-effort basis and that only failures to stop/start or determine
readiness of the main database container will produce a hard error and block the
operation; keep the "Status Update" description (transition to `STOPPED`) but
note that sidecar state may not be reflected if they fail to stop.

In `@internal/core/services/database.go`:
- Around line 584-585: The calls to s.eventSvc.RecordEvent and s.auditSvc.Log
are currently discarding errors using `_ =`, so capture their returned errors
(e.g., err := s.eventSvc.RecordEvent(ctx, ...); if err != nil { ... }) and
either log them via the service/struct logger (e.g., s.logger.Errorf/Warningf)
or propagate the error up the lifecycle method if these operations are part of
the contract; do the same for the analogous calls (s.eventSvc.RecordEvent /
s.auditSvc.Log) elsewhere in the file so failures are not silently ignored.
- Around line 560-576: The current logic stops sidecars (db.ExporterContainerID,
db.PoolerContainerID) before the main DB (db.ContainerID), which can leave the
DB running without its sidecars if s.compute.StopInstance(ctx, db.ContainerID)
fails; change the order to stop the main container first (call
s.compute.StopInstance for db.ContainerID before stopping db.ExporterContainerID
and db.PoolerContainerID), or if you prefer to keep the existing order, add
error recovery: if stopping the main container fails, restart the stopped
sidecars by calling the appropriate s.compute.StartInstance (or equivalent) for
db.ExporterContainerID and db.PoolerContainerID in the error path so the system
isn’t left in a degraded state.
- Around line 646-668: The waitForDatabaseReady function currently uses
compute.GetInstanceIP to infer readiness; replace that with an engine-specific
health probe (e.g., for Postgres/MySQL attempt a TCP dial or use the DB driver
to open a connection and Ping) inside DatabaseService.waitForDatabaseReady so
you actually verify the engine accepts connections rather than just having an
IP; make the retry cadence and timeout configurable via DatabaseService fields
or context-aware options (instead of the hard-coded 1s ticker and 30 attempts),
keep honoring ctx.Done(), preserve the same return path on success and return a
clear Internal error if the probe never succeeds; update references to
domain.Database and db.ContainerID to locate the instance and use compute to
obtain IP only if needed for the probe.
- Around line 612-619: After starting the container with
s.compute.StartInstance(ctx, db.ContainerID), if s.waitForDatabaseReady(ctx, db)
returns an error you must rollback the running instance to keep state
consistent: call the appropriate teardown method on s.compute (e.g.,
StopInstance or DestroyInstance) with ctx and db.ContainerID, handle/ignore
errors from that teardown so the original readiness error is returned, and still
wrap the original error with errors.Wrap(errors.Internal, "database failed to
become ready", err) to preserve context; do this in the same block that checks
waitForDatabaseReady so the container is stopped whenever readiness fails.

In `@internal/handlers/database_handler.go`:
- Around line 146-174: The Stop and Start handler methods (DatabaseHandler.Stop
and DatabaseHandler.Start) lack Swagger documentation comments; add Swagger
annotation blocks immediately above each method containing `@Summary` (brief
description), `@Tags` (e.g., Databases), `@Security` (e.g., ApiKeyAuth), `@Param` for
the path "id" (type: string, required), `@Success` responses (e.g., 200 with a
message schema), and `@Router` directives (e.g., /databases/{id}/stop [post] and
/databases/{id}/start [post]) so the operations are included in generated docs.

---

Nitpick comments:
In `@internal/core/services/database_unit_test.go`:
- Around line 502-651: The tests for StopDatabase and StartDatabase reuse mocks
across multiple t.Run blocks which lets expectations fall back to .Maybe() and
masks negative-path calls; refactor these into a table-driven test that iterates
cases (e.g., cases for "Success", "NotRunning", "Replica", "ComputeError",
"MissingContainerID", "ReadinessTimeout") and for each case create fresh mocks
and a fresh service instance before calling svc.StopDatabase or
svc.StartDatabase, set up per-case mock expectations on mockRepo, mockCompute,
mockEventSvc, mockAuditSvc (use mockRepo.On("GetByID", ...),
mockCompute.On("StartInstance"/"StopInstance", ...),
mockCompute.On("GetInstanceIP", ...), mockRepo.On("Update", ...),
mockEventSvc.On("RecordEvent", ...), mockAuditSvc.On("Log", ...)), then assert
the expected error/no-error and that only the expected methods were called (no
.Maybe() usages that allow stray calls).

In `@internal/handlers/database_handler_test.go`:
- Around line 624-690: Refactor the duplicated tests in TestDatabaseHandlerStop
and TestDatabaseHandlerStart into table-driven tests: create a table of cases
(e.g., name, endpoint "/databases/:id/stop" or "/databases/:id/start", mockSetup
function, requestPath, expectedCode, expectedBodyContains) and iterate over
them, invoking r.POST with handler.Stop or handler.Start as appropriate; within
each case call svc.On("StopDatabase" or "StartDatabase", mock.Anything,
id).Return(...) inside mockSetup to configure Success, InvalidID, and
ServiceError scenarios (use the same uuid id for valid cases and "invalid-uuid"
for InvalidID), perform the HTTP request and assert expected status and body;
consolidate duplicated setup logic from setupDatabaseHandlerTest,
httptest.NewRecorder, and r.ServeHTTP into the loop to remove duplicated blocks
in TestDatabaseHandlerStop and TestDatabaseHandlerStart.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54b547f3-b63c-4f56-a3fb-6a85e8ea6b14

📥 Commits

Reviewing files that changed from the base of the PR and between 2890279 and ac5b4e9.

📒 Files selected for processing (10)
  • CHANGELOG.md
  • docs/api-reference.md
  • docs/database.md
  • internal/api/setup/router.go
  • internal/core/ports/database.go
  • internal/core/services/database.go
  • internal/core/services/database_unit_test.go
  • internal/handlers/database_handler.go
  • internal/handlers/database_handler_test.go
  • internal/workers/database_failover_worker_test.go

Comment on lines +736 to +739
```markdown
### POST /databases/:id/start
Start a stopped database instance.
- **Constraints**: Database must be in STOPPED state.
- **Workflow**: Starts container, waits for readiness, starts sidecars if enabled.
```
⚠️ Potential issue | 🟡 Minor

Document the container ID precondition for start.

Line 738 currently states only STOPPED-state validation. The service also fails start when the database container_id is missing, so this constraint should be documented here for API consumers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api-reference.md` around lines 736 - 739, The POST /databases/:id/start
endpoint docs currently only mention the STOPPED-state constraint but omit the
additional precondition that the database record must have a non-empty
container_id; update the API description for the Start endpoint (POST
/databases/:id/start) to include a bullet or sentence stating that the database
must have a valid container_id (not null/empty) or the start will fail, and
ensure this is placed alongside the existing "Constraints" section so consumers
know both requirements (STOPPED state and presence of container_id).

Comment on lines +533 to +534
```markdown
2. **Compute Stop**: The database container and all sidecars (PgBouncer, exporter) are stopped via the configured compute backend. If the main container fails to stop, the operation returns an error without updating status.
3. **Status Update**: Database status transitions to `STOPPED`.
```
⚠️ Potential issue | 🟡 Minor

Clarify sidecar handling as best-effort in lifecycle docs.

Line 533 and Line 542 read as guaranteed sidecar stop/start, but service behavior is best-effort for sidecars and only hard-fails on main container stop/readiness failure. Please align wording with implementation to avoid operational confusion.

📝 Suggested wording update
```diff
-2. **Compute Stop**: The database container and all sidecars (PgBouncer, exporter) are stopped via the configured compute backend. If the main container fails to stop, the operation returns an error without updating status.
+2. **Compute Stop**: Sidecars (PgBouncer/exporter), if present, are stopped on a best-effort basis, then the main database container is stopped. If the main container fails to stop, the operation returns an error without updating status.

-4. **Sidecar Start**: PgBouncer and/or metrics exporter are started if enabled.
+4. **Sidecar Start**: PgBouncer and/or metrics exporter are started if enabled (best-effort; sidecar startup failures are logged and do not block status transition).
```

Also applies to: 542-543

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/database.md` around lines 533 - 534, Update the lifecycle docs to
clarify that stopping/starting sidecars is best-effort: change the "Compute
Stop" and corresponding "Compute Start" wording that currently implies
guaranteed sidecar stop/start to explicitly state sidecars (e.g., PgBouncer,
exporter) are handled on a best-effort basis and that only failures to
stop/start or determine readiness of the main database container will produce a
hard error and block the operation; keep the "Status Update" description
(transition to `STOPPED`) but note that sidecar state may not be reflected if
they fail to stop.

Comment on lines +560 to +576
```go
	// Stop sidecars first
	if db.ExporterContainerID != "" {
		if err := s.compute.StopInstance(ctx, db.ExporterContainerID); err != nil {
			s.logger.Warn("failed to stop exporter container", "container_id", db.ExporterContainerID, "error", err)
		}
	}
	if db.PoolerContainerID != "" {
		if err := s.compute.StopInstance(ctx, db.PoolerContainerID); err != nil {
			s.logger.Warn("failed to stop pooler container", "container_id", db.PoolerContainerID, "error", err)
		}
	}

	// Stop database container; must succeed before marking as stopped.
	if db.ContainerID != "" {
		if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
			return errors.Wrap(errors.Internal, "failed to stop database container", err)
		}
```
⚠️ Potential issue | 🟠 Major

Avoid stopping sidecars before the main container.

If stopping db.ContainerID fails, this returns an error after already stopping the pooler/exporter. That leaves the database still running but its auxiliary access/metrics path down. Stop the main container first, or restart the sidecars on the error path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database.go` around lines 560 - 576, The current logic
stops sidecars (db.ExporterContainerID, db.PoolerContainerID) before the main DB
(db.ContainerID), which can leave the DB running without its sidecars if
s.compute.StopInstance(ctx, db.ContainerID) fails; change the order to stop the
main container first (call s.compute.StopInstance for db.ContainerID before
stopping db.ExporterContainerID and db.PoolerContainerID), or if you prefer to
keep the existing order, add error recovery: if stopping the main container
fails, restart the stopped sidecars by calling the appropriate
s.compute.StartInstance (or equivalent) for db.ExporterContainerID and
db.PoolerContainerID in the error path so the system isn’t left in a degraded
state.
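The stop-order fix the reviewer asks for can be sketched in isolation. This is a hedged, self-contained illustration, not the project's code: `recorder` stands in for the compute backend, and `stopAll` stops the main container first so a failure can never leave the database running with its sidecars already down.

```go
package main

import (
	"errors"
	"fmt"
)

// recorder is a hypothetical compute backend that records which
// containers were stopped, and can be told to fail on one ID.
type recorder struct {
	stopped []string
	failID  string
}

func (r *recorder) StopInstance(id string) error {
	if id == r.failID {
		return errors.New("stop failed: " + id)
	}
	r.stopped = append(r.stopped, id)
	return nil
}

// stopAll stops the main database container first; sidecars are only
// stopped after the main stop succeeds, so a main-container failure
// leaves the pooler/exporter untouched and the system non-degraded.
func stopAll(c *recorder, mainID, poolerID, exporterID string) error {
	if err := c.StopInstance(mainID); err != nil {
		return err // sidecars untouched
	}
	for _, id := range []string{poolerID, exporterID} {
		if id == "" {
			continue
		}
		if err := c.StopInstance(id); err != nil {
			fmt.Println("warn: best-effort sidecar stop failed:", err)
		}
	}
	return nil
}

func main() {
	failing := &recorder{failID: "db-main"}
	err := stopAll(failing, "db-main", "pooler", "exporter")
	fmt.Println("err:", err != nil, "stopped:", failing.stopped)

	ok := &recorder{}
	_ = stopAll(ok, "db-main", "pooler", "exporter")
	fmt.Println(ok.stopped)
}
```

The alternative the reviewer mentions — keeping the sidecars-first order but restarting them on the error path — trades this simplicity for a shorter metrics gap during a clean stop; either resolves the finding.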

Comment on lines +584 to +585
```go
	_ = s.eventSvc.RecordEvent(ctx, "DATABASE_STOP", db.ID.String(), "DATABASE", nil)
	_ = s.auditSvc.Log(ctx, db.UserID, "database.stop", "database", db.ID.String(), map[string]interface{}{"name": db.Name})
```
⚠️ Potential issue | 🟠 Major

Don't silently drop event/audit failures.

These new lifecycle paths discard both errors with _ =, so event or audit outages become invisible. At minimum log them explicitly; if they're part of the contract, propagate the error instead.

As per coding guidelines, "Do not use silent failures - avoid blank identifier assignment like _ = someFunc()".

Also applies to: 638-639

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database.go` around lines 584 - 585, The calls to
s.eventSvc.RecordEvent and s.auditSvc.Log are currently discarding errors using
`_ =`, so capture their returned errors (e.g., err :=
s.eventSvc.RecordEvent(ctx, ...); if err != nil { ... }) and either log them via
the service/struct logger (e.g., s.logger.Errorf/Warningf) or propagate the
error up the lifecycle method if these operations are part of the contract; do
the same for the analogous calls (s.eventSvc.RecordEvent / s.auditSvc.Log)
elsewhere in the file so failures are not silently ignored.
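The capture-and-log pattern the prompt describes looks like this. A minimal sketch under stated assumptions — `recordEvent` is a toy stand-in for `s.eventSvc.RecordEvent`, and the logger writes to a buffer only so the demo can show its output:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log"
)

// recordEvent simulates an event-service call that can fail.
func recordEvent(fail bool) error {
	if fail {
		return errors.New("event store unavailable")
	}
	return nil
}

func main() {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0)

	// Instead of `_ = recordEvent(...)`, capture the error and log it
	// so an event/audit outage is visible in the service logs.
	if err := recordEvent(true); err != nil {
		logger.Printf("warn: failed to record lifecycle event: %v", err)
	}

	fmt.Print(buf.String())
}
```

If the team decides event/audit recording is part of the contract rather than best-effort, the captured error would be returned instead of logged.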

Comment on lines +612 to +619
```go
	if err := s.compute.StartInstance(ctx, db.ContainerID); err != nil {
		return errors.Wrap(errors.Internal, "failed to start database container", err)
	}

	// Wait for database to be ready; return error if it fails.
	if err := s.waitForDatabaseReady(ctx, db); err != nil {
		return errors.Wrap(errors.Internal, "database failed to become ready", err)
	}
```
⚠️ Potential issue | 🟠 Major

Rollback the container when readiness fails.

Once StartInstance succeeds, any waitForDatabaseReady error exits without reverting the container state. That can leave the instance running while the record stays STOPPED, which makes later lifecycle calls inconsistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database.go` around lines 612 - 619, After starting
the container with s.compute.StartInstance(ctx, db.ContainerID), if
s.waitForDatabaseReady(ctx, db) returns an error you must rollback the running
instance to keep state consistent: call the appropriate teardown method on
s.compute (e.g., StopInstance or DestroyInstance) with ctx and db.ContainerID,
handle/ignore errors from that teardown so the original readiness error is
returned, and still wrap the original error with errors.Wrap(errors.Internal,
"database failed to become ready", err) to preserve context; do this in the same
block that checks waitForDatabaseReady so the container is stopped whenever
readiness fails.
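The rollback shape the prompt describes can be sketched as follows. This is an illustrative, self-contained version, not the repository's code: `compute` stands in for the compute port, `ready` for `waitForDatabaseReady`, and `fmt.Errorf` with `%w` for the project's `errors.Wrap`.

```go
package main

import (
	"errors"
	"fmt"
)

// compute records start/stop calls so the rollback is observable.
type compute struct{ calls []string }

func (c *compute) StartInstance(id string) error {
	c.calls = append(c.calls, "start:"+id)
	return nil
}

func (c *compute) StopInstance(id string) error {
	c.calls = append(c.calls, "stop:"+id)
	return nil
}

// startDatabase starts the container and, if the readiness probe fails,
// tears the container back down so the record never stays STOPPED while
// the container is actually running.
func startDatabase(c *compute, id string, ready func() error) error {
	if err := c.StartInstance(id); err != nil {
		return err
	}
	if err := ready(); err != nil {
		// Best-effort teardown; the original readiness error is returned.
		if stopErr := c.StopInstance(id); stopErr != nil {
			fmt.Println("warn: rollback stop failed:", stopErr)
		}
		return fmt.Errorf("database failed to become ready: %w", err)
	}
	return nil
}

func main() {
	c := &compute{}
	err := startDatabase(c, "db-1", func() error { return errors.New("timeout") })
	fmt.Println(err)
	fmt.Println(c.calls)
}
```

Note the teardown error is logged, not returned — returning it would mask the readiness failure that actually explains the state.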

Comment on lines +646 to +668
```go
func (s *DatabaseService) waitForDatabaseReady(ctx context.Context, db *domain.Database) error {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for i := 0; i < 30; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}

		dbIP, err := s.compute.GetInstanceIP(ctx, db.ContainerID)
		if err != nil {
			if ctx.Err() != nil {
				return ctx.Err()
			}
			continue
		}
		if dbIP != "" {
			return nil
		}
	}
	return errors.New(errors.Internal, "database did not become ready in time")
```
⚠️ Potential issue | 🟠 Major

GetInstanceIP is not a readiness check.

A non-empty IP only shows the container has networking; it does not prove Postgres/MySQL is accepting connections. This can mark the database RUNNING and start sidecars before the engine is actually usable. Prefer an engine-specific health probe and make the retry cadence configurable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database.go` around lines 646 - 668, The
waitForDatabaseReady function currently uses compute.GetInstanceIP to infer
readiness; replace that with an engine-specific health probe (e.g., for
Postgres/MySQL attempt a TCP dial or use the DB driver to open a connection and
Ping) inside DatabaseService.waitForDatabaseReady so you actually verify the
engine accepts connections rather than just having an IP; make the retry cadence
and timeout configurable via DatabaseService fields or context-aware options
(instead of the hard-coded 1s ticker and 30 attempts), keep honoring ctx.Done(),
preserve the same return path on success and return a clear Internal error if
the probe never succeeds; update references to domain.Database and
db.ContainerID to locate the instance and use compute to obtain IP only if
needed for the probe.
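A minimal version of the probe the prompt suggests — a TCP dial with configurable cadence and attempt budget — can be sketched standalone. The function name and parameters are illustrative, and the demo uses an ephemeral local listener as a stand-in for the database port; a production probe for Postgres/MySQL would additionally open a driver connection and call `Ping` (e.g., `database/sql`'s `DB.PingContext`) since an open port alone does not prove the engine accepts logins.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// waitForTCPReady dials addr until a connection succeeds, the context is
// cancelled, or the attempt budget runs out. Interval and attempts are
// parameters instead of a hard-coded 1s ticker and 30 tries.
func waitForTCPReady(ctx context.Context, addr string, interval time.Duration, attempts int) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < attempts; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
		conn, err := net.DialTimeout("tcp", addr, interval)
		if err == nil {
			conn.Close()
			return nil // the port accepts connections
		}
	}
	return fmt.Errorf("not ready after %d attempts", attempts)
}

func main() {
	// Ephemeral listener standing in for the database container's port.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			c.Close()
		}
	}()

	err = waitForTCPReady(context.Background(), ln.Addr().String(), 10*time.Millisecond, 30)
	fmt.Println("ready:", err == nil)
}
```

Wiring this into `waitForDatabaseReady` would keep the existing `GetInstanceIP` call only to build the dial address, then probe the engine's actual port.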

Comment on lines +146 to +174
```go
func (h *DatabaseHandler) Stop(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		httputil.Error(c, errors.New(errors.InvalidInput, invalidDatabaseIDMsg))
		return
	}

	if err := h.svc.StopDatabase(c.Request.Context(), id); err != nil {
		httputil.Error(c, err)
		return
	}

	httputil.Success(c, http.StatusOK, gin.H{"message": "database stopped"})
}

func (h *DatabaseHandler) Start(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		httputil.Error(c, errors.New(errors.InvalidInput, invalidDatabaseIDMsg))
		return
	}

	if err := h.svc.StartDatabase(c.Request.Context(), id); err != nil {
		httputil.Error(c, err)
		return
	}

	httputil.Success(c, http.StatusOK, gin.H{"message": "database started"})
}
```
⚠️ Potential issue | 🟠 Major

Add Swagger docs for the new handler endpoints.

Line 146 and Line 161 introduce public API handlers without Swagger annotations, so generated docs may omit these operations.

📚 Suggested annotation block
```diff
+// Stop stops a managed database instance.
+// @Summary Stop database
+// @Description Stops a running managed database instance
+// @Tags databases
+// @Produce json
+// @Security APIKeyAuth
+// @Param id path string true "Database ID"
+// @Success 200 {object} httputil.Response
+// @Router /databases/{id}/stop [post]
 func (h *DatabaseHandler) Stop(c *gin.Context) {
@@
 }

+// Start starts a managed database instance.
+// @Summary Start database
+// @Description Starts a stopped managed database instance
+// @Tags databases
+// @Produce json
+// @Security APIKeyAuth
+// @Param id path string true "Database ID"
+// @Success 200 {object} httputil.Response
+// @Router /databases/{id}/start [post]
 func (h *DatabaseHandler) Start(c *gin.Context) {
@@
 }
```

As per coding guidelines "Include Swagger documentation comments on handler methods with @Summary, @Tags, @Security, and @Router directives".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```go
// Stop stops a managed database instance.
// @Summary Stop database
// @Description Stops a running managed database instance
// @Tags databases
// @Produce json
// @Security APIKeyAuth
// @Param id path string true "Database ID"
// @Success 200 {object} httputil.Response
// @Router /databases/{id}/stop [post]
func (h *DatabaseHandler) Stop(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		httputil.Error(c, errors.New(errors.InvalidInput, invalidDatabaseIDMsg))
		return
	}

	if err := h.svc.StopDatabase(c.Request.Context(), id); err != nil {
		httputil.Error(c, err)
		return
	}

	httputil.Success(c, http.StatusOK, gin.H{"message": "database stopped"})
}

// Start starts a managed database instance.
// @Summary Start database
// @Description Starts a stopped managed database instance
// @Tags databases
// @Produce json
// @Security APIKeyAuth
// @Param id path string true "Database ID"
// @Success 200 {object} httputil.Response
// @Router /databases/{id}/start [post]
func (h *DatabaseHandler) Start(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		httputil.Error(c, errors.New(errors.InvalidInput, invalidDatabaseIDMsg))
		return
	}

	if err := h.svc.StartDatabase(c.Request.Context(), id); err != nil {
		httputil.Error(c, err)
		return
	}

	httputil.Success(c, http.StatusOK, gin.H{"message": "database started"})
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/handlers/database_handler.go` around lines 146 - 174, The Stop and
Start handler methods (DatabaseHandler.Stop and DatabaseHandler.Start) lack
Swagger documentation comments; add Swagger annotation blocks immediately above
each method containing `@Summary` (brief description), `@Tags` (e.g., Databases),
`@Security` (e.g., ApiKeyAuth), `@Param` for the path "id" (type: string, required),
`@Success` responses (e.g., 200 with a message schema), and `@Router` directives
(e.g., /databases/{id}/stop [post] and /databases/{id}/start [post]) so the
operations are included in generated docs.
