feat(database): add stop/start lifecycle operations#142
- Add StopDatabase/StartDatabase to the DatabaseService interface
- Implement StopDatabase: stops container and sidecars, sets STOPPED status
- Implement StartDatabase: starts container, waits for ready, starts sidecars
- Add Stop/Start HTTP handlers (POST /databases/:id/stop, /start)
- Register new routes in router
- Update metrics counters on stop/start
📝 **Walkthrough**

This PR adds database stop/start lifecycle management, introducing two new API endpoints (`POST /databases/:id/stop` and `POST /databases/:id/start`).
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant Client
    participant Handler as DatabaseHandler
    participant Service as DatabaseService
    participant Repo as DatabaseRepo
    participant RBAC
    participant Compute as ComputeService
    participant Events as EventService
    participant Audit as AuditService
    Client->>Handler: POST /databases/:id/stop
    Handler->>Service: StopDatabase(ctx, id)
    Service->>RBAC: Enforce PermissionDBUpdate
    RBAC-->>Service: OK
    Service->>Repo: GetByID(id)
    Repo-->>Service: database
    Service->>Service: Validate RUNNING & !REPLICA
    Service->>Compute: StopInstance(exporter)
    Compute-->>Service: OK (or warn on failure)
    Service->>Compute: StopInstance(pooler)
    Compute-->>Service: OK (or warn on failure)
    Service->>Compute: StopInstance(main DB)
    Compute-->>Service: OK
    Service->>Repo: Update(db.Status=STOPPED)
    Repo-->>Service: OK
    Service->>Events: RecordEvent("DATABASE_STOP")
    Events-->>Service: OK
    Service->>Audit: Log("database.stop")
    Audit-->>Service: OK
    Service-->>Handler: nil
    Handler-->>Client: 200 {"message": "database stopped"}
```
```mermaid
sequenceDiagram
    participant Client
    participant Handler as DatabaseHandler
    participant Service as DatabaseService
    participant Repo as DatabaseRepo
    participant RBAC
    participant Compute as ComputeService
    participant WaitPoller as ReadinessPoller
    participant Events as EventService
    participant Audit as AuditService
    Client->>Handler: POST /databases/:id/start
    Handler->>Service: StartDatabase(ctx, id)
    Service->>RBAC: Enforce PermissionDBUpdate
    RBAC-->>Service: OK
    Service->>Repo: GetByID(id)
    Repo-->>Service: database
    Service->>Service: Validate STOPPED & ContainerID set
    Service->>Compute: StartInstance(main DB)
    Compute-->>Service: OK
    Service->>WaitPoller: waitForDatabaseReady (poll up to 30x)
    WaitPoller->>Compute: GetInstanceIP(container)
    Compute-->>WaitPoller: IP address
    WaitPoller-->>Service: OK
    Service->>Compute: StartInstance(pooler) [if PoolingEnabled]
    Compute-->>Service: OK (or warn on failure)
    Service->>Repo: Update(db.Status=RUNNING)
    Repo-->>Service: OK
    Service->>Events: RecordEvent("DATABASE_START")
    Events-->>Service: OK
    Service->>Audit: Log("database.start")
    Audit-->>Service: OK
    Service-->>Handler: nil
    Handler-->>Client: 200 {"message": "database started"}
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
⚠️ Performance Alert ⚠️
A possible performance regression was detected for benchmark 'Go Benchmarks'.
The benchmark result of this commit is worse than the previous result, exceeding the alert threshold of 2.
| Benchmark suite | Current: bbe3543 | Previous: 99aa6d9 | Ratio |
|---|---|---|---|
| BenchmarkFunctionServiceInvoke | 53638 ns/op, 1720 B/op, 23 allocs/op | 26747 ns/op, 1720 B/op, 23 allocs/op | 2.01 |
| BenchmarkFunctionServiceInvoke - ns/op | 53638 ns/op | 26747 ns/op | 2.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Pull request overview
Adds managed database lifecycle operations (stop/start) to the API and implements the underlying service logic, with accompanying handler tests and documentation updates.
Changes:
- Add `POST /databases/:id/stop` and `POST /databases/:id/start` routes and handler methods.
- Implement `DatabaseService.StopDatabase`/`StartDatabase` with container lifecycle management and status transitions.
- Document the new endpoints and update the changelog.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| internal/handlers/database_handler.go | Adds Stop/Start HTTP handlers that call the database service and return success messages. |
| internal/handlers/database_handler_test.go | Adds unit tests for Stop/Start handler success/invalid-id/service-error paths. |
| internal/core/services/database.go | Implements StopDatabase/StartDatabase and a readiness wait helper. |
| internal/core/ports/database.go | Extends the DatabaseService interface with StopDatabase/StartDatabase. |
| internal/api/setup/router.go | Registers the new stop/start database routes with RBAC middleware. |
| docs/database.md | Documents stop/start lifecycle behavior, constraints, and workflow. |
| docs/api-reference.md | Adds API reference entries for the new endpoints. |
| CHANGELOG.md | Notes the new stop/start lifecycle feature under “Added”. |
internal/core/services/database.go (Outdated)

```go
// Stop database container
if db.ContainerID != "" {
    if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
        s.logger.Warn("failed to stop database container", "container_id", db.ContainerID, "error", err)
```

StopDatabase logs and ignores failures from compute.StopInstance (including for the primary db.ContainerID) and still updates the DB status/metrics to STOPPED. This can leave the system reporting STOPPED while the container is still running (or the stop was canceled). Consider treating a failure to stop the main database container as a hard error (and only transitioning status/metrics after a successful stop), while keeping sidecar stops best-effort if desired.

Suggested change:

```diff
-// Stop database container
+// Stop database container; this must succeed before marking the database as stopped.
 if db.ContainerID != "" {
     if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
         s.logger.Warn("failed to stop database container", "container_id", db.ContainerID, "error", err)
+        return err
```
internal/core/services/database.go (Outdated)

```go
// Start database container
if db.ContainerID != "" {
    if err := s.compute.StartInstance(ctx, db.ContainerID); err != nil {
        return errors.Wrap(errors.Internal, "failed to start database container", err)
    }
```

StartDatabase proceeds even when db.ContainerID is empty (it won’t start anything, yet later it can still set status to RUNNING). This should likely be treated as an inconsistent DB record: either reconstruct the container name (similar to InstanceService) or return an Internal error instead of continuing.

Suggested change:

```diff
-// Start database container
-if db.ContainerID != "" {
-    if err := s.compute.StartInstance(ctx, db.ContainerID); err != nil {
-        return errors.Wrap(errors.Internal, "failed to start database container", err)
-    }
+if db.ContainerID == "" {
+    return errors.New(errors.Internal, "database container ID is missing")
+}
+// Start database container
+if err := s.compute.StartInstance(ctx, db.ContainerID); err != nil {
+    return errors.Wrap(errors.Internal, "failed to start database container", err)
```
internal/core/services/database.go (Outdated)

```go
// Wait for database to be ready
if err := s.waitForDatabaseReady(ctx, db); err != nil {
    s.logger.Warn("database may not be fully ready", "id", db.ID, "error", err)
}

// Start sidecars if enabled
if db.ExporterContainerID != "" {
    if err := s.compute.StartInstance(ctx, db.ExporterContainerID); err != nil {
        s.logger.Warn("failed to start exporter container", "container_id", db.ExporterContainerID, "error", err)
    }
}
if db.PoolingEnabled && db.PoolerContainerID != "" {
    if err := s.compute.StartInstance(ctx, db.PoolerContainerID); err != nil {
        s.logger.Warn("failed to start pooler container", "container_id", db.PoolerContainerID, "error", err)
    }
}

db.Status = domain.DatabaseStatusRunning
if err := s.repo.Update(ctx, db); err != nil {
```

StartDatabase ignores waitForDatabaseReady errors (warns and continues) and then sets status to RUNNING. If readiness times out or GetInstanceIP fails, the API can report RUNNING for an unreachable database. Consider returning an error (or keeping status STOPPED) when readiness fails, or introducing a transitional/ERROR state rather than unconditionally marking RUNNING.
```go
for i := 0; i < 30; i++ {
    dbIP, _ := s.compute.GetInstanceIP(ctx, db.ContainerID)
    if dbIP != "" {
        return nil // Got IP, assume ready
    }
    time.Sleep(1 * time.Second)
}
```

waitForDatabaseReady uses time.Sleep in a loop and ignores ctx.Done(). If the request is canceled, this can keep the goroutine alive for up to 30s and also treats context cancellation from GetInstanceIP as a transient error. Consider using a time.Ticker with select { case <-ctx.Done(): ... } and propagating/handling GetInstanceIP errors appropriately.

Suggested change:

```diff
-for i := 0; i < 30; i++ {
-    dbIP, _ := s.compute.GetInstanceIP(ctx, db.ContainerID)
-    if dbIP != "" {
-        return nil // Got IP, assume ready
-    }
-    time.Sleep(1 * time.Second)
-}
+ticker := time.NewTicker(1 * time.Second)
+defer ticker.Stop()
+for i := 0; i < 30; i++ {
+    dbIP, err := s.compute.GetInstanceIP(ctx, db.ContainerID)
+    if err != nil {
+        if ctx.Err() != nil {
+            return ctx.Err()
+        }
+    } else if dbIP != "" {
+        return nil // Got IP, assume ready
+    }
+    if i == 29 {
+        break
+    }
+    select {
+    case <-ctx.Done():
+        return ctx.Err()
+    case <-ticker.C:
+    }
+}
```
docs/database.md (Outdated)

> 2. **Container Stop**: The database container and all sidecars (PgBouncer, exporter) are stopped via Docker.
> 3. **Status Update**: Database status transitions to `STOPPED`.
> 4. **Data Persistence**: The underlying volume is retained — no data is lost.
>
> #### Start Operation
> The `POST /databases/:id/start` endpoint restarts a stopped database:
> 1. **Validation**: Database must be in `STOPPED` state.
> 2. **Container Start**: The database container is started with the existing volume.
> 3. **Readiness Wait**: The service polls for the instance IP to confirm the database is reachable.

This section states sidecars are stopped/started “via Docker” and that readiness polling “confirms the database is reachable”. The implementation calls the abstract ComputeBackend (not necessarily Docker) and readiness currently only checks for a non-empty instance IP (not connectivity). Consider rewording to match the actual behavior/backend abstraction to avoid misleading operators/users.

Suggested change:

```diff
-2. **Container Stop**: The database container and all sidecars (PgBouncer, exporter) are stopped via Docker.
+2. **Compute Stop**: The database instance and any sidecars (PgBouncer, exporter) are stopped via the configured compute backend.
 3. **Status Update**: Database status transitions to `STOPPED`.
 4. **Data Persistence**: The underlying volume is retained — no data is lost.
 #### Start Operation
 The `POST /databases/:id/start` endpoint restarts a stopped database:
 1. **Validation**: Database must be in `STOPPED` state.
-2. **Container Start**: The database container is started with the existing volume.
-3. **Readiness Wait**: The service polls for the instance IP to confirm the database is reachable.
+2. **Compute Start**: The database instance is started with the existing volume.
+3. **Readiness Wait**: The service polls until the instance has a non-empty IP address assigned.
```
internal/core/services/database.go

```go
func (s *DatabaseService) StopDatabase(ctx context.Context, id uuid.UUID) error {
    userID := appcontext.UserIDFromContext(ctx)
    tenantID := appcontext.TenantIDFromContext(ctx)
    if err := s.rbacSvc.Authorize(ctx, userID, tenantID, domain.PermissionDBUpdate, id.String()); err != nil {
        return err
```

StopDatabase/StartDatabase add substantial new service behavior (compute lifecycle calls, status/metrics transitions, readiness polling), but there are no corresponding DatabaseService unit tests for these paths. Since this package already has a database service unit test suite, adding tests for success plus key failure cases (invalid state, compute errors, readiness timeout, repo update failure) would help prevent regressions.
- Add stop/start endpoints to api-reference.md
- Document lifecycle in database.md (flows, use cases, constraints)
- Add entry to CHANGELOG.md
- Add TestDatabaseHandlerStop and TestDatabaseHandlerStart with mock service methods
Force-pushed from bbe3543 to 0afc05d.
- StopDatabase: treat failure to stop main container as hard error
- StartDatabase: check for missing container ID; propagate readiness timeout errors instead of continuing
- waitForDatabaseReady: use time.Ticker with select{ctx.Done} instead of time.Sleep loop; respect context cancellation
- docs: clarify compute backend vs Docker, readiness check behavior
- unit tests: add StopDatabase/StartDatabase success + failure cases (invalid state, compute errors, readiness timeout, repo update failure)
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
internal/core/services/database.go

```go
// Stop database container; must succeed before marking as stopped.
if db.ContainerID != "" {
    if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
        return errors.Wrap(errors.Internal, "failed to stop database container", err)
    }
}

db.Status = domain.DatabaseStatusStopped
if err := s.repo.Update(ctx, db); err != nil {
    return err
```

StopDatabase can mark the DB as STOPPED even when db.ContainerID is empty (because the StopInstance call is skipped). This can leave persisted state inconsistent with the actual running container and will also block future StartDatabase calls (which require ContainerID). Consider treating a missing ContainerID as an error (similar to StartDatabase) before updating status.
internal/core/services/database.go

```go
// Stop sidecars first
if db.ExporterContainerID != "" {
    if err := s.compute.StopInstance(ctx, db.ExporterContainerID); err != nil {
        s.logger.Warn("failed to stop exporter container", "container_id", db.ExporterContainerID, "error", err)
    }
}
if db.PoolerContainerID != "" {
    if err := s.compute.StopInstance(ctx, db.PoolerContainerID); err != nil {
        s.logger.Warn("failed to stop pooler container", "container_id", db.PoolerContainerID, "error", err)
    }
}

// Stop database container; must succeed before marking as stopped.
if db.ContainerID != "" {
    if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
        return errors.Wrap(errors.Internal, "failed to stop database container", err)
    }
}
```

StopDatabase stops sidecars before the primary database container. If stopping the main container fails, the method returns an error but sidecars may already be stopped, leaving the database still running but without pooler/exporter (partial side effects on failure). Consider stopping the main container first, or adding rollback/restart logic for sidecars when the main stop fails.
internal/core/services/database_unit_test.go

```go
t.Run("StartDatabase_ReadinessTimeout", func(t *testing.T) {
    dbID := uuid.New()
    db := &domain.Database{
        ID:          dbID,
        UserID:      userID,
        Status:      domain.DatabaseStatusStopped,
        Role:        domain.RolePrimary,
        ContainerID: "cid-timeout",
    }
    mockRepo.On("GetByID", mock.Anything, dbID).Return(db, nil).Once()
    mockCompute.On("StartInstance", mock.Anything, "cid-timeout").Return(nil).Once()
    mockCompute.On("GetInstanceIP", mock.Anything, "cid-timeout").Return("", nil).Once()
    mockCompute.On("GetInstanceIP", mock.Anything, "cid-timeout").Return("", nil).Maybe()
    mockRepo.On("Update", mock.Anything, mock.Anything).Return(nil).Maybe()
    mockEventSvc.On("RecordEvent", mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(nil).Maybe()
    mockAuditSvc.On("Log", mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(nil).Maybe()

    err := svc.StartDatabase(ctx, dbID)
    require.Error(t, err)
    assert.Contains(t, err.Error(), "did not become ready")
})
```

StartDatabase_ReadinessTimeout will take ~30 seconds because waitForDatabaseReady polls for up to 30 iterations with a 1s ticker and the test uses context.Background(). To keep unit tests fast and deterministic, consider using a context with a short timeout/deadline for this test (and assert on the resulting wrapped error), or make the readiness wait configurable for tests.
Actionable comments posted: 7
🧹 Nitpick comments (2)
internal/handlers/database_handler_test.go (1)

624-690: Refactor stop/start endpoint tests into table-driven cases.

The new tests are clear, but this segment duplicates the same structure across scenarios and endpoints. Converting to table-driven cases would align with repository test conventions and reduce maintenance overhead.

As per coding guidelines: "Use table-driven tests in test files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/handlers/database_handler_test.go` around lines 624-690: refactor the duplicated tests in TestDatabaseHandlerStop and TestDatabaseHandlerStart into table-driven tests. Create a table of cases (name, endpoint "/databases/:id/stop" or "/databases/:id/start", mockSetup function, requestPath, expectedCode, expectedBodyContains) and iterate over them, invoking r.POST with handler.Stop or handler.Start as appropriate; within each case call svc.On("StopDatabase" or "StartDatabase", mock.Anything, id).Return(...) inside mockSetup to configure Success, InvalidID, and ServiceError scenarios (use the same uuid id for valid cases and "invalid-uuid" for InvalidID), perform the HTTP request and assert the expected status and body; consolidate the duplicated setup logic from setupDatabaseHandlerTest, httptest.NewRecorder, and r.ServeHTTP into the loop.

internal/core/services/database_unit_test.go (1)

502-651: Table-drive these lifecycle cases with fresh mocks per case.

The repeated `t.Run` blocks make the timeout path fall back to `.Maybe()` expectations, so this suite would still pass if `Update`, `RecordEvent`, or `Log` were called on a failure path. A table-driven loop that creates a new service/mocks per case would keep the coverage and let you assert the negative paths cleanly.

As per coding guidelines: "Use table-driven tests in test files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/database_unit_test.go` around lines 502-651: the tests for StopDatabase and StartDatabase reuse mocks across multiple t.Run blocks, which lets expectations fall back to .Maybe() and masks negative-path calls. Refactor into a table-driven test that iterates cases ("Success", "NotRunning", "Replica", "ComputeError", "MissingContainerID", "ReadinessTimeout"), creating fresh mocks and a fresh service instance per case before calling svc.StopDatabase or svc.StartDatabase, setting per-case expectations on mockRepo, mockCompute, mockEventSvc, and mockAuditSvc, then asserting the expected error/no-error and that only the expected methods were called (no .Maybe() usages that allow stray calls).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/api-reference.md`:
- Around line 736-739: The POST /databases/:id/start endpoint docs currently
only mention the STOPPED-state constraint but omit the additional precondition
that the database record must have a non-empty container_id; update the API
description for the Start endpoint (POST /databases/:id/start) to include a
bullet or sentence stating that the database must have a valid container_id (not
null/empty) or the start will fail, and ensure this is placed alongside the
existing "Constraints" section so consumers know both requirements (STOPPED
state and presence of container_id).
In `@docs/database.md`:
- Around line 533-534: Update the lifecycle docs to clarify that
stopping/starting sidecars is best-effort: change the "Compute Stop" and
corresponding "Compute Start" wording that currently implies guaranteed sidecar
stop/start to explicitly state sidecars (e.g., PgBouncer, exporter) are handled
on a best-effort basis and that only failures to stop/start or determine
readiness of the main database container will produce a hard error and block the
operation; keep the "Status Update" description (transition to `STOPPED`) but
note that sidecar state may not be reflected if they fail to stop.
In `@internal/core/services/database.go`:
- Around line 584-585: The calls to s.eventSvc.RecordEvent and s.auditSvc.Log
are currently discarding errors using `_ =`, so capture their returned errors
(e.g., err := s.eventSvc.RecordEvent(ctx, ...); if err != nil { ... }) and
either log them via the service/struct logger (e.g., s.logger.Errorf/Warningf)
or propagate the error up the lifecycle method if these operations are part of
the contract; do the same for the analogous calls (s.eventSvc.RecordEvent /
s.auditSvc.Log) elsewhere in the file so failures are not silently ignored.
- Around line 560-576: The current logic stops sidecars (db.ExporterContainerID,
db.PoolerContainerID) before the main DB (db.ContainerID), which can leave the
DB running without its sidecars if s.compute.StopInstance(ctx, db.ContainerID)
fails; change the order to stop the main container first (call
s.compute.StopInstance for db.ContainerID before stopping db.ExporterContainerID
and db.PoolerContainerID), or if you prefer to keep the existing order, add
error recovery: if stopping the main container fails, restart the stopped
sidecars by calling the appropriate s.compute.StartInstance (or equivalent) for
db.ExporterContainerID and db.PoolerContainerID in the error path so the system
isn’t left in a degraded state.
- Around line 646-668: The waitForDatabaseReady function currently uses
compute.GetInstanceIP to infer readiness; replace that with an engine-specific
health probe (e.g., for Postgres/MySQL attempt a TCP dial or use the DB driver
to open a connection and Ping) inside DatabaseService.waitForDatabaseReady so
you actually verify the engine accepts connections rather than just having an
IP; make the retry cadence and timeout configurable via DatabaseService fields
or context-aware options (instead of the hard-coded 1s ticker and 30 attempts),
keep honoring ctx.Done(), preserve the same return path on success and return a
clear Internal error if the probe never succeeds; update references to
domain.Database and db.ContainerID to locate the instance and use compute to
obtain IP only if needed for the probe.
- Around line 612-619: After starting the container with
s.compute.StartInstance(ctx, db.ContainerID), if s.waitForDatabaseReady(ctx, db)
returns an error you must rollback the running instance to keep state
consistent: call the appropriate teardown method on s.compute (e.g.,
StopInstance or DestroyInstance) with ctx and db.ContainerID, handle/ignore
errors from that teardown so the original readiness error is returned, and still
wrap the original error with errors.Wrap(errors.Internal, "database failed to
become ready", err) to preserve context; do this in the same block that checks
waitForDatabaseReady so the container is stopped whenever readiness fails.
In `@internal/handlers/database_handler.go`:
- Around line 146-174: The Stop and Start handler methods (DatabaseHandler.Stop
and DatabaseHandler.Start) lack Swagger documentation comments; add Swagger
annotation blocks immediately above each method containing `@Summary` (brief
description), `@Tags` (e.g., Databases), `@Security` (e.g., ApiKeyAuth), `@Param` for
the path "id" (type: string, required), `@Success` responses (e.g., 200 with a
message schema), and `@Router` directives (e.g., /databases/{id}/stop [post] and
/databases/{id}/start [post]) so the operations are included in generated docs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 54b547f3-b63c-4f56-a3fb-6a85e8ea6b14
📒 Files selected for processing (10)
- CHANGELOG.md
- docs/api-reference.md
- docs/database.md
- internal/api/setup/router.go
- internal/core/ports/database.go
- internal/core/services/database.go
- internal/core/services/database_unit_test.go
- internal/handlers/database_handler.go
- internal/handlers/database_handler_test.go
- internal/workers/database_failover_worker_test.go
docs/api-reference.md

> ### POST /databases/:id/start
> Start a stopped database instance.
> - **Constraints**: Database must be in STOPPED state.
> - **Workflow**: Starts container, waits for readiness, starts sidecars if enabled.

Document the container ID precondition for start.

Line 738 currently states only the STOPPED-state validation. The service also fails start when the database container_id is missing, so this constraint should be documented here for API consumers.
docs/database.md

> 2. **Compute Stop**: The database container and all sidecars (PgBouncer, exporter) are stopped via the configured compute backend. If the main container fails to stop, the operation returns an error without updating status.
> 3. **Status Update**: Database status transitions to `STOPPED`.

Clarify sidecar handling as best-effort in lifecycle docs.

Lines 533 and 542 read as guaranteed sidecar stop/start, but service behavior is best-effort for sidecars and only hard-fails on main container stop/readiness failure. Please align the wording with the implementation to avoid operational confusion.

📝 Suggested wording update:

```diff
-2. **Compute Stop**: The database container and all sidecars (PgBouncer, exporter) are stopped via the configured compute backend. If the main container fails to stop, the operation returns an error without updating status.
+2. **Compute Stop**: Sidecars (PgBouncer/exporter), if present, are stopped on a best-effort basis, then the main database container is stopped. If the main container fails to stop, the operation returns an error without updating status.
-4. **Sidecar Start**: PgBouncer and/or metrics exporter are started if enabled.
+4. **Sidecar Start**: PgBouncer and/or metrics exporter are started if enabled (best-effort; sidecar startup failures are logged and do not block the status transition).
```

Also applies to lines 542-543.
// Stop sidecars first
if db.ExporterContainerID != "" {
	if err := s.compute.StopInstance(ctx, db.ExporterContainerID); err != nil {
		s.logger.Warn("failed to stop exporter container", "container_id", db.ExporterContainerID, "error", err)
	}
}
if db.PoolerContainerID != "" {
	if err := s.compute.StopInstance(ctx, db.PoolerContainerID); err != nil {
		s.logger.Warn("failed to stop pooler container", "container_id", db.PoolerContainerID, "error", err)
	}
}

// Stop database container; must succeed before marking as stopped.
if db.ContainerID != "" {
	if err := s.compute.StopInstance(ctx, db.ContainerID); err != nil {
		return errors.Wrap(errors.Internal, "failed to stop database container", err)
	}
Avoid stopping sidecars before the main container.
If stopping db.ContainerID fails, this returns an error after already stopping the pooler/exporter. That leaves the database still running but its auxiliary access/metrics path down. Stop the main container first, or restart the sidecars on the error path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/services/database.go` around lines 560 - 576, The current logic
stops sidecars (db.ExporterContainerID, db.PoolerContainerID) before the main DB
(db.ContainerID), which can leave the DB running without its sidecars if
s.compute.StopInstance(ctx, db.ContainerID) fails; change the order to stop the
main container first (call s.compute.StopInstance for db.ContainerID before
stopping db.ExporterContainerID and db.PoolerContainerID), or if you prefer to
keep the existing order, add error recovery: if stopping the main container
fails, restart the stopped sidecars by calling the appropriate
s.compute.StartInstance (or equivalent) for db.ExporterContainerID and
db.PoolerContainerID in the error path so the system isn’t left in a degraded
state.
_ = s.eventSvc.RecordEvent(ctx, "DATABASE_STOP", db.ID.String(), "DATABASE", nil)
_ = s.auditSvc.Log(ctx, db.UserID, "database.stop", "database", db.ID.String(), map[string]interface{}{"name": db.Name})
Don't silently drop event/audit failures.
These new lifecycle paths discard both errors with _ =, so event or audit outages become invisible. At minimum log them explicitly; if they're part of the contract, propagate the error instead.
As per coding guidelines, "Do not use silent failures - avoid blank identifier assignment like _ = someFunc()".
Also applies to: 638-639
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/services/database.go` around lines 584 - 585, The calls to
s.eventSvc.RecordEvent and s.auditSvc.Log are currently discarding errors using
`_ =`, so capture their returned errors (e.g., err :=
s.eventSvc.RecordEvent(ctx, ...); if err != nil { ... }) and either log them via
the service/struct logger (e.g., s.logger.Errorf/Warningf) or propagate the
error up the lifecycle method if these operations are part of the contract; do
the same for the analogous calls (s.eventSvc.RecordEvent / s.auditSvc.Log)
elsewhere in the file so failures are not silently ignored.
if err := s.compute.StartInstance(ctx, db.ContainerID); err != nil {
	return errors.Wrap(errors.Internal, "failed to start database container", err)
}

// Wait for database to be ready; return error if it fails.
if err := s.waitForDatabaseReady(ctx, db); err != nil {
	return errors.Wrap(errors.Internal, "database failed to become ready", err)
}
Rollback the container when readiness fails.
Once StartInstance succeeds, any waitForDatabaseReady error exits without reverting the container state. That can leave the instance running while the record stays STOPPED, which makes later lifecycle calls inconsistent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/services/database.go` around lines 612 - 619, After starting
the container with s.compute.StartInstance(ctx, db.ContainerID), if
s.waitForDatabaseReady(ctx, db) returns an error you must rollback the running
instance to keep state consistent: call the appropriate teardown method on
s.compute (e.g., StopInstance or DestroyInstance) with ctx and db.ContainerID,
handle/ignore errors from that teardown so the original readiness error is
returned, and still wrap the original error with errors.Wrap(errors.Internal,
"database failed to become ready", err) to preserve context; do this in the same
block that checks waitForDatabaseReady so the container is stopped whenever
readiness fails.
func (s *DatabaseService) waitForDatabaseReady(ctx context.Context, db *domain.Database) error {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for i := 0; i < 30; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}

		dbIP, err := s.compute.GetInstanceIP(ctx, db.ContainerID)
		if err != nil {
			if ctx.Err() != nil {
				return ctx.Err()
			}
			continue
		}
		if dbIP != "" {
			return nil
		}
	}
	return errors.New(errors.Internal, "database did not become ready in time")
GetInstanceIP is not a readiness check.
A non-empty IP only shows the container has networking; it does not prove Postgres/MySQL is accepting connections. This can mark the database RUNNING and start sidecars before the engine is actually usable. Prefer an engine-specific health probe and make the retry cadence configurable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/services/database.go` around lines 646 - 668, The
waitForDatabaseReady function currently uses compute.GetInstanceIP to infer
readiness; replace that with an engine-specific health probe (e.g., for
Postgres/MySQL attempt a TCP dial or use the DB driver to open a connection and
Ping) inside DatabaseService.waitForDatabaseReady so you actually verify the
engine accepts connections rather than just having an IP; make the retry cadence
and timeout configurable via DatabaseService fields or context-aware options
(instead of the hard-coded 1s ticker and 30 attempts), keep honoring ctx.Done(),
preserve the same return path on success and return a clear Internal error if
the probe never succeeds; update references to domain.Database and
db.ContainerID to locate the instance and use compute to obtain IP only if
needed for the probe.
func (h *DatabaseHandler) Stop(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		httputil.Error(c, errors.New(errors.InvalidInput, invalidDatabaseIDMsg))
		return
	}

	if err := h.svc.StopDatabase(c.Request.Context(), id); err != nil {
		httputil.Error(c, err)
		return
	}

	httputil.Success(c, http.StatusOK, gin.H{"message": "database stopped"})
}

func (h *DatabaseHandler) Start(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		httputil.Error(c, errors.New(errors.InvalidInput, invalidDatabaseIDMsg))
		return
	}

	if err := h.svc.StartDatabase(c.Request.Context(), id); err != nil {
		httputil.Error(c, err)
		return
	}

	httputil.Success(c, http.StatusOK, gin.H{"message": "database started"})
}
Add Swagger docs for the new handler endpoints.
Line 146 and Line 161 introduce public API handlers without Swagger annotations, so generated docs may omit these operations.
📚 Suggested annotation block
+// Stop stops a managed database instance.
+// `@Summary` Stop database
+// `@Description` Stops a running managed database instance
+// `@Tags` databases
+// `@Produce` json
+// `@Security` APIKeyAuth
+// `@Param` id path string true "Database ID"
+// `@Success` 200 {object} httputil.Response
+// `@Router` /databases/{id}/stop [post]
func (h *DatabaseHandler) Stop(c *gin.Context) {
@@
}
+// Start starts a managed database instance.
+// `@Summary` Start database
+// `@Description` Starts a stopped managed database instance
+// `@Tags` databases
+// `@Produce` json
+// `@Security` APIKeyAuth
+// `@Param` id path string true "Database ID"
+// `@Success` 200 {object} httputil.Response
+// `@Router` /databases/{id}/start [post]
func (h *DatabaseHandler) Start(c *gin.Context) {
@@
}

As per coding guidelines "Include Swagger documentation comments on handler methods with @Summary, @Tags, @Security, and @Router directives".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Stop stops a managed database instance.
// @Summary Stop database
// @Description Stops a running managed database instance
// @Tags databases
// @Produce json
// @Security APIKeyAuth
// @Param id path string true "Database ID"
// @Success 200 {object} httputil.Response
// @Router /databases/{id}/stop [post]
func (h *DatabaseHandler) Stop(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		httputil.Error(c, errors.New(errors.InvalidInput, invalidDatabaseIDMsg))
		return
	}
	if err := h.svc.StopDatabase(c.Request.Context(), id); err != nil {
		httputil.Error(c, err)
		return
	}
	httputil.Success(c, http.StatusOK, gin.H{"message": "database stopped"})
}

// Start starts a managed database instance.
// @Summary Start database
// @Description Starts a stopped managed database instance
// @Tags databases
// @Produce json
// @Security APIKeyAuth
// @Param id path string true "Database ID"
// @Success 200 {object} httputil.Response
// @Router /databases/{id}/start [post]
func (h *DatabaseHandler) Start(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		httputil.Error(c, errors.New(errors.InvalidInput, invalidDatabaseIDMsg))
		return
	}
	if err := h.svc.StartDatabase(c.Request.Context(), id); err != nil {
		httputil.Error(c, err)
		return
	}
	httputil.Success(c, http.StatusOK, gin.H{"message": "database started"})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/handlers/database_handler.go` around lines 146 - 174, The Stop and
Start handler methods (DatabaseHandler.Stop and DatabaseHandler.Start) lack
Swagger documentation comments; add Swagger annotation blocks immediately above
each method containing `@Summary` (brief description), `@Tags` (e.g., Databases),
`@Security` (e.g., ApiKeyAuth), `@Param` for the path "id" (type: string, required),
`@Success` responses (e.g., 200 with a message schema), and `@Router` directives
(e.g., /databases/{id}/stop [post] and /databases/{id}/start [post]) so the
operations are included in generated docs.
Summary
POST /databases/:id/stop and POST /databases/:id/start endpoints for managed database lifecycle

Test plan
Checklist
Summary by CodeRabbit