Commit 3fb6ed9

Merge phase4/lock-checklist-invariant: lock checklist & persona protocol phases
2 parents 1ceca9a + 12993b2 commit 3fb6ed9

1,766 files changed (+4094 / -23555 lines)

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
name: Guard - no tracked generated files

on:
  pull_request:
  push:
    branches: [ main, develop ]

jobs:
  check-generated-not-tracked:
    name: Ensure src/generated is not tracked
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Check for tracked generated files
        run: |
          set -euo pipefail
          echo "Checking tracked files under src/generated/..."
          count=$({ git ls-files src/generated || true; } | wc -l)
          if [ "${count// /}" != "0" ]; then
            echo "ERROR: Detected tracked files under src/generated. This path must not contain committed build outputs."
            echo "Found files:"
            git ls-files src/generated || true
            exit 1
          fi
          echo "No tracked generated files found."
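For local use or unit testing, the same guard can be sketched as a small Python helper. This is a hypothetical sketch, not a script from this commit: `tracked_generated`, `list_tracked_files`, and `main` are illustrative names, and matching on whole path components under `src/generated` is an assumption about the intended scope.

```python
import subprocess
from typing import Iterable, List

# Directory that must never contain tracked (committed) files.
GENERATED_DIR = "src/generated"


def tracked_generated(paths: Iterable[str], root: str = GENERATED_DIR) -> List[str]:
    """Return the tracked paths that live under `root`.

    Matching is on whole path components, so "src/generated/x.py" is
    flagged while a sibling such as "src/generated_old/x.py" is not.
    """
    root = root.rstrip("/")
    return [p for p in paths if p == root or p.startswith(root + "/")]


def list_tracked_files() -> List[str]:
    """List tracked files via `git ls-files -z` (NUL-separated, safe for odd names)."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], check=True, capture_output=True, text=True
    ).stdout
    return [p for p in out.split("\0") if p]


def main() -> int:
    """Return exit status 1 if any generated file is tracked, mirroring the workflow step."""
    offenders = tracked_generated(list_tracked_files())
    if offenders:
        print("ERROR: Detected tracked files under src/generated:")
        print("\n".join(offenders))
        return 1
    print("No tracked generated files found.")
    return 0
```

Keeping the path filter as a pure function means it can be tested without a Git checkout; only `list_tracked_files` touches the repository.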

.github/workflows/spec-pipeline.yml

Lines changed: 15 additions & 0 deletions
@@ -23,7 +23,22 @@ jobs:
           python -m pip install --upgrade pip
           python -m pip install -e .
           python -m pip install pytest
+
+      - name: Regenerate deterministic codegen outputs (dry-run)
+        env:
+          SHIELDCRAFT_SELFBUILD_ALLOW_DIRTY: '1'
+          SHIELDCRAFT_PERSONA_ENABLED: '0'
+        run: |
+          mkdir -p .selfhost_outputs
+          python -m src.shieldcraft.engine \
+            --self-host \
+            --spec spec/se_dsl_v1.spec.json \
+            --dry-run \
+            --emit-preview .selfhost_outputs/selfhost_preview.json
 
+      - name: Clean stale bytecode
+        run: python scripts/ci/clean_pycache.py
+
       - name: Run tests
         run: pytest -q
         continue-on-error: false

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ htmlcov/
 .selfhost_outputs/
 products/*/
 evidence/
+src/generated/**
 
 # OS
 .DS_Store

docs/CHECKLIST_OUTCOME_CONTRACT.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# CHECKLIST OUTCOME CONTRACT (AUTHORITATIVE)

This document is AUTHORITATIVE: the single source of truth for deriving checklist run outcomes.

Contract
--------
- The primary outcome of a checklist run MUST be derived exclusively by `derive_primary_outcome(checklist, events)`.
- Authoritative precedence: **REFUSAL** > **BLOCKED** > **ACTION** > **DIAGNOSTIC**.
- Persona annotations are advisory only and MUST NOT override the derived primary outcome.
- The function MUST return: `primary_outcome`, `refusal` (bool), `blocking_reasons` (list), and `confidence_level` (one of `high|medium|low`).
- No component may infer or mutate the `primary_outcome` outside of this contract; any attempts are recorded as diagnostic events.

Rationale
---------
Centralizing outcome derivation improves auditability and prevents duplicated heuristics from appearing in multiple modules. The canonical function is deterministic, and tests enforce idempotence and precedence.
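A minimal sketch of a function honoring this contract, assuming events are dicts with an `outcome` key, an optional `gate`, and an optional `source` (persona annotations marked `source == "persona"`). The event labels, the fallback when no events are recorded, and the confidence policy are assumptions; the contract fixes only the precedence and the returned fields.

```python
from typing import Any, Dict, Mapping, Sequence

# Authoritative precedence from the contract: REFUSAL > BLOCKED > ACTION > DIAGNOSTIC.
PRECEDENCE = ("REFUSAL", "BLOCKED", "ACTION", "DIAGNOSTIC")

# Assumed event-outcome labels; note that a BLOCKER event yields a BLOCKED run.
EVENT_TO_OUTCOME = {
    "REFUSAL": "REFUSAL",
    "BLOCKER": "BLOCKED",
    "ACTION": "ACTION",
    "DIAGNOSTIC": "DIAGNOSTIC",
}

# Placeholder confidence policy: the contract names the levels but not the rule.
CONFIDENCE = {"REFUSAL": "high", "BLOCKED": "high", "ACTION": "medium", "DIAGNOSTIC": "low"}


def derive_primary_outcome(
    checklist: Mapping[str, Any], events: Sequence[Mapping[str, Any]]
) -> Dict[str, Any]:
    """Derive the primary outcome from recorded events (pure, hence idempotent).

    `checklist` is accepted to match the contract's signature but is unused here.
    """
    # Persona annotations are advisory only, so they never feed the derivation.
    authoritative = [e for e in events if e.get("source") != "persona"]
    # Unknown outcome labels are treated as DIAGNOSTIC (an assumption).
    seen = {EVENT_TO_OUTCOME.get(e.get("outcome"), "DIAGNOSTIC") for e in authoritative}
    # First match in precedence order wins; no events falls back to DIAGNOSTIC.
    primary = next((o for o in PRECEDENCE if o in seen), "DIAGNOSTIC")
    blocking_reasons = sorted(
        {str(e.get("gate", "UNKNOWN")) for e in authoritative if e.get("outcome") == "BLOCKER"}
    )
    return {
        "primary_outcome": primary,
        "refusal": primary == "REFUSAL",
        "blocking_reasons": blocking_reasons,
        "confidence_level": CONFIDENCE[primary],
    }
```

Because the function reads only its inputs, calling it twice on the same events returns equal results, which is the idempotence property the tests are described as enforcing.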
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# AUTHORITY CEILING CONTRACT (AUTHORITATIVE)

This contract defines the authority ceilings for the compiler and enforces guard-only behavior: the compiler must never escalate authority beyond what the spec explicitly grants unless that escalation is exposed as a BLOCKER + DIAGNOSTIC and recorded in the run events.

Principles
- No new inference, synthesis, or heuristic behavior is permitted by this phase. Phase 12 is purely guard-and-enforcement.
- Tier A authority must not be silently resolved by the compiler. Any Tier A synthesis must be accompanied by an explicit BLOCKER event and a DIAGNOSTIC event for auditability.
- REFUSAL outcomes require explicit authority metadata (evidence.refusal.authority), and the compiler asserts its presence at finalization. REFUSAL must not be used to mask missing authority.

Enforcement
- Assertions are centralized in `finalize_checklist` to fail fast when authority ceilings are violated.
- Unit tests must verify that missing authority causes assertion failures rather than silent behavior.

Signed: Governance
Date: 2025-12-17
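The Tier A rule above can be sketched as a finalization-time check. This is illustrative only: `assert_authority_ceiling` is a hypothetical name, and linking items to events through a shared `ptr` field is an assumption about how the two are correlated.

```python
from typing import Any, Iterable, Mapping

# Events that must accompany any non-explicit Tier A value.
REQUIRED_PAIR = {"BLOCKER", "DIAGNOSTIC"}


def assert_authority_ceiling(
    items: Iterable[Mapping[str, Any]], events: Iterable[Mapping[str, Any]]
) -> None:
    """Fail fast when a Tier A synthesis is not exposed as BLOCKER + DIAGNOSTIC.

    Items and events are linked through a shared `ptr` field (an assumption).
    """
    events = list(events)
    for item in items:
        meta = item.get("meta", {})
        # Only Tier A values the compiler did not receive explicitly are capped.
        if meta.get("tier") != "A" or meta.get("source") == "explicit":
            continue
        ptr = item.get("ptr")
        recorded = {e.get("outcome") for e in events if e.get("ptr") == ptr}
        missing = REQUIRED_PAIR - recorded
        if missing:
            # Raised explicitly (not via `assert`) so `python -O` cannot strip it.
            raise AssertionError(
                f"Tier A synthesis at {ptr!r} lacks {sorted(missing)} event(s)"
            )
```

Raising `AssertionError` directly keeps the failure type the contract names while remaining active under optimized interpreters.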
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
# Compilation Phase Model

The compiler is structured as a fixed sequence of deterministic phases. Each phase has well-defined inputs, outputs, and allowed failure modes.

Fixed phases
1. Ingestion
   - Inputs: canonical spec (and optional AST)
   - Outputs: raw AST traversal items
   - Allowed failures: schema failures recorded as DIAGNOSTIC events
2. Normalization
   - Inputs: raw items
   - Outputs: enriched items (classification, severity, metadata)
   - Allowed failures: missing lineage recorded as DIAGNOSTIC
3. Constraint propagation
   - Inputs: normalized items and spec constraints
   - Outputs: additional constraint items, merged constraints
   - Allowed failures: constraint violations recorded as BLOCKER
4. Synthesis
   - Inputs: merged items, derived tasks, invariant results
   - Outputs: final_items (stable ids, order ranks)
   - Allowed failures: generation contract failures (BLOCKER)
5. Finalization
   - Inputs: final_items
   - Outputs: serializable result object (`valid`, `items`, `preflight`, `lineage`, `diff`, etc.)
   - Allowed failures: test gate failures, returned as partial results

Mapping gates (G1–G22) to phases
- Ingestion: G4_SCHEMA_VALIDATION
- Normalization: G9_GENERATOR_RUN_FUZZ_GATE, G10_GENERATOR_PREP_MISSING
- Constraint propagation: G8_TEST_ATTACHMENT_CONTRACT
- Synthesis: G13_GENERATION_CONTRACT_FAILED, G16_MINIMALITY_INVARIANT_FAILED
- Finalization: G20_QUALITY_GATE_FAILED, G22_EXECUTE_INTERNAL_ERROR_RETURN

Note: This model is prescriptive: gates are classified by where they must be recorded. The mapping is authoritative and must not be changed without governance approval.
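One way to make the fixed phase order and the gate mapping checkable is sketched below, assuming each recorded event carries the phase it was recorded in. `Phase`, `GATE_PHASE`, and `phase_violations` are hypothetical names; only the phase order and the gate IDs come from the model above.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class Phase(IntEnum):
    """Fixed, ordered compilation phases; integer order encodes the sequence."""
    INGESTION = 1
    NORMALIZATION = 2
    CONSTRAINT_PROPAGATION = 3
    SYNTHESIS = 4
    FINALIZATION = 5


# Gate-to-phase mapping as listed in the phase model.
GATE_PHASE = {
    "G4_SCHEMA_VALIDATION": Phase.INGESTION,
    "G9_GENERATOR_RUN_FUZZ_GATE": Phase.NORMALIZATION,
    "G10_GENERATOR_PREP_MISSING": Phase.NORMALIZATION,
    "G8_TEST_ATTACHMENT_CONTRACT": Phase.CONSTRAINT_PROPAGATION,
    "G13_GENERATION_CONTRACT_FAILED": Phase.SYNTHESIS,
    "G16_MINIMALITY_INVARIANT_FAILED": Phase.SYNTHESIS,
    "G20_QUALITY_GATE_FAILED": Phase.FINALIZATION,
    "G22_EXECUTE_INTERNAL_ERROR_RETURN": Phase.FINALIZATION,
}


def phase_violations(events: Iterable[Tuple[str, Phase]]) -> List[str]:
    """Check (gate, recorded_phase) pairs against the mapping and the fixed order."""
    problems: List[str] = []
    high_water = Phase.INGESTION
    for gate, phase in events:
        expected = GATE_PHASE.get(gate)  # gates not listed here are not checked
        if expected is not None and expected != phase:
            problems.append(f"{gate} recorded in {phase.name}, expected {expected.name}")
        if phase < high_water:
            # Phases only move forward; going back breaks the fixed sequence.
            problems.append(f"{gate} recorded in {phase.name} after {high_water.name}")
        high_water = max(high_water, phase)
    return problems
```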
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Compiler Failure Normalization (AUTHORITATIVE MATRIX)

This document defines the failure normalization matrix, a mapping of Gate → Phase → Event Type → Checklist Item Type.

Matrix (representative examples)
- G4_SCHEMA_VALIDATION → Ingestion → DIAGNOSTIC → DIAGNOSTIC
- G9_GENERATOR_RUN_FUZZ_GATE → Normalization → BLOCKER → BLOCKER
- G10_GENERATOR_PREP_MISSING → Normalization → DIAGNOSTIC → DIAGNOSTIC
- G11_RUN_TEST_GATE → Synthesis → BLOCKER → BLOCKER
- G13_GENERATION_CONTRACT_FAILED → Synthesis → BLOCKER → BLOCKER
- G16_MINIMALITY_INVARIANT_FAILED → Synthesis → REFUSAL → REFUSAL
- G20_QUALITY_GATE_FAILED → Finalization → REFUSAL → REFUSAL
- G22_EXECUTE_INTERNAL_ERROR_RETURN → Finalization → DIAGNOSTIC → DIAGNOSTIC

Constraints
- No new gate IDs may be introduced by this phase.
- This mapping must align with `finalize_checklist` behavior and the Semantic Outcome Invariants.

Behavioral note
- A gate may record an event with outcome REFUSAL or BLOCKER; the compiler must ensure that such events are surfaced in the final checklist as checklist items and that they influence primary outcome derivation via `finalize_checklist`.
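A lookup-table sketch of the matrix, assuming gate failures arrive as a gate ID plus a message. `Row`, `MATRIX`, and `to_checklist_item` are hypothetical names; the rows are transcribed from the representative examples above, and rejecting unknown gates reflects the no-new-gate-IDs constraint.

```python
from typing import Dict, NamedTuple


class Row(NamedTuple):
    phase: str
    event_type: str
    item_type: str


# Representative rows transcribed from the failure normalization matrix.
MATRIX: Dict[str, Row] = {
    "G4_SCHEMA_VALIDATION": Row("Ingestion", "DIAGNOSTIC", "DIAGNOSTIC"),
    "G9_GENERATOR_RUN_FUZZ_GATE": Row("Normalization", "BLOCKER", "BLOCKER"),
    "G10_GENERATOR_PREP_MISSING": Row("Normalization", "DIAGNOSTIC", "DIAGNOSTIC"),
    "G11_RUN_TEST_GATE": Row("Synthesis", "BLOCKER", "BLOCKER"),
    "G13_GENERATION_CONTRACT_FAILED": Row("Synthesis", "BLOCKER", "BLOCKER"),
    "G16_MINIMALITY_INVARIANT_FAILED": Row("Synthesis", "REFUSAL", "REFUSAL"),
    "G20_QUALITY_GATE_FAILED": Row("Finalization", "REFUSAL", "REFUSAL"),
    "G22_EXECUTE_INTERNAL_ERROR_RETURN": Row("Finalization", "DIAGNOSTIC", "DIAGNOSTIC"),
}


def to_checklist_item(gate: str, message: str) -> Dict[str, str]:
    """Normalize a recorded gate failure into a checklist item via the matrix.

    Unknown gate IDs raise ValueError, since this phase may not introduce new gates.
    """
    row = MATRIX.get(gate)
    if row is None:
        raise ValueError(f"unknown gate id: {gate}")
    return {
        "gate": gate,
        "phase": row.phase,
        "event_type": row.event_type,
        "type": row.item_type,
        "message": message,
    }
```

In every listed row the event type and the checklist item type coincide, so a table-driven lookup like this cannot let the two drift apart.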
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# INFERENCE EXPLAINABILITY CONTRACT (AUTHORITATIVE)

This document defines the authoritative explainability metadata required for any synthesized,
coerced, inferred, or derived data emitted by the compiler and checklist pipeline.

Mandatory metadata fields (attached to checklist items or synthesized objects):
- `meta.source`: one of `explicit | default | derived | coerced | inferred`
- `meta.justification`: short machine-readable string explaining why the value was created (e.g., `safe_default`, `missing_spec_pointer`, `heuristic_prose_keyword_match`). For BLOCKER/REFUSAL-related inferences, the justification MUST reference the affected pointer(s) via `meta.justification_ptr` or by embedding the pointer in the justification code.
- `meta.inference_type`: one of `none | safe_default | heuristic | structural | fallback`
- `meta.tier`: when applicable, `A | B | C` (reflects the template tier per `TEMPLATE_COMPILATION_CONTRACT.md`)

Principles
- Any inference must be recorded in the machine-readable fields above; missing explainability
  metadata is a compiler violation.
- Violations are classified by tier: Tier A missing explainability → BLOCKER; Tier B → DIAGNOSTIC; Tier C → advisory.
- Explainability metadata must be deterministic and include a short justification code suitable for audit and filtering.

Examples
- Synthesized default for missing `agents` (Tier A):
  - `meta.source = "default"`
  - `meta.justification = "safe_default_agents_list"`
  - `meta.inference_type = "safe_default"`
  - `meta.tier = "A"`

- Prose-derived confidence heuristic:
  - `meta.source = "inferred"`
  - `meta.justification = "heuristic_prose_keyword_match"`
  - `meta.inference_type = "heuristic"`
  - `meta.tier = "C"` (if informal)

Enforcement
- The compiler attaches explainability metadata at each inference site; unit tests and CI guards assert its presence.
- Tier A inferences without a corresponding checklist item or without explainability metadata are considered violations and will be detected via compiler assertions and failing tests.

Signed: Governance
Date: 2025-12-16
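A sketch of a CI-style check for the mandatory fields, assuming items are dicts with an optional `meta` mapping. `explainability_findings` is a hypothetical name; the allowed values and tier severities are copied from the contract, while treating an absent `meta.source` as a violation and defaulting untiered items to DIAGNOSTIC are assumptions.

```python
from typing import Any, Dict, List, Mapping

SOURCES = {"explicit", "default", "derived", "coerced", "inferred"}
INFERENCE_TYPES = {"none", "safe_default", "heuristic", "structural", "fallback"}
REQUIRED_FIELDS = ("source", "justification", "inference_type")
# Violation severity by tier: A -> BLOCKER, B -> DIAGNOSTIC, C -> advisory.
SEVERITY_BY_TIER = {"A": "BLOCKER", "B": "DIAGNOSTIC", "C": "ADVISORY"}


def explainability_findings(item: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Report missing or invalid explainability metadata on one checklist item."""
    meta = item.get("meta", {})
    if meta.get("source") == "explicit":
        return []  # explicit values carry no inference to explain
    problems = [f"missing meta.{name}" for name in REQUIRED_FIELDS if not meta.get(name)]
    if meta.get("source") and meta["source"] not in SOURCES:
        problems.append(f"invalid meta.source {meta['source']!r}")
    if meta.get("inference_type") and meta["inference_type"] not in INFERENCE_TYPES:
        problems.append(f"invalid meta.inference_type {meta['inference_type']!r}")
    # Items without a tier default to DIAGNOSTIC severity (an assumption).
    severity = SEVERITY_BY_TIER.get(meta.get("tier"), "DIAGNOSTIC")
    return [{"item": item.get("id"), "severity": severity, "problem": p} for p in problems]
```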

docs/governance/INVARIANTS.md

Lines changed: 90 additions & 0 deletions
@@ -162,3 +162,93 @@ This file declares the governance anchor; enforcement logic will be implemented
- `finalize_checklist(...)` is the sole emission boundary.
- This invariant is enforced by code-level assertions and tests.

## Refusal Authority Invariant (Phase 11C)

- Statement: Every event with outcome `REFUSAL` **must** include structured refusal metadata under `evidence.refusal` containing the keys `authority` (non-empty string), `trigger`, `scope`, and `justification`.
- Enforcement point: `src/shieldcraft/engine.finalize_checklist` asserts that `authority` is present and exposes it as `checklist.refusal_authority`.
- Failure classification: Missing or invalid `authority` is treated as a compiler assertion/implementation error and must fail the finalization boundary to avoid ambiguous REFUSALs.
- Testable requirements:
  - Unit tests must verify that (a) REFUSALs recorded via `record_refusal_event` propagate authority into the finalized checklist, and (b) REFUSALs recorded without `authority` cause finalization to raise an `AssertionError` and emit a paired diagnostic entry.
- Evidence: Contract `docs/governance/REFUSAL_AUTHORITY_CONTRACT.md`, tests (`tests/test_refusal_authority.py`), and CI guard (`tests/ci/test_refusal_authority_persistence.py`).

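A sketch of the refusal-authority check at the finalization boundary, assuming REFUSAL events are dicts shaped like `{"outcome": "REFUSAL", "evidence": {"refusal": {...}}}`. `check_refusal_authority` and the diagnostic entry's fields are hypothetical; the engine's actual `finalize_checklist` may be structured differently.

```python
from typing import Any, Dict, List, Mapping

REQUIRED_REFUSAL_KEYS = ("authority", "trigger", "scope", "justification")


def check_refusal_authority(
    events: List[Mapping[str, Any]], diagnostics: List[Dict[str, Any]]
) -> List[str]:
    """Return the authority of every REFUSAL event, in event order.

    A REFUSAL with missing keys or an empty `authority` first appends a
    paired diagnostic entry to `diagnostics`, then raises AssertionError.
    """
    authorities: List[str] = []
    for event in events:
        if event.get("outcome") != "REFUSAL":
            continue
        refusal = event.get("evidence", {}).get("refusal", {})
        authority = refusal.get("authority")
        missing = [k for k in REQUIRED_REFUSAL_KEYS if k not in refusal]
        if missing or not isinstance(authority, str) or not authority.strip():
            diagnostics.append({
                "outcome": "DIAGNOSTIC",
                "code": "refusal_authority_missing",
                "gate": event.get("gate"),
                "missing": missing or ["authority"],  # empty authority counts as missing
            })
            raise AssertionError(f"REFUSAL from {event.get('gate')!r} lacks valid authority")
        authorities.append(authority)
    return authorities
```

A caller could copy the returned authorities onto `checklist.refusal_authority`; appending the diagnostic before raising keeps the paired entry even though the assertion aborts finalization.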
## Semantic Outcome Invariants (Phase 5)

- Statement: Every emitted checklist MUST contain a single canonical `primary_outcome` with value one of: `SUCCESS`, `REFUSAL`, `BLOCKED`, `DIAGNOSTIC_ONLY`.
- Role assignment: Each checklist item MUST be assigned exactly one `role` drawn from: `PRIMARY_CAUSE`, `CONTRIBUTING_BLOCKER`, `SECONDARY_DIAGNOSTIC`, `INFORMATIONAL`.
- Mapping rules:
  - `REFUSAL` if any recorded event has outcome `REFUSAL`.
  - `BLOCKED` if no `REFUSAL` and any recorded event has outcome `BLOCKER`.
  - `DIAGNOSTIC_ONLY` if all recorded events are `DIAGNOSTIC`.
  - `SUCCESS` if there are no events or only informational/non-diagnostic events.
- Semantic invariants (enforced in `finalize_checklist`):
  - Exactly one `PRIMARY_CAUSE` item MUST exist unless `primary_outcome == SUCCESS`.
  - `REFUSAL` outcome MUST include `refusal_reason` and top-level `refusal == true`.
  - `BLOCKED` outcome MUST NOT set `refusal == true`.
  - `DIAGNOSTIC_ONLY` outcome MUST NOT contain `BLOCKER` or `REFUSAL` items.
- Enforcement: These invariants are enforced by code-level assertions inside `finalize_checklist` and protected by deterministic, unit-tested behavior.

- Semantic Outcome Lock: The canonical semantics (primary outcome derivation, item roles, and invariants) are locked under Phase 5 and may not be altered except via an explicit governance phase update. Changes to semantic meaning require a recorded governance decision and a corresponding implementation phase.

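The mapping rules and item-level invariants can be sketched as two small functions. The names are hypothetical, and one gap in the rules is filled by interpretation: a mix of `DIAGNOSTIC` and informational events is treated as `DIAGNOSTIC_ONLY`, since informational events are ignored.

```python
from typing import Any, Iterable, Mapping

OUTCOMES = {"SUCCESS", "REFUSAL", "BLOCKED", "DIAGNOSTIC_ONLY"}
ROLES = {"PRIMARY_CAUSE", "CONTRIBUTING_BLOCKER", "SECONDARY_DIAGNOSTIC", "INFORMATIONAL"}


def primary_outcome(event_outcomes: Iterable[str]) -> str:
    """Apply the Phase 5 mapping rules; informational events are ignored."""
    outcomes = set(event_outcomes)
    if "REFUSAL" in outcomes:
        return "REFUSAL"
    if "BLOCKER" in outcomes:
        return "BLOCKED"
    if "DIAGNOSTIC" in outcomes:
        return "DIAGNOSTIC_ONLY"
    return "SUCCESS"


def _require(condition: bool, message: str) -> None:
    if not condition:
        raise AssertionError(message)


def check_semantic_invariants(checklist: Mapping[str, Any]) -> None:
    """Raise AssertionError if a finalized checklist breaks a Phase 5 invariant."""
    outcome = checklist["primary_outcome"]
    items = checklist.get("items", [])
    roles = [item.get("role") for item in items]
    _require(outcome in OUTCOMES, f"unknown primary_outcome {outcome!r}")
    _require(all(r in ROLES for r in roles), "every item needs exactly one valid role")
    if outcome != "SUCCESS":
        _require(roles.count("PRIMARY_CAUSE") == 1, "exactly one PRIMARY_CAUSE required")
    if outcome == "REFUSAL":
        _require(checklist.get("refusal") is True and bool(checklist.get("refusal_reason")),
                 "REFUSAL needs refusal == true and a refusal_reason")
    if outcome == "BLOCKED":
        _require(checklist.get("refusal") is not True, "BLOCKED must not set refusal == true")
    if outcome == "DIAGNOSTIC_ONLY":
        _require(not any(item.get("type") in ("BLOCKER", "REFUSAL") for item in items),
                 "DIAGNOSTIC_ONLY cannot contain BLOCKER or REFUSAL items")
```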
## Persona Arbitration Invariant (Phase 6)

- Statement: Persona authority and routing MUST be explicit, deterministic, and auditable. Persona outputs are evidence and may be compressed into a single primary persona cause for audit, but persona outputs MUST NOT arbitrarily change canonical checklist semantics without a governance decision.
- Rules:
  - Personas MAY declare an optional `authority` of one of: `DECISIVE`, `ADVISORY`, `ANNOTATIVE` (metadata only in Phase 6).
  - Routing of persona invocation MUST be static and derived from the explicit routing table in `src/shieldcraft/persona/routing.py` (if configured); otherwise persona discovery falls back to `scope` rules.
  - Persona events are recorded atomically and deterministically in `artifacts/persona_events_v1.json` and hashed for integrity.
  - Persona outputs are compressed into a `checklist.persona_summary` structure for deterministic auditability; compression does not change primary checklist outcome semantics.
- Enforcement: These invariants are enforced by documentation, deterministic routing, persona metadata, persona event compression implemented in `finalize_checklist`, and the consolidated canonical protocol documentation (Phase 7).

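Deterministic, atomic recording with an integrity hash is commonly done by serializing to canonical JSON, writing to a temporary file in the same directory, and renaming it into place. The sketch below assumes that approach; `write_persona_events` is a hypothetical helper, and SHA-256 is an assumed choice since the invariant does not name a hash.

```python
import hashlib
import json
import os
import tempfile
from typing import Any, List, Mapping


def canonical_json(obj: Any) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def write_persona_events(path: str, events: List[Mapping[str, Any]]) -> str:
    """Atomically write `events` as canonical JSON and return the SHA-256 hex digest."""
    payload = canonical_json(list(events))
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic on POSIX when source and target share a filesystem.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return hashlib.sha256(payload).hexdigest()
```

Readers therefore see either the previous file or the complete new one, and the returned digest can be stored alongside the checklist for later integrity checks.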
## Spec-to-Checklist Compiler Invariant (Phase 8)

- Statement: The Spec → Checklist compilation subsystem (authoritative entrypoint: `ChecklistGenerator.build` in `src/shieldcraft/services/checklist/generator.py`) is an auditable, deterministic, first-class subsystem. It MUST always return a serializable checklist result object (possibly marked invalid), and it MUST record gating events to the `ChecklistContext` so that `finalize_checklist(...)` can derive the canonical outcome.

- Requirements (testable):
  - Every compiler entrypoint MUST return an emitted result object containing at minimum `items` (no silent non-emission).
  - No unrecorded raise may escape the compiler boundary such that `finalize_checklist(...)` is not invoked by the caller; engine entrypoints (e.g., `Engine.run`) MUST catch compiler errors, record a diagnostic gate event, and return a finalized checklist artifact.
  - All recorded gate events emitted during compilation MUST appear in the finalized checklist artifact (as `events` and corresponding checklist items) to ensure auditability.

- Enforcement: Verified by unit tests (regression guards) and documented compiler contracts (`SPEC_TO_CHECKLIST_COMPILER.md`, `SPEC_INPUT_CLASSIFICATION.md`, `COMPILATION_PHASE_MODEL.md`, `COMPILER_FAILURE_NORMALIZATION.md`).

- Lock: This invariant is locked by Phase 8 and may not be changed except via a governance phase update.

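The "no unrecorded raise" requirement amounts to a catch, record, and finalize boundary. In this sketch, `Context`, `finalize`, and `run` are hypothetical stand-ins for `ChecklistContext`, `finalize_checklist`, and `Engine.run`, and recording the failure under `G22_EXECUTE_INTERNAL_ERROR_RETURN` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Context:
    """Hypothetical stand-in for ChecklistContext: collects gate events in order."""
    events: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, gate: str, outcome: str, message: str) -> None:
        self.events.append({"gate": gate, "outcome": outcome, "message": message})


def finalize(result: Dict[str, Any], ctx: Context) -> Dict[str, Any]:
    """Stand-in for finalize_checklist: every recorded event is surfaced."""
    return {**result, "events": list(ctx.events)}


def run(compile_fn: Callable[[Context], Dict[str, Any]]) -> Dict[str, Any]:
    """Always return a finalized result, even if compilation raises."""
    ctx = Context()
    try:
        result = compile_fn(ctx)
    except Exception as exc:  # no unrecorded raise escapes the boundary
        ctx.record("G22_EXECUTE_INTERNAL_ERROR_RETURN", "DIAGNOSTIC", f"{type(exc).__name__}: {exc}")
        result = {"valid": False, "items": []}
    return finalize(result, ctx)
```

Events recorded before the failure are kept, so the finalized artifact shows both what the compiler observed and why it stopped.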
---

## Inference Explainability Invariant (Phase 11B/11D/11E)

- Statement: All inferred, synthesized, coerced, or derived values that affect checklist emission or gating MUST include machine-readable explainability metadata attached to the affected object (item/meta/evidence/header). No silent inference is permitted: every non-explicit value must carry provenance and a justification code.
- Required fields: `meta.source`, `meta.justification`, `meta.inference_type`, and `meta.tier` (when applicable for Tier A/B/C). BLOCKER/REFUSAL-related inferences MUST reference affected pointer(s) either via `meta.justification_ptr` or via an explicit pointer embedded in `meta.justification`.
- Additional required provenance for specific cases:
  - Coercions MUST preserve `meta.original_value` for auditability.
  - Derived tasks MUST include `meta.derived_from = <parent_id>` and `meta.justification` referencing the derivation rule.
  - Confidence assignments MUST include `confidence_meta` with `source` and `justification` fields.
  - Unknown invariant expressions that are defaulted MUST attach `explainability` metadata and emit a DIAGNOSTIC checklist item.
- Enforcement: Compiler unit tests and CI guards detect missing explainability metadata; Tier A omissions are BLOCKERs and Tier B omissions are DIAGNOSTIC. Missing required provenance (e.g., `original_value` for coercions or `derived_from` for derived tasks) fails CI and must be remediated.
- Rationale: Prevent silent intent invention by requiring that every automatic decision be auditable, machine-filterable, and provenance-bound.
---

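A sketch of provenance-preserving helpers for coercions and derived tasks. `coerce_field` and `derive_task` are hypothetical and do not describe the behavior of `ChecklistModel.normalize_item` or `infer_tasks`; the `inference_type` values chosen here are assumptions.

```python
import copy
from typing import Any, Callable, Dict, Mapping


def coerce_field(
    item: Dict[str, Any], key: str, coerce: Callable[[Any], Any], justification: str
) -> Dict[str, Any]:
    """Return a copy of `item` with `key` coerced and its provenance recorded.

    If coercion changes neither value nor type, the item is returned unchanged.
    """
    original = item.get(key)
    value = coerce(original)
    if type(value) is type(original) and value == original:
        return item
    out = copy.deepcopy(item)
    out[key] = value
    meta = out.setdefault("meta", {})
    meta.update(
        source="coerced",
        justification=justification,
        inference_type="structural",  # assumed category for a coercion
        original_value=original,  # preserved for auditability
    )
    return out


def derive_task(parent: Mapping[str, Any], task_id: str, rule: str) -> Dict[str, Any]:
    """Build a derived task bound to its parent id and derivation rule."""
    return {
        "id": task_id,
        "meta": {
            "source": "derived",
            "derived_from": parent["id"],
            "justification": rule,
            "inference_type": "structural",  # assumed category for a derivation
        },
    }
```

The type check matters because `1 == 1.0` in Python; comparing values alone would hide an int-to-float coercion from the audit trail.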
## Compiler Hardening Invariants (Phase 11A)

- Statement: Compiler hardening measures introduced in Phase 11A (Tier enforcement, default synthesis, insufficiency diagnostics, and checklist quality scoring) are authoritative and must be enforced by the compiler pipeline.

- Testable invariants:
  - Tier enforcement is implemented in `src/shieldcraft/services/checklist/tier_enforcement.py::enforce_tiers` and must emit checklist items for missing Tier A/B sections (BLOCKER/DIAGNOSTIC respectively).
  - Default synthesis is implemented in `src/shieldcraft/services/spec/defaults.py::synthesize_missing_spec_fields` and must be invoked exactly once during compiler entry (currently in `ChecklistGenerator.build`).
  - Spec sufficiency diagnostics are implemented via `src/shieldcraft/services/spec/analysis.py::check_spec_sufficiency` and must produce DIAGNOSTIC checklist items without aborting compilation.
  - Checklist quality scoring is implemented in `src/shieldcraft/services/checklist/quality.py::compute_checklist_quality`, and its result MUST be attached to `checklist.meta.checklist_quality` by `finalize_checklist`.

- Enforcement: Unit tests and CI guards verify these invariants; any regression must be patched and re-locked via governance decision.

---

## Inference Explainability Invariant (Phase 11B)

- Statement: All synthesized, inferred, coerced, or derived data emitted by the compiler MUST include explainability metadata according to `docs/governance/INFERENCE_EXPLAINABILITY_CONTRACT.md`.
- Testable requirements:
  - Any checklist item with `meta.synthesized_default == True` MUST have `meta.source`, `meta.justification`, and `meta.inference_type` defined.
  - Any item whose fields are coerced or normalized by `ChecklistModel.normalize_item` MUST include `meta.source = "coerced"` and `meta.justification`.
  - Any derived task emitted by `infer_tasks` MUST include `meta.source = "derived"` and `meta.justification`.
  - Invariant evaluations that return safe defaults for unknown expressions MUST attach `explainability` metadata to the `invariant_results` entries.
- Enforcement: Verified by unit tests and CI guards; violations are test failures and must be remediated promptly.

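A sketch of the Phase 11A tier-enforcement rule, assuming top-level spec sections and a tier table supplied by the caller. `enforce_tiers_sketch` is hypothetical and is not the `enforce_tiers` implementation; the section names in the usage, the item IDs, and the justification code format are illustrative.

```python
from typing import Any, Dict, List, Mapping

# Severity emitted for a missing section, by tier (Tier C is not enforced here).
TIER_SEVERITY = {"A": "BLOCKER", "B": "DIAGNOSTIC"}


def enforce_tiers_sketch(spec: Mapping[str, Any], tiers: Mapping[str, str]) -> List[Dict[str, Any]]:
    """Emit one checklist item per missing or empty Tier A/B top-level section.

    `tiers` maps section name -> "A" | "B" | "C"; it is an input because tier
    assignments are defined elsewhere (per TEMPLATE_COMPILATION_CONTRACT.md).
    """
    items: List[Dict[str, Any]] = []
    for section in sorted(tiers):  # sorted for deterministic ordering
        tier = tiers[section]
        severity = TIER_SEVERITY.get(tier)
        # Skip unenforced tiers and sections that are present with content.
        if severity is None or spec.get(section) not in (None, {}, []):
            continue
        items.append({
            "id": f"tier_{tier.lower()}_missing_{section}",
            "type": severity,
            "ptr": f"/{section}",
            "meta": {
                "source": "derived",
                "justification": f"missing_tier_{tier}_section",
                "inference_type": "structural",
                "tier": tier,
            },
        })
    return items
```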
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# OVER-SPEC TOLERANCE CONTRACT (AUTHORITATIVE)

Phase 13 guarantees that compiler behavior is stable and non-drifting under over-complete and redundant specifications. This phase is guard-only: it adds tests and validations to detect conflicts and ensure invariance; it does not alter inference, synthesis, or authority ceilings.

Principles
- Redundant or repeated spec elements must not amplify inference or authority.
- Extra non-conflicting detail must not change primary outcomes or escalate severities.
- Conflicting explicit instructions must be surfaced (DIAGNOSTIC/BLOCKER) and not auto-resolved by the compiler.
- Deterministic behavior (ordering, ids, hashing) must hold at scale.

Enforcement
- Deterministic unit tests verify redundancy tolerance, over-spec stability, explicit conflict visibility, and scale invariance.
- Any violation that suggests silent authority escalation or resolution raises an assertion in tests and will be investigated.

Signed: Governance
Date: 2025-12-17
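The redundancy and conflict principles can be illustrated with a merge step that collapses exact duplicates and reports conflicting entries instead of choosing one. `merge_redundant` is hypothetical, keys entries by an assumed `id` field, and reports conflicts as DIAGNOSTIC entries for illustration only.

```python
from typing import Any, Dict, List, Mapping, Tuple


def merge_redundant(
    entries: List[Mapping[str, Any]], key: str = "id"
) -> Tuple[Dict[Any, Mapping[str, Any]], List[Dict[str, Any]]]:
    """Collapse exact duplicates; report conflicting entries instead of resolving them."""
    groups: Dict[Any, List[Mapping[str, Any]]] = {}
    for entry in entries:
        groups.setdefault(entry[key], []).append(entry)
    merged: Dict[Any, Mapping[str, Any]] = {}
    conflicts: List[Dict[str, Any]] = []
    for k in sorted(groups):  # sorted so input order cannot change the result
        variants = groups[k]
        if all(v == variants[0] for v in variants):
            merged[k] = variants[0]  # repetition adds nothing, so keep one copy
        else:
            # Conflicting explicit instructions are surfaced, and no variant wins.
            conflicts.append({
                "type": "DIAGNOSTIC",
                "ptr": f"/{k}",
                "justification": "conflicting_explicit_instructions",
                "variants": len(variants),
            })
    return merged, conflicts
```

Repeating an entry a thousand times yields the same result as listing it once, which is the kind of scale-invariance property the Phase 13 tests are described as checking.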
