ShieldCraft-AI
diff --git a/.github/workflows/no-generated-tracked.yml b/.github/workflows/no-generated-tracked.yml
Lines changed: 29 additions & 0 deletions
diff --git a/.github/workflows/spec-pipeline.yml b/.github/workflows/spec-pipeline.yml
Lines changed: 15 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/docs/CHECKLIST_OUTCOME_CONTRACT.md b/docs/CHECKLIST_OUTCOME_CONTRACT.md
Lines changed: 15 additions & 0 deletions
diff --git a/docs/governance/AUTHORITY_CEILING_CONTRACT.md b/docs/governance/AUTHORITY_CEILING_CONTRACT.md
Lines changed: 15 additions & 0 deletions
diff --git a/docs/governance/COMPILATION_PHASE_MODEL.md b/docs/governance/COMPILATION_PHASE_MODEL.md
Lines changed: 34 additions & 0 deletions
diff --git a/docs/governance/COMPILER_FAILURE_NORMALIZATION.md b/docs/governance/COMPILER_FAILURE_NORMALIZATION.md
Lines changed: 20 additions & 0 deletions
diff --git a/docs/governance/INFERENCE_EXPLAINABILITY_CONTRACT.md b/docs/governance/INFERENCE_EXPLAINABILITY_CONTRACT.md
Lines changed: 36 additions & 0 deletions
diff --git a/docs/governance/INVARIANTS.md b/docs/governance/INVARIANTS.md
Lines changed: 90 additions & 0 deletions
diff --git a/docs/governance/OVER_SPEC_TOLERANCE_CONTRACT.md b/docs/governance/OVER_SPEC_TOLERANCE_CONTRACT.md
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,29 @@
+name: Guard - no tracked generated files
+
+on:
+  pull_request:
+  push:
+    branches: [ main, develop ]
+
+jobs:
+  check-generated-not-tracked:
+    name: Ensure src/generated is not tracked
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Check for tracked generated files
+        run: |
+          set -euo pipefail
+          echo "Checking tracked files under src/generated/..."
+          count=$( (git ls-files src/generated || true) | wc -l )
+          if [ "${count// /}" != "0" ]; then
+            echo "ERROR: Detected tracked files under src/generated. This path must not contain committed build outputs."
+            echo "Found files:"
+            git ls-files src/generated || true
+            exit 1
+          fi
+          echo "No tracked generated files found."
@@ -23,7 +23,22 @@ jobs:
         python -m pip install --upgrade pip
         python -m pip install -e .
         python -m pip install pytest
+
+    - name: Regenerate deterministic codegen outputs (dry-run)
+      env:
+        SHIELDCRAFT_SELFBUILD_ALLOW_DIRTY: '1'
+        SHIELDCRAFT_PERSONA_ENABLED: '0'
+      run: |
+        mkdir -p .selfhost_outputs
+        python -m src.shieldcraft.engine \
+          --self-host \
+          --spec spec/se_dsl_v1.spec.json \
+          --dry-run \
+          --emit-preview .selfhost_outputs/selfhost_preview.json
     
+    - name: Clean stale bytecode
+      run: python scripts/ci/clean_pycache.py
+
     - name: Run tests
       run: pytest -q
       continue-on-error: false
 
@@ -42,6 +42,7 @@ htmlcov/
 .selfhost_outputs/
 products/*/
 evidence/
+src/generated/**
 
 # OS
 .DS_Store
 
@@ -0,0 +1,15 @@
+# CHECKLIST OUTCOME CONTRACT (AUTHORITATIVE)
+
+This document is AUTHORITATIVE: the single source of truth for deriving checklist run outcomes.
+
+Contract
+--------
+- The primary outcome of a checklist run MUST be derived exclusively by `derive_primary_outcome(checklist, events)`.
+- Authoritative precedence: **REFUSAL** > **BLOCKED** > **ACTION** > **DIAGNOSTIC**.
+- Persona annotations are advisory only and MUST NOT override the derived primary outcome.
+- The function MUST return: `primary_outcome`, `refusal` (bool), `blocking_reasons` (list), and `confidence_level` (one of `high|medium|low`).
+- No component may infer or mutate the `primary_outcome` outside of this contract; any such attempt is recorded as a diagnostic event.
+
+Rationale
+---------
+Centralizing outcome derivation improves auditability and prevents duplicated heuristics from appearing in multiple modules. The canonical function is deterministic, and tests enforce idempotence and precedence.
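Reviewer note: the precedence rule in this contract can be illustrated with a minimal sketch. It is not the repository's implementation. The event shape (a dict with `outcome` and optional `reason` keys), the fallback to `DIAGNOSTIC` for an empty run, and the `confidence_level` rule are all assumptions made for illustration; the contract only fixes the function name, the precedence order, and the returned field names.

```python
from typing import Any, Dict, List

# Precedence as stated in the contract, highest first.
PRECEDENCE = ["REFUSAL", "BLOCKED", "ACTION", "DIAGNOSTIC"]


def derive_primary_outcome(
    checklist: Dict[str, Any], events: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the highest-precedence outcome found in `events`.

    Assumptions (hypothetical, not from the contract):
    - each event is a dict with an "outcome" key and an optional "reason";
    - a run with no recognized events falls back to "DIAGNOSTIC";
    - confidence is "high" when any recognized event exists, else "low".
    The checklist is read-only here; persona annotations are never consulted.
    """
    seen = {event.get("outcome") for event in events}
    primary = next((o for o in PRECEDENCE if o in seen), "DIAGNOSTIC")
    blocking_reasons = [
        event.get("reason", "")
        for event in events
        if event.get("outcome") in ("REFUSAL", "BLOCKED")
    ]
    return {
        "primary_outcome": primary,
        "refusal": primary == "REFUSAL",
        "blocking_reasons": blocking_reasons,
        "confidence_level": "high" if seen & set(PRECEDENCE) else "low",
    }
```

Because the function reads only its arguments and builds a fresh result, calling it twice with the same events yields equal results, which is the idempotence property the contract says the tests enforce.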
@@ -0,0 +1,15 @@
+# AUTHORITY CEILING CONTRACT (AUTHORITATIVE)
+
+This contract defines the authority ceilings for the compiler and enforces guard-only behavior: the compiler must never escalate authority beyond what the spec explicitly grants unless that escalation is exposed as a BLOCKER + DIAGNOSTIC and recorded in the run events.
+
+Principles
+- No new inference, synthesis, or heuristic behavior is permitted by this phase. Phase 12 is purely guard-and-enforcement.
+- Tier A authority must not be silently resolved by the compiler. Any Tier A synthesis must be accompanied by an explicit BLOCKER event and a DIAGNOSTIC event for auditability.
+- REFUSAL outcomes require explicit authority metadata (evidence.refusal.authority) and the compiler asserts its presence at finalization. REFUSAL must not be used to mask missing authority.
+
+Enforcement
+- Assertions are centralized in `finalize_checklist` to fail fast when authority ceilings are violated.
+- Unit tests must verify that missing authority causes assertion failures rather than silent behavior.
+
+Signed: Governance
+Date: 2025-12-17
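Reviewer note: the REFUSAL authority check described in this contract might look roughly like the sketch below. The helper name `assert_refusal_authority`, the `gate_id` key, and the event dict layout are hypothetical; only the `evidence.refusal.authority` path and the fail-fast assertion behavior come from the contract text.

```python
from typing import Any, Dict, List


def assert_refusal_authority(events: List[Dict[str, Any]]) -> List[str]:
    """Fail fast if any REFUSAL event lacks evidence.refusal.authority.

    Returns the authorities found, in event order. The event layout
    (nested dicts under "evidence" -> "refusal") is an assumption.
    """
    authorities = []
    for event in events:
        if event.get("outcome") != "REFUSAL":
            continue
        refusal = (event.get("evidence") or {}).get("refusal") or {}
        authority = refusal.get("authority")
        # A missing or blank authority is treated as an implementation error.
        assert isinstance(authority, str) and authority.strip(), (
            f"REFUSAL event {event.get('gate_id', '<unknown>')} "
            "lacks evidence.refusal.authority"
        )
        authorities.append(authority)
    return authorities
```

A unit test in the spirit of the Enforcement section would pass a REFUSAL event without `authority` and expect an `AssertionError` rather than a silently finalized checklist.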
@@ -0,0 +1,34 @@
+# Compilation Phase Model
+
+The compiler is structured as a fixed sequence of deterministic phases. Each phase has well-defined inputs, outputs, and allowed failure modes.
+
+Fixed phases
+1. Ingestion
+   - Inputs: canonical spec (and optional AST)
+   - Outputs: raw AST traversal items
+   - Allowed failures: schema failures recorded as DIAGNOSTIC events
+2. Normalization
+   - Inputs: raw items
+   - Outputs: enriched items (classification, severity, metadata)
+   - Allowed failures: missing lineage recorded as DIAGNOSTIC
+3. Constraint propagation
+   - Inputs: normalized items and spec constraints
+   - Outputs: additional constraint items, merged constraints
+   - Allowed failures: constraint violations recorded as BLOCKER
+4. Synthesis
+   - Inputs: merged items, derived tasks, invariants results
+   - Outputs: final_items (stable ids, order ranks)
+   - Allowed failures: generation contract failures (BLOCKER)
+5. Finalization
+   - Inputs: final_items
+   - Outputs: serializable result object (`valid`, `items`, `preflight`, `lineage`, `diff`, etc.)
+   - Allowed failures: test gate failure results returned as partial results
+
+Mapping gates (G1–G22) to phases
+- Ingestion: G4_SCHEMA_VALIDATION
+- Normalization: G9_GENERATOR_RUN_FUZZ_GATE, G10_GENERATOR_PREP_MISSING
+- Constraint propagation: G8_TEST_ATTACHMENT_CONTRACT
+- Synthesis: G13_GENERATION_CONTRACT_FAILED, G16_MINIMALITY_INVARIANT_FAILED
+- Finalization: G20_QUALITY_GATE_FAILED, G22_EXECUTE_INTERNAL_ERROR_RETURN
+
+Note: This model is prescriptive; gates are classified by the phase in which they must be recorded. The mapping is authoritative and must not be changed without governance approval.
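Reviewer note: one way to make the gate-to-phase mapping checkable is to hold it as data. The sketch below copies the phase names and gate IDs from the document; the helper names `phase_of` and `sort_gates_by_phase` are hypothetical and are not claimed to exist in the codebase.

```python
from typing import Dict, List

# Fixed phase order, as listed in the phase model.
PHASES = (
    "Ingestion",
    "Normalization",
    "Constraint propagation",
    "Synthesis",
    "Finalization",
)

# Gate-to-phase mapping, copied from the document.
GATE_PHASE: Dict[str, str] = {
    "G4_SCHEMA_VALIDATION": "Ingestion",
    "G9_GENERATOR_RUN_FUZZ_GATE": "Normalization",
    "G10_GENERATOR_PREP_MISSING": "Normalization",
    "G8_TEST_ATTACHMENT_CONTRACT": "Constraint propagation",
    "G13_GENERATION_CONTRACT_FAILED": "Synthesis",
    "G16_MINIMALITY_INVARIANT_FAILED": "Synthesis",
    "G20_QUALITY_GATE_FAILED": "Finalization",
    "G22_EXECUTE_INTERNAL_ERROR_RETURN": "Finalization",
}


def phase_of(gate_id: str) -> str:
    """Return the phase a gate must be recorded in; unmapped gates are an error."""
    try:
        return GATE_PHASE[gate_id]
    except KeyError:
        raise ValueError(f"gate {gate_id!r} has no phase mapping") from None


def sort_gates_by_phase(gate_ids: List[str]) -> List[str]:
    """Order gates by the fixed phase sequence (stable within a phase)."""
    return sorted(gate_ids, key=lambda g: PHASES.index(phase_of(g)))
```

Keeping the mapping in one table means a test can assert that every recorded gate event carries the phase the model prescribes.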
@@ -0,0 +1,20 @@
+# Compiler Failure Normalization (AUTHORITATIVE MATRIX)
+
+This document defines the failure normalization matrix: mapping of Gate → Phase → Event Type → Checklist Item Type.
+
+Matrix (representative examples)
+- G4_SCHEMA_VALIDATION → Ingestion → DIAGNOSTIC → DIAGNOSTIC
+- G9_GENERATOR_RUN_FUZZ_GATE → Normalization → BLOCKER → BLOCKER
+- G10_GENERATOR_PREP_MISSING → Normalization → DIAGNOSTIC → DIAGNOSTIC
+- G11_RUN_TEST_GATE → Synthesis → BLOCKER → BLOCKER
+- G13_GENERATION_CONTRACT_FAILED → Synthesis → BLOCKER → BLOCKER
+- G16_MINIMALITY_INVARIANT_FAILED → Synthesis → REFUSAL → REFUSAL
+- G20_QUALITY_GATE_FAILED → Finalization → REFUSAL → REFUSAL
+- G22_EXECUTE_INTERNAL_ERROR_RETURN → Finalization → DIAGNOSTIC → DIAGNOSTIC
+
+Constraints
+- No new gate IDs may be introduced by this phase.
+- This mapping must be aligned with `finalize_checklist` behavior and the Semantic Outcome Invariants.
+
+Behavioral note
+- A gate may record an event with outcome REFUSAL or BLOCKER; the compiler must ensure such events are surfaced in the final checklist as checklist items and influence primary outcome derivation via `finalize_checklist`.
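Reviewer note: the matrix above reads naturally as a lookup table. The following sketch encodes the representative rows exactly as listed; the `Normalized` tuple, the `normalize_failure` helper, and the output item shape are hypothetical names chosen for illustration.

```python
from typing import Dict, NamedTuple


class Normalized(NamedTuple):
    phase: str
    event_type: str
    item_type: str


# Representative rows copied from the matrix: Gate -> (Phase, Event, Item).
MATRIX: Dict[str, Normalized] = {
    "G4_SCHEMA_VALIDATION": Normalized("Ingestion", "DIAGNOSTIC", "DIAGNOSTIC"),
    "G9_GENERATOR_RUN_FUZZ_GATE": Normalized("Normalization", "BLOCKER", "BLOCKER"),
    "G10_GENERATOR_PREP_MISSING": Normalized("Normalization", "DIAGNOSTIC", "DIAGNOSTIC"),
    "G11_RUN_TEST_GATE": Normalized("Synthesis", "BLOCKER", "BLOCKER"),
    "G13_GENERATION_CONTRACT_FAILED": Normalized("Synthesis", "BLOCKER", "BLOCKER"),
    "G16_MINIMALITY_INVARIANT_FAILED": Normalized("Synthesis", "REFUSAL", "REFUSAL"),
    "G20_QUALITY_GATE_FAILED": Normalized("Finalization", "REFUSAL", "REFUSAL"),
    "G22_EXECUTE_INTERNAL_ERROR_RETURN": Normalized("Finalization", "DIAGNOSTIC", "DIAGNOSTIC"),
}


def normalize_failure(gate_id: str) -> Dict[str, str]:
    """Turn a gate failure into a checklist item stub.

    Unknown gate IDs raise, reflecting the constraint that no new
    gate IDs may be introduced by this phase.
    """
    if gate_id not in MATRIX:
        raise ValueError(f"unknown gate id: {gate_id!r}")
    row = MATRIX[gate_id]
    return {"gate": gate_id, "phase": row.phase, "type": row.item_type}
```

In every row listed, the event type and the checklist item type coincide, so a test could assert that equality across the whole table.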
@@ -0,0 +1,36 @@
+# INFERENCE EXPLAINABILITY CONTRACT (AUTHORITATIVE)
+
+This document defines the authoritative explainability metadata required for any synthesized,
+coerced, inferred, or derived data emitted by the compiler and checklist pipeline.
+
+Mandatory metadata fields (attached to checklist items or synthesized objects):
+- `meta.source` — one of: `explicit | default | derived | coerced | inferred`
+- `meta.justification` — short machine-readable string explaining why the value was created (e.g., `safe_default`, `missing_spec_pointer`, `heuristic_prose_keyword_match`). For BLOCKER/REFUSAL-related inferences the justification MUST reference affected pointer(s) via `meta.justification_ptr` or by embedding the pointer in the justification code.
+- `meta.inference_type` — one of: `none | safe_default | heuristic | structural | fallback`
+- `meta.tier` — when applicable: `A | B | C` (reflects the template tier per `TEMPLATE_COMPILATION_CONTRACT.md`)
+
+Principles
+- Any inference must be recorded in machine-readable fields above; missing explainability
+  metadata is a compiler violation.
+- Violations are classified by tier: Tier A missing explainability → BLOCKER; Tier B → DIAGNOSTIC; Tier C → advisory.
+- Explainability metadata must be deterministic and include a short justification code suitable for audit and filtering.
+
+Examples
+- Synthesized default for missing `agents` (Tier A):
+  - `meta.source = "default"`
+  - `meta.justification = "safe_default_agents_list"`
+  - `meta.inference_type = "safe_default"`
+  - `meta.tier = "A"`
+
+- Prose-derived confidence heuristic:
+  - `meta.source = "inferred"`
+  - `meta.justification = "heuristic_prose_keyword_match"`
+  - `meta.inference_type = "heuristic"`
+  - `meta.tier = "C"` (if informal)
+
+Enforcement
+- The compiler attaches explainability metadata at each inference site; unit tests and CI guards assert the presence.
+- Tier A inferences without a corresponding checklist item or without explainability metadata are considered violations and will be detected via compiler assertions and failing tests.
+
+Signed: Governance
+Date: 2025-12-16
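Reviewer note: the mandatory fields and the tier-based classification could be checked with something like the sketch below. The allowed values are taken from the contract; the function names, the flat `meta` dict layout, the `ADVISORY` label for Tier C, and the choice to treat a missing tier as advisory are assumptions.

```python
from typing import Any, Dict, List, Optional

ALLOWED_SOURCES = {"explicit", "default", "derived", "coerced", "inferred"}
ALLOWED_INFERENCE_TYPES = {"none", "safe_default", "heuristic", "structural", "fallback"}
# Tier -> severity of a missing-explainability violation, per the Principles.
TIER_SEVERITY = {"A": "BLOCKER", "B": "DIAGNOSTIC", "C": "ADVISORY"}


def explainability_violations(meta: Dict[str, Any]) -> List[str]:
    """List the mandatory explainability fields that are missing or invalid."""
    problems = []
    if meta.get("source") not in ALLOWED_SOURCES:
        problems.append("meta.source")
    justification = meta.get("justification")
    if not isinstance(justification, str) or not justification:
        problems.append("meta.justification")
    if meta.get("inference_type") not in ALLOWED_INFERENCE_TYPES:
        problems.append("meta.inference_type")
    tier = meta.get("tier")
    # meta.tier is only required "when applicable", so None is accepted.
    if tier is not None and tier not in TIER_SEVERITY:
        problems.append("meta.tier")
    return problems


def violation_severity(meta: Dict[str, Any]) -> Optional[str]:
    """Return None when compliant, else the severity implied by the tier.

    Assumption: a missing or unrecognized tier is treated as advisory.
    """
    if not explainability_violations(meta):
        return None
    return TIER_SEVERITY.get(meta.get("tier"), "ADVISORY")
```

Applied to the first example in the contract (the synthesized `agents` default), the check reports no violations; removing `justification` from that same Tier A metadata would classify as a BLOCKER.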
@@ -162,3 +162,93 @@ This file declares the governance anchor; enforcement logic will be implemented
 - `finalize_checklist(...)` is the sole emission boundary.
 - This invariant is enforced by code-level assertions and tests.
 
+## Refusal Authority Invariant (Phase 11C)
+
+- Statement: Every event with outcome `REFUSAL` **must** include structured refusal metadata under `evidence.refusal` containing keys: `authority` (non-empty string), `trigger`, `scope`, and `justification`.
+- Enforcement point: `src/shieldcraft/engine.finalize_checklist` asserts `authority` presence and exposes `checklist.refusal_authority`.
+- Failure classification: Missing or invalid `authority` is treated as a compiler assertion/implementation error and must fail the finalization boundary to avoid ambiguous REFUSALs.
+- Testable requirements:
+  - Unit tests must cover (a) REFUSALs recorded via `record_refusal_event` propagate authority into finalized checklist, and (b) REFUSALs recorded without `authority` cause finalization to raise an `AssertionError` and emit a paired diagnostic entry.
+- Evidence: Contract `docs/governance/REFUSAL_AUTHORITY_CONTRACT.md`, tests (`tests/test_refusal_authority.py`), and CI guard (`tests/ci/test_refusal_authority_persistence.py`).
+
+
+## Semantic Outcome Invariants (Phase 5)
+
+- Statement: Every emitted checklist MUST contain a single canonical `primary_outcome` with value one of: `SUCCESS`, `REFUSAL`, `BLOCKED`, `DIAGNOSTIC_ONLY`.
+- Role assignment: Each checklist item MUST be assigned exactly one `role` drawn from: `PRIMARY_CAUSE`, `CONTRIBUTING_BLOCKER`, `SECONDARY_DIAGNOSTIC`, `INFORMATIONAL`.
+- Mapping rules:
+  - `REFUSAL` if any recorded event has outcome `REFUSAL`.
+  - `BLOCKED` if no `REFUSAL` and any recorded event has outcome `BLOCKER`.
+  - `DIAGNOSTIC_ONLY` if all recorded events are `DIAGNOSTIC`.
+  - `SUCCESS` if there are no events or only informational/non-diagnostic events.
+- Semantic invariants (enforced in `finalize_checklist`):
+  - Exactly one `PRIMARY_CAUSE` item MUST exist unless `primary_outcome == SUCCESS`.
+  - `REFUSAL` outcome MUST include `refusal_reason` and top-level `refusal == true`.
+  - `BLOCKED` outcome MUST NOT set `refusal == true`.
+  - `DIAGNOSTIC_ONLY` outcome MUST NOT contain `BLOCKER` or `REFUSAL` items.
+- Enforcement: These invariants are enforced by code-level assertions inside `finalize_checklist` and protected by deterministic, unit-tested behavior.
+
+- Semantic Outcome Lock: The canonical semantics (primary outcome derivation, item roles, and invariants) are locked under Phase 5 and may not be altered except via an explicit governance phase update. Changes to semantic meaning require a recorded governance decision and a corresponding implementation phase.
+
+## Persona Arbitration Invariant (Phase 6)
+
+- Statement: Persona authority and routing MUST be explicit, deterministic, and auditable. Persona outputs are evidence and may be compressed into a single primary persona cause for audit, but persona outputs MUST NOT arbitrarily change canonical checklist semantics without a governance decision.
+- Rules:
+  - Personas MAY declare an optional `authority` of one of: `DECISIVE`, `ADVISORY`, `ANNOTATIVE` (metadata only in Phase 6).
+  - Routing of persona invocation MUST be static and derived from the explicit routing table in `src/shieldcraft/persona/routing.py` (if configured); otherwise persona discovery falls back to `scope` rules.
+  - Persona events are recorded atomically and deterministically in `artifacts/persona_events_v1.json` and hashed for integrity.
+  - Persona outputs are compressed into a `checklist.persona_summary` structure for deterministic auditability; compression does not change primary checklist outcome semantics.
+- Enforcement: These invariants are enforced by documentation, deterministic routing, persona metadata, persona event compression implemented in `finalize_checklist`, and the consolidated canonical protocol documentation (Phase 7).
+
+## Spec-to-Checklist Compiler Invariant (Phase 8)
+
+- Statement: The Spec → Checklist compilation subsystem (authoritative entrypoint: `ChecklistGenerator.build` in `src/shieldcraft/services/checklist/generator.py`) is an auditable, deterministic, first-class subsystem. It MUST always return a serializable checklist result object (possibly marked invalid), and it MUST record gating events to the `ChecklistContext` so that `finalize_checklist(...)` can derive the canonical outcome.
+
+- Requirements (testable):
+  - Every compiler entrypoint MUST return an emitted result object containing at minimum `items` (no silent non-emission).
+  - No unrecorded raise may escape the compiler boundary such that `finalize_checklist(...)` is not invoked by the caller; engine entrypoints (e.g., `Engine.run`) MUST catch compiler errors, record a diagnostic gate event, and return a finalized checklist artifact.
+  - All recorded gate events emitted during compilation MUST appear in the finalized checklist artifact (as `events` and corresponding checklist items) to ensure auditability.
+
+- Enforcement: Verified by unit tests (regression guards) and documented compiler contracts (`SPEC_TO_CHECKLIST_COMPILER.md`, `SPEC_INPUT_CLASSIFICATION.md`, `COMPILATION_PHASE_MODEL.md`, `COMPILER_FAILURE_NORMALIZATION.md`).
+
+- Lock: This invariant is locked by Phase 8 and may not be changed except via a governance phase update.
+
+---
+
+## Inference Explainability Invariant (Phase 11B/11D/11E)
+
+- Statement: All inferred, synthesized, coerced, or derived values that affect checklist emission or gating MUST include machine-readable explainability metadata attached to the affected object (item/meta/evidence/header). No silent inference is permitted: every non-explicit value must carry provenance and a justification code.
+- Required fields: `meta.source`, `meta.justification`, `meta.inference_type`, and `meta.tier` (when applicable for Tier A/B/C). BLOCKER/REFUSAL-related inferences MUST reference affected pointer(s) either via `meta.justification_ptr` or via an explicit pointer embedded in `meta.justification`.
+- Additional required provenance for specific cases:
+  - Coercions MUST preserve `meta.original_value` for auditability.
+  - Derived tasks MUST include `meta.derived_from = <parent_id>` and `meta.justification` referencing the derivation rule.
+  - Confidence assignments MUST include `confidence_meta` with `source` and `justification` fields.
+  - Unknown invariant expressions that are defaulted MUST attach `explainability` metadata and emit a DIAGNOSTIC checklist item.
+- Enforcement: Compiler unit tests and CI guards detect missing explainability metadata; Tier A omissions are BLOCKERs and Tier B are DIAGNOSTIC. Missing required provenance (e.g., `original_value` for coercions or `derived_from` for derived tasks) fails CI and must be remediated.
+- Rationale: Prevent silent intent invention by requiring that every automatic decision be auditable, machine-filterable, and provenance-bound.
+---
+
+## Inference Explainability Invariant (Phase 11B)
+
+- Statement: All synthesized, inferred, coerced, or derived data emitted by the compiler MUST include explainability metadata according to `docs/governance/INFERENCE_EXPLAINABILITY_CONTRACT.md`.
+- Testable requirements:
+  - Any checklist item with `meta.synthesized_default == True` MUST have `meta.source`, `meta.justification`, and `meta.inference_type` defined.
+  - Any item whose fields are coerced or normalized by `ChecklistModel.normalize_item` MUST include `meta.source = "coerced"` and `meta.justification`.
+  - Any derived task emitted by `infer_tasks` MUST include `meta.source = "derived"` and `meta.justification`.
+  - Invariant evaluations that return safe defaults for unknown expressions MUST attach `explainability` metadata to the `invariant_results` entries.
+- Enforcement: Verified by unit tests and CI guards; violations are test failures and must be remediated promptly.
+
+---
+
+## Compiler Hardening Invariants (Phase 11A)
+
+- Statement: Compiler hardening measures introduced in Phase 11A (Tier enforcement, default synthesis, insufficiency diagnostics, and checklist quality scoring) are authoritative and must be enforced by the compiler pipeline.
+
+- Testable invariants:
+  - Tier enforcement is implemented in `src/shieldcraft/services/checklist/tier_enforcement.py::enforce_tiers` and must emit checklist items for missing Tier A/B sections (BLOCKER/DIAGNOSTIC respectively).
+  - Default synthesis is implemented in `src/shieldcraft/services/spec/defaults.py::synthesize_missing_spec_fields` and must be invoked exactly once during compiler entry (currently in `ChecklistGenerator.build`).
+  - Spec sufficiency diagnostics are implemented via `src/shieldcraft/services/spec/analysis.py::check_spec_sufficiency` and must produce DIAGNOSTIC checklist items without aborting compilation.
+  - Checklist quality scoring is implemented in `src/shieldcraft/services/checklist/quality.py::compute_checklist_quality` and its result MUST be attached to `checklist.meta.checklist_quality` by `finalize_checklist`.
+
+- Enforcement: Unit tests and CI guards verify these invariants; any regression must be patched and re-locked via governance decision.
+
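Reviewer note: the Semantic Outcome Invariants (Phase 5) in the hunk above are stated as things `finalize_checklist` asserts. A minimal sketch of such assertions follows. The checklist layout (top-level `primary_outcome`, `refusal`, `refusal_reason`, and an `items` list whose entries carry `role` and `type`) is a hypothetical shape, and `assert_semantic_invariants` is not a name taken from the repository.

```python
from typing import Any, Dict

OUTCOMES = {"SUCCESS", "REFUSAL", "BLOCKED", "DIAGNOSTIC_ONLY"}


def assert_semantic_invariants(checklist: Dict[str, Any]) -> None:
    """Assert the four semantic invariants on a finalized checklist.

    The dict layout used here is an assumption made for illustration.
    """
    outcome = checklist["primary_outcome"]
    items = checklist.get("items", [])
    assert outcome in OUTCOMES, f"unknown primary_outcome: {outcome!r}"

    # Exactly one PRIMARY_CAUSE unless the run succeeded.
    primary_causes = [i for i in items if i.get("role") == "PRIMARY_CAUSE"]
    if outcome != "SUCCESS":
        assert len(primary_causes) == 1, "exactly one PRIMARY_CAUSE item required"

    if outcome == "REFUSAL":
        assert checklist.get("refusal") is True, "REFUSAL requires refusal == true"
        assert checklist.get("refusal_reason"), "REFUSAL requires refusal_reason"
    if outcome == "BLOCKED":
        assert checklist.get("refusal") is not True, "BLOCKED must not set refusal"
    if outcome == "DIAGNOSTIC_ONLY":
        assert not any(
            i.get("type") in {"BLOCKER", "REFUSAL"} for i in items
        ), "DIAGNOSTIC_ONLY must not contain BLOCKER or REFUSAL items"
```

The invariants do not say how many `PRIMARY_CAUSE` items a `SUCCESS` checklist may carry, so the sketch leaves that case unchecked.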
@@ -0,0 +1,16 @@
+# OVER-SPEC TOLERANCE CONTRACT (AUTHORITATIVE)
+
+Phase 13 guarantees that compiler behavior is stable and non-drifting under over-complete and redundant specifications. This phase is guard-only: it adds tests and validations to detect conflicts and ensure invariance; it does not alter inference, synthesis, or authority ceilings.
+
+Principles
+- Redundant or repeated spec elements must not amplify inference or authority.
+- Extra non-conflicting detail must not change primary outcomes or escalate severities.
+- Conflicting explicit instructions must be surfaced (DIAGNOSTIC/BLOCKER) and not auto-resolved by the compiler.
+- Deterministic behavior (ordering, ids, hashing) must hold at scale.
+
+Enforcement
+- Deterministic unit tests verify redundancy tolerance, over-spec stability, explicit conflict visibility, and scale invariance.
+- Any violation that suggests silent authority escalation or silent conflict resolution fails an assertion in tests and will be investigated.
+
+Signed: Governance
+Date: 2025-12-17