ShieldCraft-AI
diff --git a/docs/CHECKLIST_OUTCOME_CONTRACT.md b/docs/CHECKLIST_OUTCOME_CONTRACT.md
Lines changed: 15 additions & 0 deletions
diff --git a/docs/governance/AUTHORITY_CEILING_CONTRACT.md b/docs/governance/AUTHORITY_CEILING_CONTRACT.md
Lines changed: 15 additions & 0 deletions
diff --git a/docs/governance/INFERENCE_EXPLAINABILITY_CONTRACT.md b/docs/governance/INFERENCE_EXPLAINABILITY_CONTRACT.md
Lines changed: 36 additions & 0 deletions
diff --git a/docs/governance/INVARIANTS.md b/docs/governance/INVARIANTS.md
Lines changed: 49 additions & 0 deletions
diff --git a/docs/governance/OVER_SPEC_TOLERANCE_CONTRACT.md b/docs/governance/OVER_SPEC_TOLERANCE_CONTRACT.md
Lines changed: 16 additions & 0 deletions
diff --git a/docs/governance/PERSONA_NON_AUTHORITY_CONTRACT.md b/docs/governance/PERSONA_NON_AUTHORITY_CONTRACT.md
Lines changed: 25 additions & 0 deletions
diff --git a/docs/governance/REFUSAL_AUTHORITY_CONTRACT.md b/docs/governance/REFUSAL_AUTHORITY_CONTRACT.md
Lines changed: 40 additions & 0 deletions
diff --git a/docs/governance/TEMPLATE_NON_AUTHORITY_CONTRACT.md b/docs/governance/TEMPLATE_NON_AUTHORITY_CONTRACT.md
Lines changed: 15 additions & 0 deletions
diff --git a/docs/governance/TEST_COLLECTION_AUDIT.md b/docs/governance/TEST_COLLECTION_AUDIT.md
Lines changed: 43 additions & 0 deletions
diff --git a/docs/governance/TEST_COLLECTION_STABILITY_CONTRACT.md b/docs/governance/TEST_COLLECTION_STABILITY_CONTRACT.md
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,15 @@
+# CHECKLIST OUTCOME CONTRACT (AUTHORITATIVE)
+
+This document is AUTHORITATIVE: the single source of truth for deriving checklist run outcomes.
+
+Contract
+--------
+- The primary outcome of a checklist run MUST be derived exclusively by `derive_primary_outcome(checklist, events)`.
+- Authoritative precedence: **REFUSAL** > **BLOCKED** > **ACTION** > **DIAGNOSTIC**.
+- Persona annotations are advisory only and MUST NOT override the derived primary outcome.
+- The function MUST return: `primary_outcome`, `refusal` (bool), `blocking_reasons` (list), and `confidence_level` (one of `high|medium|low`).
+- No component may infer or mutate the `primary_outcome` outside of this contract; any such attempt is recorded as a diagnostic event.
+
+Rationale
+---------
+Centralizing outcome derivation improves auditability and prevents heuristics from being duplicated across modules. The canonical function is deterministic, and tests enforce idempotence and precedence.
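
The precedence rule in the contract above can be illustrated with a minimal sketch. This is not the project's `derive_primary_outcome`; the function name `derive_primary_outcome_sketch`, the event shape (`outcome` and optional `reason` keys), the fallback to `DIAGNOSTIC` when no event matches, and the way `confidence_level` is read from the checklist are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Precedence stated in the contract, highest first.
PRECEDENCE = ("REFUSAL", "BLOCKED", "ACTION", "DIAGNOSTIC")


def derive_primary_outcome_sketch(
    checklist: Dict[str, Any], events: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Pick the highest-precedence outcome present in `events`.

    Assumed event shape: {"outcome": str, "reason": str (optional)}.
    """
    seen = {event.get("outcome") for event in events}
    # Assumption: fall back to the lowest-precedence outcome when nothing matches.
    primary = next((o for o in PRECEDENCE if o in seen), PRECEDENCE[-1])
    # Sorted so the result does not depend on event order.
    blocking_reasons = sorted(
        event["reason"]
        for event in events
        if event.get("outcome") == "BLOCKED" and "reason" in event
    )
    return {
        "primary_outcome": primary,
        "refusal": primary == "REFUSAL",
        "blocking_reasons": blocking_reasons,
        # Placeholder: the contract does not say how confidence is computed.
        "confidence_level": checklist.get("confidence_level", "low"),
    }
```

Because the function reads its inputs without mutating them, calling it twice with the same arguments returns equal results, which is the idempotence property the Rationale says the tests enforce.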
@@ -0,0 +1,15 @@
+# AUTHORITY CEILING CONTRACT (AUTHORITATIVE)
+
+This contract defines the authority ceilings for the compiler and enforces guard-only behavior: the compiler must never escalate authority beyond what the spec explicitly grants unless that escalation is exposed as a BLOCKER + DIAGNOSTIC and recorded in the run events.
+
+Principles
+- No new inference, synthesis, or heuristic behavior is permitted in this phase. Phase 12 is purely guard-and-enforcement.
+- Tier A authority must not be silently resolved by the compiler. Any Tier A synthesis must be accompanied by an explicit BLOCKER event and a DIAGNOSTIC event for auditability.
+- REFUSAL outcomes require explicit authority metadata (`evidence.refusal.authority`), and the compiler asserts its presence at finalization. REFUSAL must not be used to mask missing authority.
+
+Enforcement
+- Assertions are centralized in `finalize_checklist` to fail fast when authority ceilings are violated.
+- Unit tests must verify that missing authority causes assertion failures rather than silent behavior.
+
+Signed: Governance
+Date: 2025-12-17
@@ -0,0 +1,36 @@
+# INFERENCE EXPLAINABILITY CONTRACT (AUTHORITATIVE)
+
+This document defines the authoritative explainability metadata required for any synthesized,
+coerced, inferred, or derived data emitted by the compiler and checklist pipeline.
+
+Mandatory metadata fields (attached to checklist items or synthesized objects):
+- `meta.source` — one of: `explicit | default | derived | coerced | inferred`
+- `meta.justification` — short machine-readable string explaining why the value was created (e.g., `safe_default`, `missing_spec_pointer`, `heuristic_prose_keyword_match`). For BLOCKER/REFUSAL-related inferences the justification MUST reference affected pointer(s) via `meta.justification_ptr` or by embedding the pointer in the justification code.
+- `meta.inference_type` — one of: `none | safe_default | heuristic | structural | fallback`
+- `meta.tier` — when applicable: `A | B | C` (reflects the template tier per `TEMPLATE_COMPILATION_CONTRACT.md`)
+
+Principles
+- Any inference must be recorded in machine-readable fields above; missing explainability
+  metadata is a compiler violation.
+- Violations are classified by tier: Tier A missing explainability → BLOCKER; Tier B → DIAGNOSTIC; Tier C → advisory.
+- Explainability metadata must be deterministic and include a short justification code suitable for audit and filtering.
+
+Examples
+- Synthesized default for missing `agents` (Tier A):
+  - `meta.source = "default"`
+  - `meta.justification = "safe_default_agents_list"`
+  - `meta.inference_type = "safe_default"`
+  - `meta.tier = "A"`
+
+- Prose-derived confidence heuristic:
+  - `meta.source = "inferred"`
+  - `meta.justification = "heuristic_prose_keyword_match"`
+  - `meta.inference_type = "heuristic"`
+  - `meta.tier = "C"` (if informal)
+
+Enforcement
+- The compiler attaches explainability metadata at each inference site; unit tests and CI guards assert the presence.
+- Tier A inferences without a corresponding checklist item or without explainability metadata are considered violations and will be detected via compiler assertions and failing tests.
+
+Signed: Governance
+Date: 2025-12-16
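
A guard for the mandatory fields listed in this contract might look like the sketch below. The helper names `check_explainability` and `classify_violation` are hypothetical, and treating an unknown or missing tier as a BLOCKER (failing closed) is an assumption; the contract itself only classifies Tiers A, B, and C.

```python
from typing import Any, Dict, List, Optional

ALLOWED_SOURCES = {"explicit", "default", "derived", "coerced", "inferred"}
ALLOWED_INFERENCE_TYPES = {"none", "safe_default", "heuristic", "structural", "fallback"}
# Tier -> classification of a missing-metadata violation, per the Principles.
VIOLATION_BY_TIER = {"A": "BLOCKER", "B": "DIAGNOSTIC", "C": "advisory"}


def check_explainability(meta: Dict[str, Any]) -> List[str]:
    """Return a sorted list of mandatory fields that are missing or invalid."""
    problems = []
    if meta.get("source") not in ALLOWED_SOURCES:
        problems.append("source")
    if not meta.get("justification"):
        problems.append("justification")
    if meta.get("inference_type") not in ALLOWED_INFERENCE_TYPES:
        problems.append("inference_type")
    return sorted(problems)


def classify_violation(meta: Dict[str, Any]) -> Optional[str]:
    """Return None when the metadata is complete, else the tier's classification."""
    if not check_explainability(meta):
        return None
    # Assumption: an unknown or missing tier fails closed as a BLOCKER.
    return VIOLATION_BY_TIER.get(meta.get("tier"), "BLOCKER")
```

Returning field names rather than booleans keeps the result machine-filterable, in the same spirit as the justification codes the contract requires.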
@@ -162,6 +162,16 @@ This file declares the governance anchor; enforcement logic will be implemented
 - `finalize_checklist(...)` is the sole emission boundary.
 - This invariant is enforced by code-level assertions and tests.
 
+## Refusal Authority Invariant (Phase 11C)
+
+- Statement: Every event with outcome `REFUSAL` **must** include structured refusal metadata under `evidence.refusal` containing keys: `authority` (non-empty string), `trigger`, `scope`, and `justification`.
+- Enforcement point: `src/shieldcraft/engine.finalize_checklist` asserts `authority` presence and exposes `checklist.refusal_authority`.
+- Failure classification: Missing or invalid `authority` is treated as a compiler assertion/implementation error and must fail the finalization boundary to avoid ambiguous REFUSALs.
+- Testable requirements:
+  - Unit tests must cover (a) REFUSALs recorded via `record_refusal_event` propagate authority into the finalized checklist, and (b) REFUSALs recorded without `authority` cause finalization to raise an `AssertionError` and emit a paired diagnostic entry.
+- Evidence: Contract `docs/governance/REFUSAL_AUTHORITY_CONTRACT.md`, tests (`tests/test_refusal_authority.py`), and CI guard (`tests/ci/test_refusal_authority_persistence.py`).
+
+
 ## Semantic Outcome Invariants (Phase 5)
 
 - Statement: Every emitted checklist MUST contain a single canonical `primary_outcome` with value one of: `SUCCESS`, `REFUSAL`, `BLOCKED`, `DIAGNOSTIC_ONLY`.
@@ -203,3 +213,42 @@ This file declares the governance anchor; enforcement logic will be implemented
 
 - Lock: This invariant is locked by Phase 8 and may not be changed except via a governance phase update.
 
+---
+
+## Inference Explainability Invariant (Phase 11B/11D/11E)
+
+- Statement: All inferred, synthesized, coerced, or derived values that affect checklist emission or gating MUST include machine-readable explainability metadata attached to the affected object (item/meta/evidence/header). No silent inference is permitted: every non-explicit value must carry provenance and a justification code.
+- Required fields: `meta.source`, `meta.justification`, `meta.inference_type`, and `meta.tier` (when applicable for Tier A/B/C). BLOCKER/REFUSAL-related inferences MUST reference affected pointer(s) either via `meta.justification_ptr` or via an explicit pointer embedded in `meta.justification`.
+- Additional required provenance for specific cases:
+  - Coercions MUST preserve `meta.original_value` for auditability.
+  - Derived tasks MUST include `meta.derived_from = <parent_id>` and `meta.justification` referencing the derivation rule.
+  - Confidence assignments MUST include `confidence_meta` with `source` and `justification` fields.
+  - Unknown invariant expressions that are defaulted MUST attach `explainability` metadata and emit a DIAGNOSTIC checklist item.
+- Enforcement: Compiler unit tests and CI guards detect missing explainability metadata; Tier A omissions are BLOCKERs and Tier B are DIAGNOSTIC. Missing required provenance (e.g., `original_value` for coercions or `derived_from` for derived tasks) fails CI and must be remediated.
+- Rationale: Prevent silent intent invention by requiring that every automatic decision be auditable, machine-filterable, and provenance-bound.
+---
+
+## Compiler Hardening Invariants (Phase 11A)
+
+- Statement: Compiler hardening measures introduced in Phase 11A (Tier enforcement, default synthesis, insufficiency diagnostics, and checklist quality scoring) are authoritative and must be enforced by the compiler pipeline.
+
+- Testable invariants:
+  - Tier enforcement is implemented in `src/shieldcraft/services/checklist/tier_enforcement.py::enforce_tiers` and must emit checklist items for missing Tier A/B sections (BLOCKER/DIAGNOSTIC respectively).
+  - Default synthesis is implemented in `src/shieldcraft/services/spec/defaults.py::synthesize_missing_spec_fields` and must be invoked exactly once during compiler entry (currently in `ChecklistGenerator.build`).
+  - Spec sufficiency diagnostics are implemented via `src/shieldcraft/services/spec/analysis.py::check_spec_sufficiency` and must produce DIAGNOSTIC checklist items without aborting compilation.
+  - Checklist quality scoring is implemented in `src/shieldcraft/services/checklist/quality.py::compute_checklist_quality` and its result MUST be attached to `checklist.meta.checklist_quality` by `finalize_checklist`.
+
+- Enforcement: Unit tests and CI guards verify these invariants; any regression must be patched and re-locked via governance decision.
+
+---
+
+## Inference Explainability Invariant (Phase 11B)
+
+- Statement: All synthesized, inferred, coerced, or derived data emitted by the compiler MUST include explainability metadata according to `docs/governance/INFERENCE_EXPLAINABILITY_CONTRACT.md`.
+- Testable requirements:
+  - Any checklist item with `meta.synthesized_default == True` MUST have `meta.source`, `meta.justification`, and `meta.inference_type` defined.
+  - Any item whose fields are coerced or normalized by `ChecklistModel.normalize_item` MUST include `meta.source = "coerced"` and `meta.justification`.
+  - Any derived task emitted by `infer_tasks` MUST include `meta.source = "derived"` and `meta.justification`.
+  - Invariant evaluations that return safe defaults for unknown expressions MUST attach `explainability` metadata to the `invariant_results` entries.
+- Enforcement: Verified by unit tests and CI guards; violations are test failures and must be remediated promptly.
+
@@ -0,0 +1,16 @@
+# OVER-SPEC TOLERANCE CONTRACT (AUTHORITATIVE)
+
+Phase 13 guarantees that compiler behavior is stable and non-drifting under over-complete and redundant specifications. This phase is guard-only: it adds tests and validations to detect conflicts and ensure invariance; it does not alter inference, synthesis, or authority ceilings.
+
+Principles
+- Redundant or repeated spec elements must not amplify inference or authority.
+- Extra non-conflicting detail must not change primary outcomes or escalate severities.
+- Conflicting explicit instructions must be surfaced (DIAGNOSTIC/BLOCKER) and not auto-resolved by the compiler.
+- Deterministic behavior (ordering, ids, hashing) must hold at scale.
+
+Enforcement
+- Deterministic unit tests verify redundancy tolerance, over-spec stability, explicit conflict visibility, and scale invariance.
+- Any violation that suggests silent authority escalation or resolution fails a test assertion and must be investigated.
+
+Signed: Governance
+Date: 2025-12-17
@@ -0,0 +1,25 @@
+# Persona Non-Authority Contract (AUTHORITATIVE)
+
+Decision: AUTHORITATIVE — Persona Protocol Boundary Locked (Phase 15)
+
+Summary
+- Personas are scoped specialists that may provide annotations, diagnostics, or advisory constraints but MUST NOT act as implicit authorities that alter checklist semantics or outcomes.
+
+Rules (authoritative)
+- Personas may only emit audit events (annotations, persona events) and record vetoes for observability; vetoes MUST be treated as advisory (DIAGNOSTIC) and MUST NOT cause REFUSAL or BLOCKER behavior.
+- Personas MUST NOT be permitted to directly mutate semantic fields that affect checklist primary outcome or refusal authority, including but not limited to: `id`, `ptr`, `generated`, `artifact`, `severity`, `refusal`, `outcome`.
+- Persona constraints that attempt disallowed mutations MUST be recorded in `item.meta.persona_constraints_disallowed` and a `G15_PERSONA_CONSTRAINT_DISALLOWED` DIAGNOSTIC event MUST be emitted for visibility.
+- Persona routing and evaluation order MUST be deterministic and recorded only as metadata or persona events; they MUST NOT influence primary checklist derivation.
+
+Rationale
+- Personas provide useful domain-specific advice and annotations but must not supplant governance or authority ceilings. Ensuring personas are advisory preserves auditability, reduces accidental refusal, and prevents stealth authority escalation.
+
+Enforcement
+- Implementation-level enforcement is via deterministic tests and lightweight runtime guards: disallowed attempts are recorded, and persona vetoes that previously raised are converted into advisory DIAGNOSTIC events.
+- Tests: `tests/persona/test_persona_protocol_boundary_routing_invariance.py`, `tests/persona/test_persona_decision_precedence.py`, `tests/persona/test_persona_non_interference_guards.py`, `tests/persona/test_persona_scale_order_invariance.py` verify contract requirements.
+
+Cross-references
+- AUTHORITY_CEILING_CONTRACT.md (persona outputs must respect authority ceilings)
+- TEMPLATE_NON_AUTHORITY_CONTRACT.md (templates are likewise non-authoritative)
+
+Status: AUTHORITATIVE (locked)
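
The runtime guard described under Rules could be sketched as follows. The function `apply_persona_constraint`, the `persona_annotations` key, and the event dictionary shape are hypothetical; only the protected field names, `item.meta.persona_constraints_disallowed`, and the `G15_PERSONA_CONSTRAINT_DISALLOWED` code come from the contract.

```python
from typing import Any, Dict, List

# Fields the contract names as off-limits to personas (the contract notes
# that this list is not exhaustive).
PROTECTED_FIELDS = frozenset(
    {"id", "ptr", "generated", "artifact", "severity", "refusal", "outcome"}
)


def apply_persona_constraint(
    item: Dict[str, Any],
    persona: str,
    changes: Dict[str, Any],
    events: List[Dict[str, Any]],
) -> None:
    """Record persona input as advisory data; never mutate protected fields."""
    meta = item.setdefault("meta", {})
    for field in sorted(changes):  # sorted for deterministic ordering
        if field in PROTECTED_FIELDS:
            # Disallowed mutation: record it and emit a DIAGNOSTIC event.
            meta.setdefault("persona_constraints_disallowed", []).append(
                {"persona": persona, "field": field}
            )
            events.append(
                {
                    "code": "G15_PERSONA_CONSTRAINT_DISALLOWED",
                    "outcome": "DIAGNOSTIC",
                    "persona": persona,
                    "field": field,
                }
            )
        else:
            # Assumption: allowed input is stored as an annotation only.
            meta.setdefault("persona_annotations", {})[field] = changes[field]
```

Storing even the allowed input under `meta` keeps the item's semantic fields untouched, so primary outcome derivation sees the same item with or without persona activity.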
@@ -0,0 +1,40 @@
+# Refusal Authority Contract (AUTHORITATIVE)
+
+Decision date: 2025-12-17
+
+Summary
+- This contract establishes an authoritative mapping from REFUSAL gate identifiers
+  (e.g., `G2_GOVERNANCE_PRESENCE_CHECK`) to named refusal authorities
+  (e.g., `governance`, `persona`, `infrastructure`).
+- Every REFUSAL event emitted by the engine MUST include structured refusal
+  metadata in its `evidence.refusal` object with keys: `authority`, `trigger`,
+  `scope`, and `justification`.
+
+Requirements
+- The codebase provides a deterministic map in `src/shieldcraft/services/governance/refusal_authority.py`.
+- The helper `record_refusal_event(context, gate_id, phase, ...)` must be used
+  to record REFUSALs at known gates; it attaches the required `refusal` metadata.
+- The finalization boundary (`finalize_checklist`) asserts that any REFUSAL in the
+  finalized events includes a valid, non-empty `authority` string and surfaces
+  the authority as `checklist.refusal_authority`.
+
+Rationale
+- Explicit authority attribution enables auditability, makes REFUSALs
+  inspectable by automated tools, and reduces ambiguity in governance decisions.
+- Structured refusal metadata enables paired diagnostics for missing authority
+  cases and deterministic diagnostics masking.
+
+Enforcement
+- Code-level assertions validate presence and type of `authority` during
+  checklist finalization.
+- Unit tests (`tests/test_refusal_authority.py`) cover normal and failure modes.
+- CI guards must include `tests/ci/test_refusal_authority_persistence.py` to ensure
+  the authoritative map covers known REFUSAL gates used by the engine.
+
+Scope
+- This contract is AUTHORITATIVE for REFUSAL metadata assignment. Any future
+  edits to gate→authority mappings require an explicit decision recorded in
+  `docs/governance/decision_log.md`.
+
+Owners
+- Governance (docs/governance)
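
The finalization check described under Requirements might be sketched as below. This is not the code in `finalize_checklist`; the function name `assert_refusal_authority`, the event shape, and the choice to return the authority of the first REFUSAL are assumptions. The sketch raises `AssertionError` explicitly rather than using the `assert` statement, so the check still runs when Python is started with `-O`.

```python
from typing import Any, Dict, List, Optional

# Keys the contract requires under `evidence.refusal`.
REQUIRED_KEYS = ("authority", "trigger", "scope", "justification")


def assert_refusal_authority(events: List[Dict[str, Any]]) -> Optional[str]:
    """Return the authority of the first REFUSAL, or None if there is none.

    Raises AssertionError when any REFUSAL lacks valid refusal metadata.
    """
    authority = None
    for event in events:
        if event.get("outcome") != "REFUSAL":
            continue
        refusal = (event.get("evidence") or {}).get("refusal") or {}
        missing = [key for key in REQUIRED_KEYS if key not in refusal]
        if missing:
            raise AssertionError(f"REFUSAL missing refusal metadata keys: {missing}")
        value = refusal["authority"]
        if not isinstance(value, str) or not value.strip():
            raise AssertionError("REFUSAL authority must be a non-empty string")
        if authority is None:
            authority = value
    return authority
```

A caller could assign the return value to `checklist.refusal_authority`, mirroring how the contract says the authority is surfaced.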
@@ -0,0 +1,15 @@
+# TEMPLATE NON-AUTHORITY CONTRACT (AUTHORITATIVE)
+
+This contract ensures templates are pluggable, versioned, and non-authoritative. Templates are presentation artifacts and must not be used to infer or assert authority over checklist outcomes.
+
+Principles
+- Templates provide rendering and placeholder defaults only; they must never inject BLOCKER or REFUSAL outcomes or otherwise alter authoritative decisioning.
+- Template metadata (name/version) is recorded for provenance only and must not change primary outcomes or authority ceilings.
+- Missing templates must fall back deterministically and must not escalate authority.
+
+Enforcement
+- Tests assert checklist invariance across template versions and that templates cannot generate authority outcomes.
+- Any evidence of template-induced authority must fail deterministic tests and be addressed immediately.
+
+Signed: Governance
+Date: 2025-12-17
@@ -0,0 +1,43 @@
+# Test Collection Audit
+
+Date: 2025-12-17
+
+Summary: The following pytest collection/import errors were observed when running `pytest -q` from repo root.
+
+Failing entries (observed):
+
+1) ERROR collecting tests/plan/test_execution_plan.py
+   - Error: import file mismatch
+   - Details: imported module 'test_execution_plan' has __file__ attribute pointing to `/tests/checklist/test_execution_plan.py` while test file being collected is `/tests/plan/test_execution_plan.py`.
+   - Root cause: duplicate test basenames across different directories causing pytest import module name collisions (name-collision / stale pyc influence).
+   - Classification: name-collision (stale __pycache__ can exacerbate this)
+
+2) ERROR collecting tests/requirements/test_completeness.py
+   - Error: import file mismatch
+   - Details: imported module 'test_completeness' points to `/tests/ast/test_completeness.py` while collected file is `/tests/requirements/test_completeness.py`.
+   - Root cause: duplicate basename `test_completeness.py` in separate test packages.
+   - Classification: name-collision
+
+3) ERROR collecting tests/requirements/test_extractor.py
+   - Error: import file mismatch
+   - Details: imported module 'test_extractor' points to `/tests/test_extractor.py` while collected file is `/tests/requirements/test_extractor.py`.
+   - Root cause: duplicate basename `test_extractor.py`.
+   - Classification: name-collision
+
+4) ERROR collecting tests/sufficiency/test_sufficiency.py
+   - Error: import file mismatch
+   - Details: imported module 'test_sufficiency' points to `/tests/requirements/test_sufficiency.py` while collected file is `/tests/sufficiency/test_sufficiency.py`.
+   - Root cause: duplicate basename `test_sufficiency.py`.
+   - Classification: name-collision
+
+Notes:
+- These are all name-collision issues (multiple tests with identical module basenames). Under pytest's default `prepend` import mode, a test file in a directory without `__init__.py` is imported as a top-level module named after its basename, so files that share a basename in different subdirectories collide; stale `.pyc`/`__pycache__` files exacerbate the issue.
+- No production code changes are required; fixes are test-only (renames + pytest config + guard tests + cleanup helpers).
+
+Next steps (Phase 16 plan):
+- Normalize duplicate filenames deterministically (append category prefix/suffix).
+- Add pytest configuration (pytest.ini) with explicit `testpaths` and `norecursedirs` entries.
+- Add `tests/conftest.py` session setup to set `PYTHONDONTWRITEBYTECODE=1` and ensure `src` is on `sys.path` for absolute imports.
+- Add guard tests: duplicate basename detector and 'no relative imports' detector.
+- Add cleanup script to remove stale `__pycache__` and `.pyc` files.
+- Add authoritative governance doc `docs/governance/TEST_COLLECTION_STABILITY_CONTRACT.md` and update `decision_log.md` (Phase 16 closed).
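
The `tests/conftest.py` step in the plan above could be sketched as follows. The function name `configure_test_environment` is hypothetical. Note that setting `PYTHONDONTWRITEBYTECODE` from inside a running interpreter only affects child processes that inherit the environment; `sys.dont_write_bytecode` is the switch that stops the current process from writing `.pyc` files, so the sketch sets both.

```python
import os
import sys
from pathlib import Path


def configure_test_environment(repo_root: Path) -> None:
    """Disable bytecode writing and make `src` importable for absolute imports."""
    # Inherited by subprocesses started from the test session.
    os.environ["PYTHONDONTWRITEBYTECODE"] = "1"
    # Applies to the interpreter that is already running.
    sys.dont_write_bytecode = True
    src = str(repo_root / "src")
    if src not in sys.path:
        sys.path.insert(0, src)
```

In a `conftest.py` placed directly under `tests/`, this could be called at import time as `configure_test_environment(Path(__file__).resolve().parents[1])`.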
@@ -0,0 +1,21 @@
+# Test Collection Stability Contract (AUTHORITATIVE)
+
+Decision: AUTHORITATIVE — Phase 16: Test & CI Stability Locked
+
+Summary
+- Pytest collection must be deterministic and robust to local bytecode artifacts and duplicate basenames. CI must run the full test suite with zero collection/import errors.
+
+Rules
+- Duplicate test basenames across directories are disallowed. CI will fail the run if duplicates are detected.
+- Tests must not rely on relative imports; absolute imports via `shieldcraft.*` are preferred.
+- Bytecode cache pollution (`__pycache__`, `.pyc`) must be removed or ignored during CI runs. CI will include a cleanup step (`scripts/ci/clean_pycache.py`) to enforce this.
+- Pytest configuration is authoritative and stored in `pytest.ini` with explicit `testpaths` and `norecursedirs` to prevent spurious discovery.
+
+Enforcement
+- Guard tests added: `tests/ci/test_no_duplicate_test_basenames.py`, `tests/ci/test_no_relative_imports.py`.
+- CI/maintainers: ensure `scripts/ci/clean_pycache.py` is invoked in CI pre-step; don't rely on ephemeral local caches.
+
+Rationale
+- Prevents brittle, environment-dependent failures during CI by eliminating common sources of import/collection errors.
+
+Status: AUTHORITATIVE (locked)
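
A duplicate-basename guard of the kind named under Enforcement might be sketched as below. This is not the content of `tests/ci/test_no_duplicate_test_basenames.py`; the function name `find_duplicate_test_basenames` and the `test_*.py` glob pattern are assumptions.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_duplicate_test_basenames(tests_root: Path) -> Dict[str, List[str]]:
    """Map each duplicated test basename to the relative paths that share it."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    # Sorted traversal keeps the report deterministic across filesystems.
    for path in sorted(tests_root.rglob("test_*.py")):
        if "__pycache__" in path.parts:
            continue
        by_name[path.name].append(path.relative_to(tests_root).as_posix())
    return {
        name: paths for name, paths in sorted(by_name.items()) if len(paths) > 1
    }
```

A guard test could then assert that `find_duplicate_test_basenames(Path("tests"))` is empty, with the returned mapping serving as the failure message.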