Commit a7f1948

chore(tests): normalize duplicate test basenames (Phase 16)
1 parent 2d03b18 commit a7f1948

File tree

67 files changed

+2876
-150
lines changed

docs/CHECKLIST_OUTCOME_CONTRACT.md

Lines changed: 15 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,15 @@
1+
# CHECKLIST OUTCOME CONTRACT (AUTHORITATIVE)
2+
3+
This document is AUTHORITATIVE: the single source of truth for deriving checklist run outcomes.
4+
5+
Contract
6+
--------
7+
- The primary outcome of a checklist run MUST be derived exclusively by `derive_primary_outcome(checklist, events)`.
8+
- Authoritative precedence: **REFUSAL** > **BLOCKED** > **ACTION** > **DIAGNOSTIC**.
9+
- Persona annotations are advisory only and MUST NOT override the derived primary outcome.
10+
- The function MUST return: `primary_outcome`, `refusal` (bool), `blocking_reasons` (list), and `confidence_level` (one of `high|medium|low`).
11+
- No component may infer or mutate the `primary_outcome` outside of this contract; any such attempt is recorded as a diagnostic event.
12+
13+
Rationale
14+
---------
15+
Centralizing outcome derivation improves auditability and prevents duplicated heuristics from appearing in multiple modules. The canonical function is deterministic, and tests enforce idempotence and precedence.
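
The precedence rule in this contract can be illustrated with a minimal sketch. It is not the project's implementation: the event shape (`outcome` and `reason` keys), the fallback to `DIAGNOSTIC` for an empty event list, and the `confidence_level` policy are all assumptions made for illustration. Only the function name, the precedence order, and the four returned keys come from the contract.

```python
from typing import Any, Dict, List

# Precedence taken from the contract: REFUSAL > BLOCKED > ACTION > DIAGNOSTIC.
PRECEDENCE = ["REFUSAL", "BLOCKED", "ACTION", "DIAGNOSTIC"]


def derive_primary_outcome(
    checklist: Dict[str, Any], events: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Pick the highest-precedence outcome present in `events`.

    `checklist` is accepted to match the contract's signature but is unused
    in this sketch. The event keys used here are hypothetical.
    """
    seen = {event.get("outcome") for event in events}
    # Assumption: with no recognised outcome, fall back to the lowest level.
    primary = next((o for o in PRECEDENCE if o in seen), "DIAGNOSTIC")
    # Sorting makes the result independent of event order.
    blocking_reasons = sorted(
        event.get("reason", "unspecified")
        for event in events
        if event.get("outcome") in ("REFUSAL", "BLOCKED")
    )
    return {
        "primary_outcome": primary,
        "refusal": primary == "REFUSAL",
        "blocking_reasons": blocking_reasons,
        # Placeholder policy (assumption): the contract only fixes the
        # allowed values high|medium|low, not how they are chosen.
        "confidence_level": "high" if events else "low",
    }
```

Because the function reads a set of outcomes and sorts the reasons, calling it twice on the same events, or on the same events in a different order, yields the same result, which is the idempotence property the tests are said to enforce.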
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
# AUTHORITY CEILING CONTRACT (AUTHORITATIVE)
2+
3+
This contract defines the authority ceilings for the compiler and enforces guard-only behavior: the compiler must never escalate authority beyond what the spec explicitly grants unless that escalation is exposed as a BLOCKER + DIAGNOSTIC and recorded in the run events.
4+
5+
Principles
6+
- No new inference, synthesis, or heuristic behavior is permitted by this phase. Phase 12 is purely guard-and-enforcement.
7+
- Tier A authority must not be silently resolved by the compiler. Any Tier A synthesis must be accompanied by an explicit BLOCKER event and a DIAGNOSTIC event for auditability.
8+
- REFUSAL outcomes require explicit authority metadata (evidence.refusal.authority) and the compiler asserts its presence at finalization. REFUSAL must not be used to mask missing authority.
9+
10+
Enforcement
11+
- Assertions are centralized in `finalize_checklist` to fail fast when authority ceilings are violated.
12+
- Unit tests must verify that missing authority causes assertion failures rather than silent behavior.
13+
14+
Signed: Governance
15+
Date: 2025-12-17
Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
# INFERENCE EXPLAINABILITY CONTRACT (AUTHORITATIVE)
2+
3+
This document defines the authoritative explainability metadata required for any synthesized,
4+
coerced, inferred, or derived data emitted by the compiler and checklist pipeline.
5+
6+
Mandatory metadata fields (attached to checklist items or synthesized objects):
7+
- `meta.source` — one of: `explicit | default | derived | coerced | inferred`
8+
- `meta.justification` — short machine-readable string explaining why the value was created (e.g., `safe_default`, `missing_spec_pointer`, `heuristic_prose_keyword_match`). For BLOCKER/REFUSAL-related inferences the justification MUST reference affected pointer(s) via `meta.justification_ptr` or by embedding the pointer in the justification code.
9+
- `meta.inference_type` — one of: `none | safe_default | heuristic | structural | fallback`
10+
- `meta.tier` — when applicable: `A | B | C` (reflects the template tier per `TEMPLATE_COMPILATION_CONTRACT.md`)
11+
12+
Principles
13+
- Any inference must be recorded in machine-readable fields above; missing explainability
14+
metadata is a compiler violation.
15+
- Violations are classified by tier: Tier A missing explainability → BLOCKER; Tier B → DIAGNOSTIC; Tier C → advisory.
16+
- Explainability metadata must be deterministic and include a short justification code suitable for audit and filtering.
17+
18+
Examples
19+
- Synthesized default for missing `agents` (Tier A):
20+
- `meta.source = "default"`
21+
- `meta.justification = "safe_default_agents_list"`
22+
- `meta.inference_type = "safe_default"`
23+
- `meta.tier = "A"`
24+
25+
- Prose-derived confidence heuristic:
26+
- `meta.source = "inferred"`
27+
- `meta.justification = "heuristic_prose_keyword_match"`
28+
- `meta.inference_type = "heuristic"`
29+
- `meta.tier = "C"` (if informal)
30+
31+
Enforcement
32+
- The compiler attaches explainability metadata at each inference site; unit tests and CI guards assert the presence.
33+
- Tier A inferences without a corresponding checklist item or without explainability metadata are considered violations and will be detected via compiler assertions and failing tests.
34+
35+
Signed: Governance
36+
Date: 2025-12-16
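
As a rough illustration of the checks this contract describes, the sketch below validates the mandatory `meta` fields and maps a missing-metadata violation to its tier classification. The function names, the `ADVISORY` label, and the treatment of an unknown tier are hypothetical; the allowed values and the tier mapping come from the contract text.

```python
from typing import Any, Dict, List, Optional

# Allowed values as listed in the contract.
ALLOWED_SOURCES = {"explicit", "default", "derived", "coerced", "inferred"}
ALLOWED_INFERENCE_TYPES = {"none", "safe_default", "heuristic", "structural", "fallback"}
# Tier A -> BLOCKER, Tier B -> DIAGNOSTIC, Tier C -> advisory.
VIOLATION_BY_TIER = {"A": "BLOCKER", "B": "DIAGNOSTIC", "C": "ADVISORY"}


def check_explainability(meta: Dict[str, Any]) -> List[str]:
    """Return the names of mandatory fields that are missing or invalid."""
    problems: List[str] = []
    if meta.get("source") not in ALLOWED_SOURCES:
        problems.append("meta.source")
    justification = meta.get("justification")
    if not isinstance(justification, str) or not justification:
        problems.append("meta.justification")
    if meta.get("inference_type") not in ALLOWED_INFERENCE_TYPES:
        problems.append("meta.inference_type")
    return problems


def classify_violation(meta: Dict[str, Any]) -> Optional[str]:
    """Return None when metadata is complete, else the tier-based class."""
    if not check_explainability(meta):
        return None
    # Assumption: an absent or unknown tier is reported as DIAGNOSTIC.
    return VIOLATION_BY_TIER.get(meta.get("tier"), "DIAGNOSTIC")
```

For example, the Tier A `agents` default shown in the Examples section passes `check_explainability`, while the same object without `meta.justification` would be classified as a `BLOCKER`.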

docs/governance/INVARIANTS.md

Lines changed: 49 additions & 0 deletions
@@ -162,6 +162,16 @@ This file declares the governance anchor; enforcement logic will be implemented
162162
- `finalize_checklist(...)` is the sole emission boundary.
163163
- This invariant is enforced by code-level assertions and tests.
164164

165+
## Refusal Authority Invariant (Phase 11C)
166+
167+
- Statement: Every event with outcome `REFUSAL` **must** include structured refusal metadata under `evidence.refusal` containing keys: `authority` (non-empty string), `trigger`, `scope`, and `justification`.
168+
- Enforcement point: `src/shieldcraft/engine.finalize_checklist` asserts `authority` presence and exposes `checklist.refusal_authority`.
169+
- Failure classification: Missing or invalid `authority` is treated as a compiler assertion/implementation error and must fail the finalization boundary to avoid ambiguous REFUSALs.
170+
- Testable requirements:
171+
- Unit tests must cover (a) REFUSALs recorded via `record_refusal_event` propagate authority into the finalized checklist, and (b) REFUSALs recorded without `authority` cause finalization to raise an `AssertionError` and emit a paired diagnostic entry.
172+
- Evidence: Contract `docs/governance/REFUSAL_AUTHORITY_CONTRACT.md`, tests (`tests/test_refusal_authority.py`), and CI guard (`tests/ci/test_refusal_authority_persistence.py`).
173+
174+
165175
## Semantic Outcome Invariants (Phase 5)
166176

167177
- Statement: Every emitted checklist MUST contain a single canonical `primary_outcome` with value one of: `SUCCESS`, `REFUSAL`, `BLOCKED`, `DIAGNOSTIC_ONLY`.
@@ -203,3 +213,42 @@ This file declares the governance anchor; enforcement logic will be implemented
203213

204214
- Lock: This invariant is locked by Phase 8 and may not be changed except via a governance phase update.
205215

216+
---
217+
218+
## Inference Explainability Invariant (Phase 11B/11D/11E)
219+
220+
- Statement: All inferred, synthesized, coerced, or derived values that affect checklist emission or gating MUST include machine-readable explainability metadata attached to the affected object (item/meta/evidence/header). No silent inference is permitted: every non-explicit value must carry provenance and a justification code.
221+
- Required fields: `meta.source`, `meta.justification`, `meta.inference_type`, and `meta.tier` (when applicable for Tier A/B/C). BLOCKER/REFUSAL-related inferences MUST reference affected pointer(s) either via `meta.justification_ptr` or via an explicit pointer embedded in `meta.justification`.
222+
- Additional required provenance for specific cases:
223+
- Coercions MUST preserve `meta.original_value` for auditability.
224+
- Derived tasks MUST include `meta.derived_from = <parent_id>` and `meta.justification` referencing the derivation rule.
225+
- Confidence assignments MUST include `confidence_meta` with `source` and `justification` fields.
226+
- Unknown invariant expressions that are defaulted MUST attach `explainability` metadata and emit a DIAGNOSTIC checklist item.
227+
- Enforcement: Compiler unit tests and CI guards detect missing explainability metadata; Tier A omissions are BLOCKERs and Tier B are DIAGNOSTIC. Missing required provenance (e.g., `original_value` for coercions or `derived_from` for derived tasks) fails CI and must be remediated.
228+
- Rationale: Prevent silent intent invention by requiring that every automatic decision be auditable, machine-filterable, and provenance-bound.
229+
---
230+
231+
## Compiler Hardening Invariants (Phase 11A)
232+
233+
---
234+
235+
## Inference Explainability Invariant (Phase 11B)
236+
237+
- Statement: All synthesized, inferred, coerced, or derived data emitted by the compiler MUST include explainability metadata according to `docs/governance/INFERENCE_EXPLAINABILITY_CONTRACT.md`.
238+
- Testable requirements:
239+
- Any checklist item with `meta.synthesized_default == True` MUST have `meta.source`, `meta.justification`, and `meta.inference_type` defined.
240+
- Any item whose fields are coerced or normalized by `ChecklistModel.normalize_item` MUST include `meta.source = "coerced"` and `meta.justification`.
241+
- Any derived task emitted by `infer_tasks` MUST include `meta.source = "derived"` and `meta.justification`.
242+
- Invariant evaluations that return safe defaults for unknown expressions MUST attach `explainability` metadata to the `invariant_results` entries.
243+
- Enforcement: Verified by unit tests and CI guards; violations are test failures and must be remediated promptly.
244+
245+
- Statement: Compiler hardening measures introduced in Phase 11A (Tier enforcement, default synthesis, insufficiency diagnostics, and checklist quality scoring) are authoritative and must be enforced by the compiler pipeline.
246+
247+
- Testable invariants:
248+
- Tier enforcement is implemented in `src/shieldcraft/services/checklist/tier_enforcement.py::enforce_tiers` and must emit checklist items for missing Tier A/B sections (BLOCKER/DIAGNOSTIC respectively).
249+
- Default synthesis is implemented in `src/shieldcraft/services/spec/defaults.py::synthesize_missing_spec_fields` and must be invoked exactly once during compiler entry (currently in `ChecklistGenerator.build`).
250+
- Spec sufficiency diagnostics are implemented via `src/shieldcraft/services/spec/analysis.py::check_spec_sufficiency` and must produce DIAGNOSTIC checklist items without aborting compilation.
251+
- Checklist quality scoring is implemented in `src/shieldcraft/services/checklist/quality.py::compute_checklist_quality` and its result MUST be attached to `checklist.meta.checklist_quality` by `finalize_checklist`.
252+
253+
- Enforcement: Unit tests and CI guards verify these invariants; any regression must be patched and re-locked via governance decision.
254+
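
The coercion provenance requirement (`meta.source = "coerced"`, `meta.justification`, and a preserved `meta.original_value`) might look like the sketch below. `coerce_field` is a hypothetical helper, not `ChecklistModel.normalize_item`, and it assumes a single coerced field per item.

```python
from typing import Any, Callable, Dict


def coerce_field(
    item: Dict[str, Any],
    field: str,
    coerce: Callable[[Any], Any],
    justification: str,
) -> Dict[str, Any]:
    """Coerce `item[field]` in place and record provenance for audit."""
    original = item.get(field)
    coerced = coerce(original)
    # No provenance is recorded when the value and its type are unchanged.
    if type(coerced) is type(original) and coerced == original:
        return item
    meta = item.setdefault("meta", {})
    meta.update(
        {
            "source": "coerced",
            "justification": justification,
            # Preserved so an auditor can see what the spec actually said.
            "original_value": original,
        }
    )
    item[field] = coerced
    return item
```

Usage: `coerce_field({"id": "T1", "priority": "3"}, "priority", int, "coerce_priority_to_int")` leaves `priority` as the integer `3` and stores the original string under `meta.original_value`.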
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
# OVER-SPEC TOLERANCE CONTRACT (AUTHORITATIVE)
2+
3+
Phase 13 guarantees that compiler behavior is stable and non-drifting under over-complete and redundant specifications. This phase is guard-only: it adds tests and validations to detect conflicts and ensure invariance; it does not alter inference, synthesis, or authority ceilings.
4+
5+
Principles
6+
- Redundant or repeated spec elements must not amplify inference or authority.
7+
- Extra non-conflicting detail must not change primary outcomes or escalate severities.
8+
- Conflicting explicit instructions must be surfaced (DIAGNOSTIC/BLOCKER) and not auto-resolved by the compiler.
9+
- Deterministic behavior (ordering, ids, hashing) must hold at scale.
10+
11+
Enforcement
12+
- Deterministic unit tests verify redundancy tolerance, over-spec stability, explicit conflict visibility, and scale invariance.
13+
- Any violation that suggests silent authority escalation or resolution raises an assertion in tests and will be investigated.
14+
15+
Signed: Governance
16+
Date: 2025-12-17
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
# Persona Non-Authority Contract (AUTHORITATIVE)
2+
3+
Decision: AUTHORITATIVE — Persona Protocol Boundary Locked (Phase 15)
4+
5+
Summary
6+
- Personas are scoped specialists that may provide annotations, diagnostics, or advisory constraints but MUST NOT act as implicit authorities that alter checklist semantics or outcomes.
7+
8+
Rules (authoritative)
9+
- Personas may only emit audit events (annotations, persona events) and record vetoes for observability; vetoes MUST be treated as advisory (DIAGNOSTIC) and MUST NOT cause REFUSAL or BLOCKER behavior.
10+
- Personas MUST NOT be permitted to directly mutate semantic fields that affect checklist primary outcome or refusal authority, including but not limited to: `id`, `ptr`, `generated`, `artifact`, `severity`, `refusal`, `outcome`.
11+
- Persona constraints that attempt disallowed mutations MUST be recorded in `item.meta.persona_constraints_disallowed` and a `G15_PERSONA_CONSTRAINT_DISALLOWED` DIAGNOSTIC event MUST be emitted for visibility.
12+
- Persona routing and evaluation order MUST be deterministic and recorded only as metadata or persona events; they MUST NOT influence primary checklist derivation.
13+
14+
Rationale
15+
- Personas provide useful domain-specific advice and annotations but must not supplant governance or authority ceilings. Ensuring personas are advisory preserves auditability, reduces accidental refusal, and prevents stealth authority escalation.
16+
17+
Enforcement
18+
- Implementation-level enforcement is via deterministic tests and lightweight runtime guards (recording disallowed attempts and converting previous persona veto raises into advisory DIAGNOSTIC events).
19+
- Tests: `tests/persona/test_persona_protocol_boundary_routing_invariance.py`, `tests/persona/test_persona_decision_precedence.py`, `tests/persona/test_persona_non_interference_guards.py`, `tests/persona/test_persona_scale_order_invariance.py` verify contract requirements.
20+
21+
Cross-references
22+
- AUTHORITY_CEILING_CONTRACT.md (persona outputs must respect authority ceilings)
23+
- TEMPLATE_NON_AUTHORITY_CONTRACT.md (templates non-authoritative policy)
24+
25+
Status: AUTHORITATIVE (locked)
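
A runtime guard of the kind described under Rules could be sketched as follows. The function name, the event dictionary shape, and the `persona_annotations` key are hypothetical; the protected field list, `item.meta.persona_constraints_disallowed`, and the `G15_PERSONA_CONSTRAINT_DISALLOWED` code are taken from the contract. The contract says the field list is not exhaustive, so a real guard may protect more fields.

```python
from typing import Any, Dict, List

# Fields the contract names as off-limits to personas ("not limited to").
PROTECTED_FIELDS = frozenset(
    {"id", "ptr", "generated", "artifact", "severity", "refusal", "outcome"}
)


def apply_persona_constraint(
    item: Dict[str, Any],
    persona: str,
    changes: Dict[str, Any],
    events: List[Dict[str, Any]],
) -> None:
    """Record persona input as metadata only; never mutate semantic fields."""
    meta = item.setdefault("meta", {})
    for field in sorted(changes):  # sorted so the outcome is deterministic
        if field in PROTECTED_FIELDS:
            meta.setdefault("persona_constraints_disallowed", []).append(
                {"persona": persona, "field": field}
            )
            events.append(
                {
                    "code": "G15_PERSONA_CONSTRAINT_DISALLOWED",
                    "outcome": "DIAGNOSTIC",  # advisory, never REFUSAL/BLOCKER
                    "persona": persona,
                    "field": field,
                }
            )
        else:
            # Assumption: permitted input is kept as an annotation.
            meta.setdefault("persona_annotations", []).append(
                {"persona": persona, "field": field, "value": changes[field]}
            )
```

The guard never writes to the item's top-level fields, so a persona that asks for `severity = "high"` leaves the original severity intact and produces one DIAGNOSTIC event instead.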
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
# Refusal Authority Contract (AUTHORITATIVE)
2+
3+
Decision date: 2025-12-17
4+
5+
Summary
6+
- This contract establishes an authoritative mapping from REFUSAL gate identifiers
7+
(e.g., `G2_GOVERNANCE_PRESENCE_CHECK`) to named refusal authorities
8+
(e.g., `governance`, `persona`, `infrastructure`).
9+
- Every REFUSAL event emitted by the engine MUST include structured refusal
10+
metadata in its `evidence.refusal` object with keys: `authority`, `trigger`,
11+
`scope`, and `justification`.
12+
13+
Requirements
14+
- The codebase provides a deterministic map in `src/shieldcraft/services/governance/refusal_authority.py`.
15+
- The helper `record_refusal_event(context, gate_id, phase, ...)` must be used
16+
to record REFUSALs at known gates; it attaches the required `refusal` metadata.
17+
- The finalization boundary (`finalize_checklist`) asserts that any REFUSAL in the
18+
finalized events includes a valid, non-empty `authority` string and surfaces
19+
the authority as `checklist.refusal_authority`.
20+
21+
Rationale
22+
- Explicit authority attribution enables auditability, makes REFUSALs
23+
inspectable by automated tools, and reduces ambiguity in governance decisions.
24+
- Structured refusal metadata enables paired diagnostics for missing authority
25+
cases and deterministic diagnostics masking.
26+
27+
Enforcement
28+
- Code-level assertions validate presence and type of `authority` during
29+
checklist finalization.
30+
- Unit tests (`tests/test_refusal_authority.py`) cover normal and failure modes.
31+
- CI guards must include `tests/ci/test_refusal_authority_persistence.py` to ensure
32+
the authoritative map covers known REFUSAL gates used by the engine.
33+
34+
Scope
35+
- This contract is AUTHORITATIVE for REFUSAL metadata assignment. Any future
36+
edits to gate→authority mappings require an explicit decision recorded in
37+
`docs/governance/decision_log.md`.
38+
39+
Owners
40+
- Governance (docs/governance)
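
The record-then-assert flow in the Requirements section can be sketched as below. This is an illustration under stated assumptions, not the code in `refusal_authority.py` or the engine: the gate-to-authority pairing, the keyword parameters after `phase`, the `context` dictionary layout, and the `diagnostics` list are all hypothetical.

```python
from typing import Any, Dict, Optional

# Illustrative pairing only; the authoritative map lives in the codebase.
GATE_AUTHORITY = {"G2_GOVERNANCE_PRESENCE_CHECK": "governance"}


def record_refusal_event(
    context: Dict[str, Any], gate_id: str, phase: str,
    *, trigger: str, scope: str, justification: str,
) -> Dict[str, Any]:
    """Append a REFUSAL event carrying the four required refusal keys."""
    event = {
        "gate_id": gate_id,
        "phase": phase,
        "outcome": "REFUSAL",
        "evidence": {
            "refusal": {
                "authority": GATE_AUTHORITY.get(gate_id, ""),
                "trigger": trigger,
                "scope": scope,
                "justification": justification,
            }
        },
    }
    context.setdefault("events", []).append(event)
    return event


def finalize_checklist(context: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Fail fast if any REFUSAL lacks a non-empty string authority."""
    authority: Optional[str] = None
    for event in context.get("events", []):
        if event.get("outcome") != "REFUSAL":
            continue
        value = event.get("evidence", {}).get("refusal", {}).get("authority")
        if not (isinstance(value, str) and value):
            # Paired diagnostic entry, then fail the finalization boundary.
            context.setdefault("diagnostics", []).append(
                {"outcome": "DIAGNOSTIC", "gate_id": event.get("gate_id"),
                 "reason": "refusal_missing_authority"}
            )
            raise AssertionError(f"REFUSAL at {event.get('gate_id')} lacks authority")
        authority = authority or value  # first REFUSAL's authority wins
    return {"refusal_authority": authority}
```

Raising `AssertionError` explicitly, rather than using an `assert` statement, keeps the check active when Python runs with `-O`, which strips `assert` statements.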
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
# TEMPLATE NON-AUTHORITY CONTRACT (AUTHORITATIVE)
2+
3+
This contract ensures templates are pluggable, versioned, and non-authoritative. Templates are presentation artifacts and must not be used to infer or assert authority over checklist outcomes.
4+
5+
Principles
6+
- Templates provide rendering and placeholder defaults only; they must never inject BLOCKER or REFUSAL outcomes or otherwise alter authoritative decisioning.
7+
- Template metadata (name/version) is recorded for provenance only and must not change primary outcomes or authority ceilings.
8+
- Missing templates must fall back deterministically and must not escalate authority.
9+
10+
Enforcement
11+
- Tests assert checklist invariance across template versions and that templates cannot generate authority outcomes.
12+
- Any evidence of template-induced authority must fail deterministic tests and be addressed immediately.
13+
14+
Signed: Governance
15+
Date: 2025-12-17
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
# Test Collection Audit
2+
3+
Date: 2025-12-17
4+
5+
Summary: The following pytest collection/import errors were observed when running `pytest -q` from repo root.
6+
7+
Failing entries (observed):
8+
9+
1) ERROR collecting tests/plan/test_execution_plan.py
10+
- Error: import file mismatch
11+
- Details: imported module 'test_execution_plan' has __file__ attribute pointing to `/tests/checklist/test_execution_plan.py` while test file being collected is `/tests/plan/test_execution_plan.py`.
12+
- Root cause: duplicate test basenames in different directories cause pytest module-name collisions at import time (name collision, possibly aggravated by stale `.pyc` files).
13+
- Classification: name-collision (stale __pycache__ can exacerbate this)
14+
15+
2) ERROR collecting tests/requirements/test_completeness.py
16+
- Error: import file mismatch
17+
- Details: imported module 'test_completeness' points to `/tests/ast/test_completeness.py` while collected file is `/tests/requirements/test_completeness.py`.
18+
- Root cause: duplicate basename `test_completeness.py` in separate test packages.
19+
- Classification: name-collision
20+
21+
3) ERROR collecting tests/requirements/test_extractor.py
22+
- Error: import file mismatch
23+
- Details: imported module 'test_extractor' points to `/tests/test_extractor.py` while collected file is `/tests/requirements/test_extractor.py`.
24+
- Root cause: duplicate basename `test_extractor.py`.
25+
- Classification: name-collision
26+
27+
4) ERROR collecting tests/sufficiency/test_sufficiency.py
28+
- Error: import file mismatch
29+
- Details: imported module 'test_sufficiency' points to `/tests/requirements/test_sufficiency.py` while collected file is `/tests/sufficiency/test_sufficiency.py`.
30+
- Root cause: duplicate basename `test_sufficiency.py`.
31+
- Classification: name-collision
32+
33+
Notes:
34+
- These are all name-collision issues: multiple test files share the same module basename. In pytest's default `prepend` import mode, a test file in a directory without `__init__.py` is imported as a top-level module named after its basename, so two such files in different subdirectories compete for the same entry in `sys.modules`; stale `.pyc`/`__pycache__` files exacerbate the issue.
35+
- No production code changes are required; fixes are test-only (renames + pytest config + guard tests + cleanup helpers).
36+
37+
Next steps (Phase 16 plan):
38+
- Normalize duplicate filenames deterministically (append category prefix/suffix).
39+
- Add pytest configuration (pytest.ini) with explicit `testpaths` and `norecursedirs` entries.
40+
- Add `tests/conftest.py` session setup to set `PYTHONDONTWRITEBYTECODE=1` and ensure `src` is on `sys.path` for absolute imports.
41+
- Add guard tests: duplicate basename detector and 'no relative imports' detector.
42+
- Add cleanup script to remove stale `__pycache__` and `.pyc` files.
43+
- Add authoritative governance doc `docs/governance/TEST_COLLECTION_STABILITY_CONTRACT.md` and update `decision_log.md` (Phase 16 closed).
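
The `tests/conftest.py` session setup listed in the next steps could be sketched as follows. The helper name `configure_test_session` and its `repo_root` parameter are hypothetical. One detail worth noting: `PYTHONDONTWRITEBYTECODE` is read when the interpreter starts, so setting it from inside a running process only affects child processes; the in-process equivalent is `sys.dont_write_bytecode`.

```python
import os
import sys
from pathlib import Path


def configure_test_session(repo_root: Path) -> None:
    """Disable bytecode writing and expose `src` for absolute imports."""
    # Inherited by any subprocesses the tests launch.
    os.environ["PYTHONDONTWRITEBYTECODE"] = "1"
    # Takes effect in the current interpreter immediately.
    sys.dont_write_bytecode = True
    src = str(repo_root / "src")
    # Insert once so repeated calls do not grow sys.path.
    if src not in sys.path:
        sys.path.insert(0, src)
```

In a `tests/conftest.py` this might be called at import time as `configure_test_session(Path(__file__).resolve().parents[1])`, assuming `tests/` sits directly under the repository root.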
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
# Test Collection Stability Contract (AUTHORITATIVE)
2+
3+
Decision: AUTHORITATIVE — Phase 16: Test & CI Stability Locked
4+
5+
Summary
6+
- Pytest collection must be deterministic and robust to local bytecode artifacts and duplicate basenames. CI must run the full test suite with zero collection/import errors.
7+
8+
Rules
9+
- Duplicate test basenames across directories are disallowed. CI will fail the run if duplicates are detected.
10+
- Tests must not rely on relative imports; absolute imports via `shieldcraft.*` are preferred.
11+
- Bytecode cache pollution (`__pycache__`, `.pyc`) must be removed or ignored during CI runs. CI will include a cleanup step (`scripts/ci/clean_pycache.py`) to enforce this.
12+
- Pytest configuration is authoritative and stored in `pytest.ini` with explicit `testpaths` and `norecursedirs` to prevent spurious discovery.
13+
14+
Enforcement
15+
- Guard tests added: `tests/ci/test_no_duplicate_test_basenames.py`, `tests/ci/test_no_relative_imports.py`.
16+
- CI/maintainers: ensure `scripts/ci/clean_pycache.py` is invoked as a CI pre-step; do not rely on ephemeral local caches.
17+
18+
Rationale
19+
- Prevents brittle, environment-dependent failures during CI by eliminating common sources of import/collection errors.
20+
21+
Status: AUTHORITATIVE (locked)
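
The two guard checks named under Enforcement could be built on helpers like the ones below. This is a sketch, not the contents of `tests/ci/test_no_duplicate_test_basenames.py` or `tests/ci/test_no_relative_imports.py`: the function names and the `test_*.py` glob are assumptions.

```python
import ast
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_duplicate_basenames(tests_root: Path) -> Dict[str, List[str]]:
    """Map each duplicated test basename to the relative paths that use it."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    # Sorted traversal keeps the report deterministic across platforms.
    for path in sorted(tests_root.rglob("test_*.py")):
        by_name[path.name].append(path.relative_to(tests_root).as_posix())
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


def find_relative_imports(source: str) -> List[int]:
    """Return line numbers of `from . import x` style imports in `source`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # level > 0 means the import has one or more leading dots.
        if isinstance(node, ast.ImportFrom) and node.level > 0
    )
```

A guard test would then assert that `find_duplicate_basenames(Path("tests"))` is empty and that `find_relative_imports` returns no lines for any test file, failing CI with the offending paths or line numbers otherwise.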
