Decision & Output: outcome + proof from history

# Decision & Output: outcome + proof from history

> See the [full spec](https://gist.github.com/functor-flow/e5ad7923a0853399800c46f99d35ecfe).

```mermaid
flowchart TD
  Task["SearchTask"]
  Hist["SearchHistory (aggregates + hits)"]
  Attempts["current SERP attempts"]
  Max["max SERP attempts"]

  subgraph DecisionMod["Decision module"]
    Score["compute support/refute scores"]
    Enough{"evidence >= threshold?"}
    Direction{"support > refute?"}
    MoreSearch{"can we search again?"}
  end

  Result["AgentResult (True / False / Invalid)"]
  NeedMore["need more SERP search"]

  Task --> Score
  Hist --> Score

  Score --> Enough
  Enough -- "yes" --> Direction
  Enough -- "no" --> MoreSearch

  Direction -- "support" --> Result
  Direction -- "refute" --> Result

  Attempts --> MoreSearch
  Max --> MoreSearch
  MoreSearch -- "yes" --> NeedMore
  MoreSearch -- "no" --> Result
```

**Status:** Not implemented for the new agent pipeline (existing validator decision code is separate).

**Role.** Inspect `SearchTask` and `SearchHistory`, decide whether there is already enough evidence to answer True or False or whether the claim should be treated as Invalid, and tell the loop whether another SERP round is worth running.

**Responsibilities.**
- Implement a decision module that reads only `SearchTask`, `SearchHistory`, and simple config (evidence thresholds, max SERP attempts).
- Aggregate support and refute scores from `SearchHistory.evidence` using `EvidenceHit.weight` (primary/wire/trade/other), and keep simple totals like `totalSupport` and `totalRefute`. A good starting scale is: primary domains = 1.0, wire services ≈ 0.8, trade/specialist press ≈ 0.6, and other/long-tail sources ≤ 0.4.
- Use a documented threshold rule to decide when evidence is "enough": for example, treat the evidence as sufficient when either support or refute weight is ≥ 1.6 from at least two independent, time-aligned sources (mirroring the primary/wire weighting sketch in the spec).
- Choose one of three outcomes to start with: **True**, **False**, or **Invalid**. Use True when support clearly wins above threshold, False when refute clearly wins, and Invalid when we either never reach the threshold or the evidence is too conflicting/noisy by the time SERP attempts are exhausted.
- Expose a small decision result that either (a) contains a final `AgentResult` (outcome + proof + sources) or (b) says "need more SERP search" so the loop can ask Serper for another query if `current_serp_attempts < max_serp_attempts`.
- When producing a final `AgentResult`, call a small LLM-backed proof writer that takes `SearchTask` + the best EvidenceHits and returns a compact markdown proof, then map that into the shared `AgentResult` shape from the Input issue (#4).
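
The aggregation and threshold bullets above can be sketched roughly as follows. This is a minimal illustration, not the spec's implementation. The `EvidenceHit` fields `sourceId`, `stance`, and `timeAligned`, and the helper names `aggregate` and `isEnough`, are assumptions made for this example. Only `weight`, the 1.6 threshold, and the two-source rule come from the text. Counting each source once per side (at its highest weight) is one possible reading of "independent".

```ts
// Hypothetical shapes: apart from `weight`, these field names are assumptions.
type Stance = "support" | "refute";

interface EvidenceHit {
  sourceId: string;     // assumed: identifies one independent source (e.g. a domain)
  stance: Stance;       // assumed: whether the hit supports or refutes the claim
  weight: number;       // 1.0 primary, ~0.8 wire, ~0.6 trade, <= 0.4 other
  timeAligned: boolean; // assumed: the hit matches the claim's timeframe
}

interface SideTotals {
  weight: number;
  independentSources: number;
}

const THRESHOLD = 1.6; // from the example rule in the text
const MIN_SOURCES = 2; // "at least two independent, time-aligned sources"
const EPSILON = 1e-9;  // tolerance so float sums such as 1.0 + 0.6 still pass

// Sum weights for one stance, counting each source once at its best weight.
function aggregate(hits: EvidenceHit[], stance: Stance): SideTotals {
  const bestPerSource: Record<string, number> = {};
  for (const hit of hits) {
    if (hit.stance !== stance || !hit.timeAligned) continue;
    const prev = bestPerSource[hit.sourceId] ?? 0;
    bestPerSource[hit.sourceId] = Math.max(prev, hit.weight);
  }
  const sources = Object.keys(bestPerSource);
  const weight = sources.reduce((sum, id) => sum + bestPerSource[id], 0);
  return { weight, independentSources: sources.length };
}

// A side has "enough" evidence when it meets both the weight and source rules.
function isEnough(side: SideTotals): boolean {
  return side.weight >= THRESHOLD - EPSILON && side.independentSources >= MIN_SOURCES;
}
```

With this reading, one primary hit plus one wire hit on the same side (1.0 + 0.8) clears the threshold, while two wire hits from the same source count only once and do not.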

The `AgentResult` contract is shared with the Input layer:

```ts
interface AgentResult {
  outcome: string; // e.g. True | False | Invalid (initially)
  proof: string;   // short markdown summary + key snippets
  sources: Array<{
    url: string;
    title: string;
    pub_date: string | null;
    excerpt: string;
  }>;
  debug?: {
    total_queries?: number;
    total_pages_visited?: number;
  };
}
```
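
One way the "best EvidenceHits" could be mapped into the `sources` array of this contract is sketched below. The `RankedHit` shape, the `toSources` helper, and the default limit of 5 are hypothetical and used only for illustration. The four output fields mirror `AgentResult.sources`.

```ts
// Assumed hit shape: only url/title/pub_date/excerpt mirror AgentResult.sources.
interface RankedHit {
  url: string;
  title: string;
  pub_date: string | null;
  excerpt: string;
  weight: number;
}

type AgentSource = Omit<RankedHit, "weight">;

// Hypothetical helper: keep the highest-weight hit per URL, then the top `limit`.
function toSources(hits: RankedHit[], limit = 5): AgentSource[] {
  const bestByUrl: Record<string, RankedHit> = {};
  for (const hit of hits) {
    const seen = bestByUrl[hit.url];
    if (!seen || hit.weight > seen.weight) bestByUrl[hit.url] = hit;
  }
  return Object.keys(bestByUrl)
    .map((url) => bestByUrl[url])
    .sort((a, b) => b.weight - a.weight) // strongest evidence first
    .slice(0, limit)
    .map(({ url, title, pub_date, excerpt }) => ({ url, title, pub_date, excerpt }));
}
```

Dropping `weight` at the boundary keeps internal scoring details out of the shared contract.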

---

## TODO

- [ ] Implement an initial rule-based decision helper that uses `SearchHistory` aggregates (support/refute totals, number of independent sources, timeframe alignment) to decide when evidence is enough for True / False and when we should fall back to Invalid or request more SERP search. Start with a simple weighting scheme (primary = 1.0, wire ≈ 0.8, trade ≈ 0.6, other ≤ 0.4) and a threshold like `support >= 1.6` or `refute >= 1.6` from ≥2 independent sources.
- [ ] Define a decision boundary, e.g. `decideFromHistory(task, history, attempts, maxAttempts)`, that wraps this helper and returns either `{ status: 'final', result: AgentResult }` or `{ status: 'need_more_search' }` based only on `SearchTask` + `SearchHistory` + simple counters.
- [ ] Keep a single `decideFromHistory` entrypoint so Validator and external callers never depend on internal `SearchHistory` details; future changes to weights, thresholds, or additional outcomes should only require updating the internal rules, not the boundary itself.
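
The boundary described in the TODO items might look roughly like the sketch below. It is an illustration under stated assumptions, not the planned implementation. `HistoryAggregates`, the minimal `SearchTask`, the optional `writeProof` parameter, and the stub proof writer are all hypothetical. The real proof writer would be LLM-backed and most likely async, and it is kept synchronous here only to keep the example small. Treating "both sides above threshold" as conflicting evidence is one reading of "clearly wins".

```ts
type Outcome = "True" | "False" | "Invalid";

// Mirrors the AgentResult contract above (debug omitted for brevity).
interface AgentResult {
  outcome: string;
  proof: string;
  sources: Array<{ url: string; title: string; pub_date: string | null; excerpt: string }>;
}

// Assumed shapes: the real SearchTask / SearchHistory are defined elsewhere.
interface SearchTask { claim: string }
interface HistoryAggregates {
  totalSupport: number;
  totalRefute: number;
  supportSources: number; // independent, time-aligned sources
  refuteSources: number;
}

type Decision =
  | { status: "final"; result: AgentResult }
  | { status: "need_more_search" };

type ProofWriter = (task: SearchTask, outcome: Outcome) => Pick<AgentResult, "proof" | "sources">;

// Placeholder standing in for the LLM-backed proof writer.
const stubProofWriter: ProofWriter = (task, outcome) => ({
  proof: `**${outcome}**: ${task.claim}`,
  sources: [],
});

const THRESHOLD = 1.6;
const MIN_SOURCES = 2;

function decideFromHistory(
  task: SearchTask,
  history: HistoryAggregates,
  attempts: number,
  maxAttempts: number,
  writeProof: ProofWriter = stubProofWriter, // assumed extra parameter
): Decision {
  const supportEnough = history.totalSupport >= THRESHOLD && history.supportSources >= MIN_SOURCES;
  const refuteEnough = history.totalRefute >= THRESHOLD && history.refuteSources >= MIN_SOURCES;

  let outcome: Outcome | null = null;
  if (supportEnough && !refuteEnough) outcome = "True";
  else if (refuteEnough && !supportEnough) outcome = "False";

  // Neither side wins clearly: search again if allowed, otherwise fall back to Invalid.
  if (outcome === null) {
    if (attempts < maxAttempts) return { status: "need_more_search" };
    outcome = "Invalid";
  }
  return { status: "final", result: { outcome, ...writeProof(task, outcome) } };
}
```

Because callers only switch on `status`, weights, thresholds, or extra outcomes can change inside the function without touching the boundary.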


