Decision & Output: outcome + proof from history #10

@functor-flow


full spec at spec

flowchart TD
  Task["SearchTask"]
  Hist["SearchHistory (aggregates + hits)"]
  Attempts["current SERP attempts"]
  Max["max SERP attempts"]

  subgraph DecisionMod["Decision module"]
    Score["compute support/refute scores"]
    Enough{"evidence >= threshold?"}
    Direction{"support > refute?"}
    MoreSearch{"can we search again?"}
  end

  Result["AgentResult (True / False / Invalid)"]
  NeedMore["need more SERP search"]

  Task --> Score
  Hist --> Score

  Score --> Enough
  Enough -- "yes" --> Direction
  Enough -- "no" --> MoreSearch

  Direction -- "support" --> Result
  Direction -- "refute" --> Result

  Attempts --> MoreSearch
  Max --> MoreSearch
  MoreSearch -- "yes" --> NeedMore
  MoreSearch -- "no" --> Result

Status: Not implemented for the new agent pipeline (existing validator decision code is separate).

Role. Read SearchTask + SearchHistory and decide whether the evidence is already sufficient to answer True or False, whether the claim should be treated as Invalid, or whether the loop should run another SERP round.

Responsibilities.

  • Implement a decision module that reads only SearchTask, SearchHistory, and simple config (evidence thresholds, max SERP attempts).
  • Aggregate support and refute scores from SearchHistory.evidence using EvidenceHit.weight (primary/wire/trade/other), and keep simple totals like totalSupport and totalRefute. A good starting scale is: primary domains = 1.0, wire services ≈ 0.8, trade/specialist press ≈ 0.6, and other/long-tail sources ≤ 0.4.
  • Use a documented threshold rule to decide when evidence is "enough": for example, treat the evidence as sufficient when either support or refute weight is ≥ 1.6 from at least two independent, time-aligned sources (mirroring the primary/wire weighting sketch in the spec).
  • Choose one of three outcomes to start with: True, False, or Invalid. Use True when support clearly wins above threshold, False when refute clearly wins, and Invalid when we either never reach the threshold or the evidence is too conflicting/noisy by the time SERP attempts are exhausted.
  • Expose a small decision result that either (a) contains a final AgentResult (outcome + proof + sources) or (b) says "need more SERP search" so the loop can ask Serper for another query if current_serp_attempts < max_serp_attempts.
  • When producing a final AgentResult, call a small LLM-backed proof writer that takes SearchTask + the best EvidenceHits and returns a compact markdown proof, then map that into the shared AgentResult shape from the Input issue (Input: Web‑search agent input layer #4).
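The scoring and threshold bullets above can be sketched roughly as follows. This is a minimal illustration, not the spec: the issue names only SearchHistory.evidence and EvidenceHit.weight, so the `stance`, `domain`, and `timeAligned` fields, the helper names `aggregateScores` and `hasEnoughEvidence`, and the choice to treat one domain as one independent source are all assumptions.

```typescript
// Sketch only: `stance`, `domain`, and `timeAligned` are assumed fields;
// the issue itself names only SearchHistory.evidence and EvidenceHit.weight.
type Stance = "support" | "refute";
type SourceTier = "primary" | "wire" | "trade" | "other";

interface EvidenceHit {
  stance: Stance;
  weight: number;       // tier weight, see TIER_WEIGHTS
  domain: string;       // assumption: one domain counts as one independent source
  timeAligned: boolean; // assumption: true when the hit matches the claim's timeframe
}

interface SearchHistory {
  evidence: EvidenceHit[];
}

// Starting scale from the issue text; "other" uses the stated upper bound.
const TIER_WEIGHTS: Record<SourceTier, number> = {
  primary: 1.0,
  wire: 0.8,
  trade: 0.6,
  other: 0.4,
};

interface ScoreTotals {
  totalSupport: number;
  totalRefute: number;
  supportSources: number;
  refuteSources: number;
}

function aggregateScores(history: SearchHistory): ScoreTotals {
  // Keep only the strongest time-aligned hit per (stance, domain), so repeated
  // hits from one domain do not inflate the total or the source count.
  const best: Record<Stance, Map<string, number>> = {
    support: new Map<string, number>(),
    refute: new Map<string, number>(),
  };
  for (const hit of history.evidence) {
    if (!hit.timeAligned) continue;
    const perDomain = best[hit.stance];
    const previous = perDomain.get(hit.domain) ?? 0;
    perDomain.set(hit.domain, Math.max(previous, hit.weight));
  }
  const sum = (m: Map<string, number>): number =>
    Array.from(m.values()).reduce((acc, w) => acc + w, 0);
  return {
    totalSupport: sum(best.support),
    totalRefute: sum(best.refute),
    supportSources: best.support.size,
    refuteSources: best.refute.size,
  };
}

// Threshold rule from the issue: weight >= 1.6 from at least two sources.
function hasEnoughEvidence(t: ScoreTotals, threshold = 1.6, minSources = 2): boolean {
  const supportOk = t.totalSupport >= threshold && t.supportSources >= minSources;
  const refuteOk = t.totalRefute >= threshold && t.refuteSources >= minSources;
  return supportOk || refuteOk;
}

// Usage: one primary and one wire source supporting the claim.
const exampleTotals = aggregateScores({
  evidence: [
    { stance: "support", weight: TIER_WEIGHTS.primary, domain: "agency.example", timeAligned: true },
    { stance: "support", weight: TIER_WEIGHTS.wire, domain: "wire.example", timeAligned: true },
  ],
});
console.log(exampleTotals, hasEnoughEvidence(exampleTotals));
```

Deduplicating by domain is only one possible reading of "independent"; a real implementation might instead group by publisher or by syndication chain.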

The AgentResult contract is shared with the Input layer:

interface AgentResult {
  outcome: string; // e.g. True | False | Invalid (initially)
  proof: string;   // short markdown summary + key snippets
  sources: Array<{
    url: string;
    title: string;
    pub_date: string | null;
    excerpt: string;
  }>;
  debug?: {
    total_queries?: number;
    total_pages_visited?: number;
  };
}
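As a rough illustration of how the best EvidenceHits might be mapped into this shape, the sketch below picks the highest-weight hits, asks an injected proof writer for the markdown proof, and fills in `sources`. The `EvidenceHit` fields other than `weight`, the `ProofWriter` type, and the helper names `pickBestHits`, `toAgentResult`, and `finalize` are hypothetical; the LLM call itself is left to the caller.

```typescript
interface AgentResult {
  outcome: string;
  proof: string;
  sources: Array<{
    url: string;
    title: string;
    pub_date: string | null;
    excerpt: string;
  }>;
  debug?: { total_queries?: number; total_pages_visited?: number };
}

// Hypothetical hit shape: only `weight` is named in the issue.
interface EvidenceHit {
  url: string;
  title: string;
  pubDate: string | null;
  excerpt: string;
  weight: number;
}

// Narrower than `outcome: string`, matching the three initial outcomes.
type Outcome = "True" | "False" | "Invalid";

// Hypothetical signature for the LLM-backed proof writer.
type ProofWriter = (claim: string, hits: EvidenceHit[]) => Promise<string>;

// Pick the highest-weight hits without mutating the caller's array.
function pickBestHits(hits: EvidenceHit[], limit = 3): EvidenceHit[] {
  return hits.slice().sort((a, b) => b.weight - a.weight).slice(0, limit);
}

// Map internal hits onto the shared AgentResult contract.
function toAgentResult(outcome: Outcome, proof: string, hits: EvidenceHit[]): AgentResult {
  return {
    outcome,
    proof,
    sources: hits.map((h) => ({
      url: h.url,
      title: h.title,
      pub_date: h.pubDate,
      excerpt: h.excerpt,
    })),
  };
}

// Glue: choose the best hits, request a proof, and build the final result.
async function finalize(
  claim: string,
  outcome: Outcome,
  hits: EvidenceHit[],
  writeProof: ProofWriter,
): Promise<AgentResult> {
  const best = pickBestHits(hits);
  const proof = await writeProof(claim, best);
  return toAgentResult(outcome, proof, best);
}
```

Injecting the proof writer as a function keeps the decision logic testable with a stub and leaves the choice of model or prompt outside this module.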

TODO

  • Implement an initial rule-based decision helper that uses SearchHistory aggregates (support/refute totals, number of independent sources, timeframe alignment) to decide when evidence is enough for True / False and when we should fall back to Invalid or request more SERP search. Start with a simple weighting scheme (primary = 1.0, wire ≈ 0.8, trade ≈ 0.6, other ≤ 0.4) and a threshold like support >= 1.6 or refute >= 1.6 from ≥2 independent sources.
  • Define a decision boundary, e.g. decideFromHistory(task, history, attempts, maxAttempts), that wraps this helper and returns either { status: 'final', result: AgentResult } or { status: 'need_more_search' } based only on SearchTask + SearchHistory + simple counters.
  • Keep a single decideFromHistory entrypoint so Validator and external callers never depend on internal SearchHistory details; future changes to weights, thresholds, or additional outcomes should only require updating the internal rules, not the boundary itself.
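A minimal sketch of the decideFromHistory boundary described above, following the flowchart. Everything about the data shapes is an assumption: `SearchTask.claim`, the `aggregates` field names, the constants, and the rule that both sides clearing the threshold counts as conflicting evidence. The proof is a placeholder string here, since the LLM-backed proof writer is out of scope for this sketch.

```typescript
// Hypothetical minimal shapes; the real SearchTask/SearchHistory live in the spec.
interface SearchTask {
  claim: string;
}

interface SearchHistory {
  aggregates: {
    totalSupport: number;
    totalRefute: number;
    supportSources: number; // independent, time-aligned sources
    refuteSources: number;
  };
}

interface AgentResult {
  outcome: string;
  proof: string;
  sources: Array<{ url: string; title: string; pub_date: string | null; excerpt: string }>;
}

type Decision =
  | { status: "final"; result: AgentResult }
  | { status: "need_more_search" };

const EVIDENCE_THRESHOLD = 1.6; // example threshold from the issue
const MIN_INDEPENDENT_SOURCES = 2;

function decideFromHistory(
  task: SearchTask,
  history: SearchHistory,
  attempts: number,
  maxAttempts: number,
): Decision {
  const a = history.aggregates;
  const supportOk =
    a.totalSupport >= EVIDENCE_THRESHOLD && a.supportSources >= MIN_INDEPENDENT_SOURCES;
  const refuteOk =
    a.totalRefute >= EVIDENCE_THRESHOLD && a.refuteSources >= MIN_INDEPENDENT_SOURCES;

  // Placeholder proof and empty sources; a real version would call the proof writer.
  const final = (outcome: string): Decision => ({
    status: "final",
    result: { outcome, proof: `${outcome}: ${task.claim}`, sources: [] },
  });

  // Assumption: a side "clearly wins" only if the other side is below threshold.
  if (supportOk && !refuteOk) return final("True");
  if (refuteOk && !supportOk) return final("False");

  // Not enough evidence, or both sides are strong: search again if allowed.
  if (attempts < maxAttempts) return { status: "need_more_search" };
  return final("Invalid");
}

// Usage: weak evidence after the first of three attempts.
const decision = decideFromHistory(
  { claim: "Example claim" },
  { aggregates: { totalSupport: 1.0, totalRefute: 0, supportSources: 1, refuteSources: 0 } },
  1,
  3,
);
console.log(decision.status);
```

Because callers only see the `Decision` union, the thresholds and the conflict rule can change later without touching the boundary.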
