Open
Labels: enhancement (New feature or request)
Description
Decision & Output: outcome + proof from history
Full spec at spec.
```mermaid
flowchart TD
    Task["SearchTask"]
    Hist["SearchHistory (aggregates + hits)"]
    Attempts["current SERP attempts"]
    Max["max SERP attempts"]

    subgraph DecisionMod["Decision module"]
        Score["compute support/refute scores"]
        Enough{"evidence >= threshold?"}
        Direction{"support > refute?"}
        MoreSearch{"can we search again?"}
    end

    Result["AgentResult (True / False / Invalid)"]
    NeedMore["need more SERP search"]

    Task --> Score
    Hist --> Score
    Score --> Enough
    Enough -- "yes" --> Direction
    Enough -- "no" --> MoreSearch
    Direction -- "support" --> Result
    Direction -- "refute" --> Result
    Attempts --> MoreSearch
    Max --> MoreSearch
    MoreSearch -- "yes" --> NeedMore
    MoreSearch -- "no" --> Result
```
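The flowchart's branching can be read as a small pure function. The sketch below assumes the support/refute scores and the "evidence >= threshold?" check have already been computed upstream; `Scores`, `FlowStep`, and `decideStep` are hypothetical names, not existing code. The chart has no explicit tie branch, so this sketch assumes a tie above threshold is handled like insufficient evidence (search again if allowed, otherwise Invalid), and that a "no" from `MoreSearch` yields Invalid.

```typescript
// Hypothetical inputs to the flowchart's decision nodes (names are illustrative).
interface Scores {
  support: number; // aggregated support weight
  refute: number; // aggregated refute weight
  enough: boolean; // result of the "evidence >= threshold?" node
}

// Either a final outcome or a request for another SERP round.
type FlowStep = "True" | "False" | "Invalid" | "need_more_search";

function decideStep(scores: Scores, attempts: number, maxAttempts: number): FlowStep {
  // Enough -> Direction: pick the side with more weight.
  // Assumption: a tie above threshold has no clear direction, so it falls through.
  if (scores.enough && scores.support !== scores.refute) {
    return scores.support > scores.refute ? "True" : "False";
  }
  // MoreSearch: "can we search again?" If not, fall back to Invalid.
  return attempts < maxAttempts ? "need_more_search" : "Invalid";
}
```

Keeping this step free of I/O makes each edge of the chart easy to unit-test in isolation.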
Status: Not implemented for the new agent pipeline (existing validator decision code is separate).
Role. Look at SearchTask + SearchHistory, decide whether we already have enough evidence to answer True/False, or whether the claim should be treated as Invalid, and tell the loop if another SERP round is worth running.
Responsibilities.
- Implement a decision module that reads only `SearchTask`, `SearchHistory`, and simple config (evidence thresholds, max SERP attempts).
- Aggregate support and refute scores from `SearchHistory.evidence` using `EvidenceHit.weight` (primary/wire/trade/other), and keep simple totals like `totalSupport` and `totalRefute`. A good starting scale is: primary domains = 1.0, wire services ≈ 0.8, trade/specialist press ≈ 0.6, and other/long-tail sources ≤ 0.4.
- Use a documented threshold rule to decide when evidence is "enough": for example, treat the evidence as sufficient when either support or refute weight is ≥ 1.6 from at least two independent, time-aligned sources (mirroring the primary/wire weighting sketch in the spec).
- Choose one of three outcomes to start with: True, False, or Invalid. Use True when support clearly wins above the threshold, False when refute clearly wins, and Invalid when we either never reach the threshold or the evidence is too conflicting/noisy by the time SERP attempts are exhausted.
- Expose a small decision result that either (a) contains a final `AgentResult` (outcome + proof + sources) or (b) says "need more SERP search" so the loop can ask Serper for another query if `current_serp_attempts < max_serp_attempts`.
- When producing a final `AgentResult`, call a small LLM-backed proof writer that takes `SearchTask` + the best `EvidenceHit`s and returns a compact markdown proof, then map that into the shared `AgentResult` shape from the Input issue (Input: Web‑search agent input layer #4).
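The weighting and threshold rules above can be sketched as two small helpers. This assumes each hit exposes a source class (primary/wire/trade/other), a stance, a domain used as a proxy for independence, and a precomputed time-alignment flag; those field names, plus `aggregate` and `isEnough`, are hypothetical, and the real `EvidenceHit` may carry a numeric `weight` directly. Counting each domain at most once per side is a design choice so that several pages from one outlet cannot clear the 1.6 bar on their own.

```typescript
// Hypothetical hit shape; the real EvidenceHit fields may differ.
type SourceClass = "primary" | "wire" | "trade" | "other";
type Stance = "support" | "refute";

interface EvidenceHit {
  domain: string; // proxy for source independence (assumption)
  stance: Stance;
  sourceClass: SourceClass;
  timeAligned: boolean; // published within the claim's timeframe
}

// Starting scale from this issue; "other" uses its upper bound of 0.4.
const WEIGHTS: Record<SourceClass, number> = {
  primary: 1.0,
  wire: 0.8,
  trade: 0.6,
  other: 0.4,
};

interface SideTotal {
  weight: number; // summed weight, each domain counted once
  sources: number; // number of distinct domains
}

interface Aggregates {
  totalSupport: SideTotal;
  totalRefute: SideTotal;
}

function aggregate(hits: EvidenceHit[]): Aggregates {
  // Keep only the best weight per (stance, domain) pair.
  const best = { support: new Map<string, number>(), refute: new Map<string, number>() };
  for (const hit of hits) {
    if (!hit.timeAligned) continue; // ignore hits outside the claim's timeframe
    const side = best[hit.stance];
    const w = WEIGHTS[hit.sourceClass];
    side.set(hit.domain, Math.max(side.get(hit.domain) ?? 0, w));
  }
  const total = (m: Map<string, number>): SideTotal => ({
    weight: Array.from(m.values()).reduce((a, b) => a + b, 0),
    sources: m.size,
  });
  return { totalSupport: total(best.support), totalRefute: total(best.refute) };
}

// Threshold rule: >= 1.6 weight from >= 2 independent sources on one side.
function isEnough(side: SideTotal, threshold = 1.6, minSources = 2): boolean {
  return side.weight >= threshold && side.sources >= minSources;
}
```

With this scale, one primary plus one wire source (1.8) or two wire services (1.6) is enough, while any number of long-tail pages from a single domain is not.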
The AgentResult contract is shared with the Input layer:
```ts
interface AgentResult {
  outcome: string; // e.g. True | False | Invalid (initially)
  proof: string; // short markdown summary + key snippets
  sources: Array<{
    url: string;
    title: string;
    pub_date: string | null;
    excerpt: string;
  }>;
  debug?: {
    total_queries?: number;
    total_pages_visited?: number;
  };
}
```

TODO
- Implement an initial rule-based decision helper that uses `SearchHistory` aggregates (support/refute totals, number of independent sources, timeframe alignment) to decide when evidence is enough for True / False and when we should fall back to Invalid or request more SERP search. Start with a simple weighting scheme (primary = 1.0, wire ≈ 0.8, trade ≈ 0.6, other ≤ 0.4) and a threshold like `support >= 1.6` or `refute >= 1.6` from ≥2 independent sources.
- Define a decision boundary, e.g. `decideFromHistory(task, history, attempts, maxAttempts)`, that wraps this helper and returns either `{ status: 'final', result: AgentResult }` or `{ status: 'need_more_search' }` based only on `SearchTask` + `SearchHistory` + simple counters.
- Keep a single `decideFromHistory` entrypoint so Validator and external callers never depend on internal `SearchHistory` details; future changes to weights, thresholds, or additional outcomes should only require updating the internal rules, not the boundary itself.
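The `decideFromHistory` boundary described in the TODO list can be sketched as follows. Apart from `AgentResult` (repeated here so the block compiles alone) and the `{ status: 'final' | 'need_more_search' }` result shape, everything is an assumption: the `SearchTask` / `SearchHistory` / `EvidenceHit` fields, the `ProofWriter` type, `makeDecideFromHistory`, and the top-3 hit selection are hypothetical. The proof writer is injected through a factory so the boundary keeps its four-argument signature and the LLM call can be stubbed in tests. This sketch also reads "clearly wins" strictly: if both sides clear the threshold, it treats the evidence as conflicting.

```typescript
// Shared contract from this issue.
interface AgentResult {
  outcome: string;
  proof: string;
  sources: Array<{ url: string; title: string; pub_date: string | null; excerpt: string }>;
  debug?: { total_queries?: number; total_pages_visited?: number };
}

// Hypothetical shapes; the real SearchTask / SearchHistory may differ.
interface SearchTask { claim: string }
interface EvidenceHit {
  url: string; title: string; pub_date: string | null; excerpt: string;
  stance: "support" | "refute";
  weight: number; // already mapped from primary/wire/trade/other
}
interface SearchHistory {
  totalSupport: number; supportSources: number;
  totalRefute: number; refuteSources: number;
  evidence: EvidenceHit[];
  totalQueries: number;
}

type DecisionResult =
  | { status: "final"; result: AgentResult }
  | { status: "need_more_search" };

// Hypothetical LLM-backed proof writer: claim + best hits -> compact markdown.
type ProofWriter = (task: SearchTask, hits: EvidenceHit[]) => Promise<string>;

const THRESHOLD = 1.6;
const MIN_SOURCES = 2;

function makeDecideFromHistory(writeProof: ProofWriter) {
  return async function decideFromHistory(
    task: SearchTask,
    history: SearchHistory,
    attempts: number,
    maxAttempts: number,
  ): Promise<DecisionResult> {
    const supportOk = history.totalSupport >= THRESHOLD && history.supportSources >= MIN_SOURCES;
    const refuteOk = history.totalRefute >= THRESHOLD && history.refuteSources >= MIN_SOURCES;

    let outcome: "True" | "False" | "Invalid";
    if (supportOk && !refuteOk) outcome = "True";
    else if (refuteOk && !supportOk) outcome = "False";
    else if (attempts < maxAttempts) return { status: "need_more_search" };
    else outcome = "Invalid"; // below threshold or conflicting, no attempts left

    // Pick the strongest hits for the winning side (all hits for Invalid).
    const side = outcome === "True" ? "support" : outcome === "False" ? "refute" : null;
    const best = history.evidence
      .filter((h) => side === null || h.stance === side) // filter copies, so sort won't mutate history
      .sort((a, b) => b.weight - a.weight)
      .slice(0, 3);

    const proof = await writeProof(task, best);
    return {
      status: "final",
      result: {
        outcome,
        proof,
        sources: best.map(({ url, title, pub_date, excerpt }) => ({ url, title, pub_date, excerpt })),
        debug: { total_queries: history.totalQueries },
      },
    };
  };
}
```

Callers only ever see `DecisionResult`, so changing thresholds or adding outcomes stays behind this one entrypoint.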