
Feature Request: Structured Data Retrieval During Prompt Execution #188

@Alan-Jowett

Description


Summary

Currently, PromptKit gives the LLM wide latitude in how it retrieves additional data during prompt execution and reasoning.

In practice this means the model is implicitly deciding:

  • what additional context to retrieve
  • when to retrieve it
  • from where to retrieve it
  • how much of it to retrieve
  • and how to incorporate it into the reasoning chain

While this flexibility is useful for exploratory tasks, it leads to prompt executions that are:

  • non-deterministic
  • difficult to audit
  • difficult to reproduce
  • difficult to reason about post-hoc
  • sensitive to model implementation details

This becomes particularly problematic for engineering workflows (requirements extraction, protocol validation, maintenance audits, corpus sanitation, etc.) where prompt execution needs to be explainable and repeatable.


Problem Statement

At present, data retrieval during prompt execution is effectively:

model-directed rather than workflow-directed

The LLM is free to:

  • implicitly expand context
  • decide when additional data is required
  • select sources from repository state or prior context
  • retrieve information in a non-observable way
  • alter retrieval strategy across executions

As a result:

Property         | Current Behavior
-----------------|------------------
Retrieval intent | Implicit
Retrieval timing | Model-decided
Retrieval scope  | Model-decided
Observability    | Low
Determinism      | Low
Auditability     | Low
Reproducibility  | Low

This introduces "vibey" prompt execution, where reasoning outcomes may change even when none of the following change:

  • the prompt
  • the repository state
  • the workflow definition

but only due to differences in retrieval decisions made by the model.


Proposed Direction

PromptKit should introduce structured data retrieval as a first-class execution phase within prompt workflows.

Instead of allowing the LLM to freely pull context during reasoning, workflows should be able to:

  1. Declare retrieval requirements up-front
  2. Specify retrieval sources explicitly
  3. Define retrieval scope deterministically
  4. Bind retrieved context into the execution pipeline prior to reasoning

Conceptually:

Current:

  LLM
  └── reasoning
      └── implicit retrieval (model decides)

Proposed:

  Workflow
  ├── Retrieval Phase (declared)
  ├── Context Binding
  └── Reasoning Phase (bounded)
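As a rough illustration of what a declared retrieval phase might look like, here is a Python sketch. All names here (`RetrievalSpec`, `Workflow`, the `source` strings) are hypothetical, not existing PromptKit API; the point is that retrieval requirements are data declared up-front, not decisions the model makes mid-reasoning.

```python
from dataclasses import dataclass

# Hypothetical sketch only: these types and field names are illustrative,
# not part of any current PromptKit interface.

@dataclass(frozen=True)
class RetrievalSpec:
    source: str              # e.g. "repo", "corpus", "protocol_docs"
    scope: tuple[str, ...]   # paths, tags, or taxonomy classes
    max_items: int = 50      # hard bound on how much is retrieved

@dataclass(frozen=True)
class Workflow:
    name: str
    retrieval: tuple[RetrievalSpec, ...]  # declared before execution
    prompt_template: str

wf = Workflow(
    name="requirements-extraction",
    retrieval=(
        RetrievalSpec(source="repo", scope=("docs/requirements/",)),
        RetrievalSpec(source="corpus", scope=("tag:protocol",), max_items=10),
    ),
    prompt_template="Extract requirements from:\n{context}",
)
```

Because the specs are immutable values attached to the workflow definition, they can be diffed, logged, and validated in CI like any other engineering artifact.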


Functional Requirements

PromptKit SHOULD allow workflow authors to:

  • Declare required data inputs as part of workflow definition
  • Specify retrieval source(s) (e.g. repo files, prompt corpus, protocol docs)
  • Define retrieval scope (paths, tags, taxonomy classes, etc.)
  • Execute retrieval prior to reasoning phase
  • Bind retrieved context into composed prompt deterministically
  • Prevent unstructured context expansion during reasoning phase
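The requirements above could be wired together roughly as follows. This is a sketch under stated assumptions: `fetch` and `reason` are stand-ins for a real source adapter and a model call, and the phase boundaries (retrieval, binding, reasoning) are the proposal's, not an existing implementation.

```python
# Hypothetical three-phase executor; fetch() and reason() are stubs.

def bind_context(specs, fetch):
    """Retrieval phase: resolve every declared spec, in declaration order."""
    chunks = []
    for spec in specs:
        items = sorted(fetch(spec))   # fixed ordering => reproducible binding
        chunks.append("\n".join(items))
    return "\n---\n".join(chunks)

def execute(specs, template, fetch, reason):
    context = bind_context(specs, fetch)        # 1. retrieval (before reasoning)
    prompt = template.format(context=context)   # 2. deterministic binding
    return reason(prompt)                       # 3. bounded reasoning phase:
                                                #    no further retrieval here

# Usage with stub adapters:
specs = [{"source": "repo", "scope": "docs/"}]
fetch = lambda spec: ["req-2: ...", "req-1: ..."]
reason = lambda prompt: f"answer({len(prompt)} chars)"
out = execute(specs, "Analyze:\n{context}", fetch, reason)
```

The key property is that by the time `reason()` runs, the context is fully materialized, so the model has nothing left to decide about what, when, or how much to retrieve.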

Non-Goals

This proposal does NOT aim to:

  • eliminate model-driven reasoning
  • eliminate dynamic workflow selection
  • prevent exploratory execution modes

Instead it aims to:

separate data acquisition from reasoning

so that prompt execution becomes:

  • reproducible
  • inspectable
  • explainable
  • engineering-grade

Example Use Cases

This would improve determinism for:

  • corpus audit workflows
  • maintenance prompts
  • protocol validation
  • taxonomy sanitation
  • requirements extraction
  • template consistency analysis

where consistent outcomes across executions are required.


Motivation

PromptKit positions prompts as structured engineering artifacts.

Allowing implicit, model-directed data retrieval during execution undermines:

  • workflow predictability
  • execution traceability
  • CI-driven prompt validation
  • long-term maintenance workflows

Structured retrieval would make prompt execution behave more like:

a deterministic pipeline
rather than
a best-effort interactive reasoning session


Open Questions

  • Should retrieval policies be declared at:
    • persona level?
    • protocol level?
    • workflow level?
  • Should workflows be able to restrict additional retrieval during reasoning?
  • Should retrieval plans be surfaced in execution logs?

Acceptance Criteria

A PromptKit workflow can:

  • declare required retrieval inputs
  • execute retrieval deterministically prior to reasoning
  • bind retrieved context into prompt assembly
  • produce consistent execution results across runs given identical inputs
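The last criterion could be checked mechanically, for example by hashing the composed prompt across runs. This is an illustrative acceptance-style check, not a prescribed test; `bound_prompt` and the spec shape are hypothetical.

```python
import hashlib

# Sketch: with identical inputs, the retrieval phase and context binding
# must yield a byte-identical composed prompt on every run.

def bound_prompt(specs, template, fetch):
    context = "\n---\n".join(
        "\n".join(sorted(fetch(s))) for s in specs  # deterministic ordering
    )
    return template.format(context=context)

def run_digest(specs, template, fetch):
    prompt = bound_prompt(specs, template, fetch)
    return hashlib.sha256(prompt.encode()).hexdigest()

fetch = lambda spec: ["b", "a"]  # stub source adapter
d1 = run_digest([{"scope": "docs/"}], "Ctx:\n{context}", fetch)
d2 = run_digest([{"scope": "docs/"}], "Ctx:\n{context}", fetch)
```

Note this only verifies the pre-reasoning pipeline; the reasoning phase itself remains model-dependent, which is exactly the boundary the proposal draws.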
