
Feature Request: Structured Data Retrieval During Prompt Execution #188

@Alan-Jowett

Description


Summary

Currently, PromptKit gives the LLM wide latitude in how it retrieves additional data during prompt execution and reasoning.

In practice this means the model is implicitly deciding:

  • what additional context to retrieve
  • when to retrieve it
  • from where to retrieve it
  • how much of it to retrieve
  • and how to incorporate it into the reasoning chain

While this flexibility is useful for exploratory tasks, it leads to prompt executions that are:

  • non-deterministic
  • difficult to audit
  • difficult to reproduce
  • difficult to reason about post-hoc
  • sensitive to model implementation details

This becomes particularly problematic for engineering workflows (requirements extraction, protocol validation, maintenance audits, corpus sanitation, etc.) where prompt execution needs to be explainable and repeatable.


Problem Statement

At present, data retrieval during prompt execution is effectively:

model-directed rather than workflow-directed

The LLM is free to:

  • implicitly expand context
  • decide when additional data is required
  • select sources from repository state or prior context
  • retrieve information in a non-observable way
  • alter retrieval strategy across executions

As a result:

Property         | Current Behavior
-----------------|------------------
Retrieval intent | Implicit
Retrieval timing | Model-decided
Retrieval scope  | Model-decided
Observability    | Low
Determinism      | Low
Auditability     | Low
Reproducibility  | Low

This introduces "vibey" prompt execution, where reasoning outcomes may change even when none of the following change:

  • the prompt
  • the repository state
  • the workflow definition

but only due to differences in retrieval decisions made by the model.


Proposed Direction

PromptKit should introduce structured data retrieval as a first-class execution phase within prompt workflows.

Instead of allowing the LLM to freely pull context during reasoning, workflows should be able to:

  1. Declare retrieval requirements up-front
  2. Specify retrieval sources explicitly
  3. Define retrieval scope deterministically
  4. Bind retrieved context into the execution pipeline prior to reasoning

Conceptually:

Current:

  LLM
  └── reasoning
      └── implicit retrieval (model decides)

Proposed:

  Workflow
  ├── Retrieval Phase (declared)
  ├── Context Binding
  └── Reasoning Phase (bounded)
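As a rough illustration of what a declared retrieval phase might look like, here is a Python sketch. All names here (`RetrievalSpec`, `Workflow`, the `source` strings) are hypothetical, not existing PromptKit API; the point is that retrieval requirements are data declared up-front, not decisions the model makes mid-reasoning.

```python
from dataclasses import dataclass

# Hypothetical sketch only: these types and field names are illustrative,
# not part of any current PromptKit interface.

@dataclass(frozen=True)
class RetrievalSpec:
    source: str              # e.g. "repo", "corpus", "protocol_docs"
    scope: tuple[str, ...]   # paths, tags, or taxonomy classes
    max_items: int = 50      # hard bound on how much is retrieved

@dataclass(frozen=True)
class Workflow:
    name: str
    retrieval: tuple[RetrievalSpec, ...]  # declared before execution
    prompt_template: str

wf = Workflow(
    name="requirements-extraction",
    retrieval=(
        RetrievalSpec(source="repo", scope=("docs/requirements/",)),
        RetrievalSpec(source="corpus", scope=("tag:protocol",), max_items=10),
    ),
    prompt_template="Extract requirements from:\n{context}",
)
```

Because the specs are immutable values attached to the workflow definition, they can be diffed, logged, and validated in CI like any other engineering artifact.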


Functional Requirements

PromptKit SHOULD allow workflow authors to:

  • Declare required data inputs as part of workflow definition
  • Specify retrieval source(s) (e.g. repo files, prompt corpus, protocol docs)
  • Define retrieval scope (paths, tags, taxonomy classes, etc.)
  • Execute retrieval prior to reasoning phase
  • Bind retrieved context into composed prompt deterministically
  • Prevent unstructured context expansion during reasoning phase
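The requirements above could be wired together roughly as follows. This is a sketch under stated assumptions: `fetch` and `reason` are stand-ins for a real source adapter and a model call, and the phase boundaries (retrieval, binding, reasoning) are the proposal's, not an existing implementation.

```python
# Hypothetical three-phase executor; fetch() and reason() are stubs.

def bind_context(specs, fetch):
    """Retrieval phase: resolve every declared spec, in declaration order."""
    chunks = []
    for spec in specs:
        items = sorted(fetch(spec))   # fixed ordering => reproducible binding
        chunks.append("\n".join(items))
    return "\n---\n".join(chunks)

def execute(specs, template, fetch, reason):
    context = bind_context(specs, fetch)        # 1. retrieval (before reasoning)
    prompt = template.format(context=context)   # 2. deterministic binding
    return reason(prompt)                       # 3. bounded reasoning phase:
                                                #    no further retrieval here

# Usage with stub adapters:
specs = [{"source": "repo", "scope": "docs/"}]
fetch = lambda spec: ["req-2: ...", "req-1: ..."]
reason = lambda prompt: f"answer({len(prompt)} chars)"
out = execute(specs, "Analyze:\n{context}", fetch, reason)
```

The key property is that by the time `reason()` runs, the context is fully materialized, so the model has nothing left to decide about what, when, or how much to retrieve.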

Non-Goals

This proposal does NOT aim to:

  • eliminate model-driven reasoning
  • eliminate dynamic workflow selection
  • prevent exploratory execution modes

Instead it aims to:

separate data acquisition from reasoning

so that prompt execution becomes:

  • reproducible
  • inspectable
  • explainable
  • engineering-grade

Example Use Cases

This would improve determinism for:

  • corpus audit workflows
  • maintenance prompts
  • protocol validation
  • taxonomy sanitation
  • requirements extraction
  • template consistency analysis

where consistent outcomes across executions are required.


Motivation

PromptKit positions prompts as structured engineering artifacts.

Allowing implicit, model-directed data retrieval during execution undermines:

  • workflow predictability
  • execution traceability
  • CI-driven prompt validation
  • long-term maintenance workflows

Structured retrieval would make prompt execution behave more like:

a deterministic pipeline
rather than
a best-effort interactive reasoning session


Open Questions

  • Should retrieval policies be declared at:
    • persona level?
    • protocol level?
    • workflow level?
  • Should workflows be able to restrict additional retrieval during reasoning?
  • Should retrieval plans be surfaced in execution logs?

Acceptance Criteria

A PromptKit workflow can:

  • declare required retrieval inputs
  • execute retrieval deterministically prior to reasoning
  • bind retrieved context into prompt assembly
  • produce consistent execution results across runs given identical inputs
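The last criterion could be checked mechanically, for example by hashing the composed prompt across runs. This is an illustrative acceptance-style check, not a prescribed test; `bound_prompt` and the spec shape are hypothetical.

```python
import hashlib

# Sketch: with identical inputs, the retrieval phase and context binding
# must yield a byte-identical composed prompt on every run.

def bound_prompt(specs, template, fetch):
    context = "\n---\n".join(
        "\n".join(sorted(fetch(s))) for s in specs  # deterministic ordering
    )
    return template.format(context=context)

def run_digest(specs, template, fetch):
    prompt = bound_prompt(specs, template, fetch)
    return hashlib.sha256(prompt.encode()).hexdigest()

fetch = lambda spec: ["b", "a"]  # stub source adapter
d1 = run_digest([{"scope": "docs/"}], "Ctx:\n{context}", fetch)
d2 = run_digest([{"scope": "docs/"}], "Ctx:\n{context}", fetch)
```

Note this only verifies the pre-reasoning pipeline; the reasoning phase itself remains model-dependent, which is exactly the boundary the proposal draws.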
