
Conversation

@pierceboggan
Contributor

Adds real-time quality analysis for prompt files — like a linter, but for prompt engineering. Catches common authoring mistakes and helps write more effective AI instructions directly in the editor.


Features

  • Instruction strength analysis — Flags weak language ("try to", "consider", "might") and suggests stronger alternatives ("Must", "Always", "Never")
  • Ambiguity detection — Catches vague quantifiers ("several", "a few") and unresolved positional references ("as mentioned above")
  • Structure linting — Detects mixed XML/Markdown conventions and mismatched XML tags
  • Redundancy detection — Finds duplicate instructions and constraints that subsume each other (e.g., "Never X" + "Avoid X")
  • Variable validation — Flags empty {{}} placeholders and undefined template variables
  • Token budget awareness — Warns when prompts are too large or contain content that tokenizes inefficiently
  • Example sufficiency — Suggests adding few-shot examples when output format is specified without them
  • LLM-powered semantic analysis — Uses Copilot to detect contradictions, persona inconsistencies, cognitive load issues, and coverage gaps

How it works

Two analysis layers run in parallel:

  1. Static analysis (debounced 400ms) — fast, free, deterministic checks on every keystroke
  2. LLM analysis (debounced 3s) — sends content to Copilot for semantic checks like contradiction detection and coverage gaps
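
A minimal sketch of that scheduling, with illustrative names rather than the PR's actual classes:

```ts
// Illustrative scheduler for the two analysis layers described above.
class PromptQualityScheduler {
    private staticTimer: ReturnType<typeof setTimeout> | undefined;
    private llmTimer: ReturnType<typeof setTimeout> | undefined;

    constructor(
        private readonly runStatic: () => void,       // fast, deterministic checks
        private readonly runLlm: () => Promise<void>, // Copilot-backed semantic checks
    ) { }

    // Called on every content change of the prompt file.
    onContentChanged(): void {
        clearTimeout(this.staticTimer);
        clearTimeout(this.llmTimer);
        this.staticTimer = setTimeout(() => this.runStatic(), 400);  // static: 400ms debounce
        this.llmTimer = setTimeout(() => void this.runLlm(), 3000);  // LLM: 3s debounce
    }
}
```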

Results surface through standard editor UX:

  • Diagnostics in the Problems panel with severity-appropriate markers
  • Code Lenses showing issue count + estimated token usage
  • Hovers showing instruction-strength classification on keywords
  • Quick Fix code actions that replace weak phrasing in one click
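
For illustration, a single finding might be shaped roughly as follows before it reaches the Problems panel. This is a sketch with a locally defined type; the field names echo the IMarkerData usage in the review excerpts below, and the marker code is hypothetical:

```ts
// Sketch of one prompt-quality finding as it could be handed to the marker service.
interface PromptQualityFinding {
    message: string;
    severity: 'Error' | 'Warning' | 'Info';
    code: string;              // structured code consumed by the quick-fix provider
    startLineNumber: number;
    startColumn: number;
    endLineNumber: number;
    endColumn: number;
}

const weakInstructionExample: PromptQualityFinding = {
    message: "Weak instruction: consider replacing 'try to' with 'Must'",
    severity: 'Warning',
    code: 'promptQuality.weakInstruction', // hypothetical marker code
    startLineNumber: 12,
    startColumn: 1,
    endLineNumber: 12,
    endColumn: 7,
};
```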


Copilot AI left a comment


Pull request overview

Adds prompt-file “quality” language features (diagnostics, hovers, code actions, and code lenses) via a static analyzer plus an optional LLM-backed analyzer.

Changes:

  • Introduces a PromptStaticQualityAnalyzer with unit tests for common prompt-authoring issues.
  • Adds a per-model contribution that runs debounced static + optional LLM analysis and publishes diagnostics.
  • Registers new prompt-quality hover, code actions, and code lens UI, plus an experimental config flag to enable LLM analysis.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Summary per file:

  • src/vs/workbench/contrib/chat/test/common/promptSyntax/languageProviders/promptStaticQualityAnalyzer.test.ts: Adds unit tests covering static analyzer diagnostics and ranges/severity.
  • src/vs/workbench/contrib/chat/common/promptSyntax/promptFileContributions.ts: Registers prompt-quality providers/contribution with prompt language features.
  • src/vs/workbench/contrib/chat/common/promptSyntax/languageProviders/promptStaticQualityAnalyzer.ts: Implements static, deterministic prompt-quality checks and emits markers.
  • src/vs/workbench/contrib/chat/common/promptSyntax/languageProviders/promptQualityHoverProvider.ts: Adds hover UX for variables and instruction-strength keywords.
  • src/vs/workbench/contrib/chat/common/promptSyntax/languageProviders/promptQualityContribution.ts: Tracks prompt models and runs debounced static + LLM analysis to set markers.
  • src/vs/workbench/contrib/chat/common/promptSyntax/languageProviders/promptQualityConstants.ts: Centralizes shared patterns/constants for analyzers and UI providers.
  • src/vs/workbench/contrib/chat/common/promptSyntax/languageProviders/promptQualityCodeActionProvider.ts: Adds quick fixes driven by structured marker codes.
  • src/vs/workbench/contrib/chat/common/promptSyntax/languageProviders/promptLlmQualityAnalyzer.ts: Adds optional Copilot-powered semantic analysis and converts results to markers.
  • src/vs/workbench/contrib/chat/browser/promptSyntax/promptQualityCodeLensProvider.ts: Adds code lenses summarizing issue count and estimating token usage per section.
  • src/vs/workbench/contrib/chat/browser/chat.contribution.ts: Wires in the code lens provider and adds config for enabling LLM analysis.
Comments suppressed due to low confidence (1)

src/vs/workbench/contrib/chat/common/promptSyntax/languageProviders/promptLlmQualityAnalyzer.ts:1

  • Several LLM-derived markers are anchored to line 1, which will be incorrect when prompt files have frontmatter/headers and bodyStartLine is not 1. Use bodyStartLine (and ideally a best-effort line match) for these markers so diagnostics point into the prompt body instead of the file header.
/*---------------------------------------------------------------------------------------------
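
One way to implement the fix suggested in the suppressed comment is to clamp every LLM-reported line into the prompt body; a small self-contained sketch with illustrative names, not the PR's actual analyzer API:

```ts
// Illustrative sketch: never anchor an LLM-derived marker above the prompt body.
function anchorMarkerLine(reportedLine: number | undefined, bodyStartLine: number, lineCount: number): number {
    if (reportedLine === undefined || reportedLine < bodyStartLine) {
        return bodyStartLine; // frontmatter/header lines are never valid anchors
    }
    return Math.min(reportedLine, lineCount); // clamp to the end of the document
}
```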

const instructionRegex = /\b(?:must|should|always|never|avoid|do not|don't)\s+([^.!?]+)/gi;

for (let i = 0; i < lines.length; i++) {
    const line = lines[i];

Copilot AI Feb 12, 2026


instructionRegex is global (/g) and reused across lines without resetting lastIndex, which can cause matches at the beginning of a new line to be skipped. Reset instructionRegex.lastIndex = 0 at the start of each per-line loop, or construct a new RegExp per line / use line.matchAll(...) to avoid stateful regex behavior.

Suggested change
- const line = lines[i];
+ const line = lines[i];
+ instructionRegex.lastIndex = 0;
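
The matchAll alternative mentioned in the comment could look like this (self-contained sketch, not the PR's code):

```ts
// matchAll creates a fresh iterator per call, so there is no shared lastIndex state to reset.
const lines = ["You must always cite sources.", "Try to be concise."];
for (const line of lines) {
    for (const match of line.matchAll(/\b(?:must|should|always|never|avoid|do not|don't)\s+([^.!?]+)/gi)) {
        console.log(match.index, match[0]); // offset and matched instruction within the line
    }
}
```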

Comment on lines +166 to +176
const llmMarkers: IMarkerData[] = [];
const bodyStart = this.getBodyStartLine();
await this.llmAnalyzer.analyze(this.textModel, bodyStart, cts.token, m => llmMarkers.push(m));

if (!cts.token.isCancellationRequested) {
    this.markerService.changeOne(
        QUALITY_MARKERS_OWNER_ID,
        this.textModel.uri,
        [...this.lastStaticMarkers, ...llmMarkers],
    );
}

Copilot AI Feb 12, 2026


The merge uses this.lastStaticMarkers at completion time, which may correspond to a different model version than the content analyzed by the LLM (especially if content changed during the LLM request). Consider snapshotting const versionId = textModel.getVersionId() and the static markers at request time, and only applying the merged result if the model version still matches (or if you intentionally want “latest static + previous LLM”, document that behavior).
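
Sketched with minimal local types (illustrative only, not the PR's interfaces), the snapshot approach looks like this:

```ts
interface Marker { message: string; startLineNumber: number; }
interface VersionedModel { getVersionId(): number; }

// Snapshot the static markers and model version at request time; publish the merged
// result only if the document is still at the analyzed version.
async function runLlmPass(
    model: VersionedModel,
    currentStaticMarkers: readonly Marker[],
    analyzeWithLlm: () => Promise<Marker[]>,
    publish: (markers: Marker[]) => void,
): Promise<void> {
    const versionAtRequest = model.getVersionId();
    const staticAtRequest = [...currentStaticMarkers];

    const llmMarkers = await analyzeWithLlm();

    if (model.getVersionId() === versionAtRequest) {
        publish([...staticAtRequest, ...llmMarkers]);
    }
}
```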

const end = start + strengthMatch[0].length;
if (position.column >= start && position.column <= end) {
    const content = new MarkdownString();
    content.appendMarkdown(`**${localize('promptQualityHover.instructionStrength', "Instruction Strength")}:** ${strength}\n\n${getStrengthDescription(strength)}`);

Copilot AI Feb 12, 2026


This constructs a user-visible string via a template literal that mixes localized and non-localized parts. For localization robustness (and to follow VS Code string externalization conventions), localize the full hover text with placeholders (including the strength label/value and description) rather than assembling it with string interpolation.

Suggested change
- content.appendMarkdown(`**${localize('promptQualityHover.instructionStrength', "Instruction Strength")}:** ${strength}\n\n${getStrengthDescription(strength)}`);
+ content.appendMarkdown(localize(
+     'promptQualityHover.instructionStrengthHover',
+     "**Instruction Strength:** {0}\n\n{1}",
+     strength,
+     getStrengthDescription(strength),
+ ));

Comment on lines +53 to +58
command: {
    title: issueCount === 0
        ? localize('promptQualityCodeLens.noIssues', "Prompt Quality: No issues found")
        : localize('promptQualityCodeLens.issues', "Prompt Quality: {0} issue(s) found", issueCount),
    id: issueCount > 0 ? 'workbench.actions.view.problems' : '',
},

Copilot AI Feb 12, 2026


Command.id should be a valid command identifier when a command is provided. Setting id: '' risks creating an invalid command entry; prefer omitting the command entirely for the “no issues” lens, or use a real no-op command if the UI requires a command object.
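
A sketch of that option, with a hypothetical promptQuality.noop command id that would have to be registered separately (localization omitted for brevity):

```ts
// Build the summary-lens command: open the Problems view when there are issues,
// otherwise fall back to a registered no-op command instead of an empty id.
function buildSummaryCommand(issueCount: number): { id: string; title: string } {
    return issueCount > 0
        ? { id: 'workbench.actions.view.problems', title: `Prompt Quality: ${issueCount} issue(s) found` }
        : { id: 'promptQuality.noop', title: 'Prompt Quality: No issues found' };
}
```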

Comment on lines +63 to +92
for (let i = 1; i <= lineCount; i++) {
    const lineContent = model.getLineContent(i);
    const headerMatch = lineContent.match(/^(#{1,6})\s+(.+)$/);
    if (headerMatch) {
        const sectionName = headerMatch[2];
        // Find the end of this section (next header or end of file)
        let sectionEnd = lineCount;
        for (let j = i + 1; j <= lineCount; j++) {
            if (/^#{1,6}\s+/.test(model.getLineContent(j))) {
                sectionEnd = j - 1;
                break;
            }
        }
        // Estimate tokens for this section
        let sectionChars = 0;
        for (let j = i; j <= sectionEnd; j++) {
            sectionChars += model.getLineContent(j).length + 1; // +1 for newline
        }
        const sectionTokens = Math.ceil(sectionChars / CHARS_PER_TOKEN);

        lenses.push({
            range: { startLineNumber: i, startColumn: 1, endLineNumber: i, endColumn: 1 },
            command: {
                title: localize('promptQualityCodeLens.sectionTokens', "\u00A7 {0} \u2014 ~{1} tokens", sectionName, sectionTokens),
                id: '',
            },
        });
    }
}


Copilot AI Feb 12, 2026


The per-section token calculation is O(n²) in the number of lines (each header scans forward to find the next header and then loops again to count chars). For larger prompt files this can become noticeable. Consider a single pass that records header line indices first (or tracks current section start/end as you scan) and accumulates char counts without nested loops.

Suggested change
- for (let i = 1; i <= lineCount; i++) {
-     const lineContent = model.getLineContent(i);
-     const headerMatch = lineContent.match(/^(#{1,6})\s+(.+)$/);
-     if (headerMatch) {
-         const sectionName = headerMatch[2];
-         // Find the end of this section (next header or end of file)
-         let sectionEnd = lineCount;
-         for (let j = i + 1; j <= lineCount; j++) {
-             if (/^#{1,6}\s+/.test(model.getLineContent(j))) {
-                 sectionEnd = j - 1;
-                 break;
-             }
-         }
-         // Estimate tokens for this section
-         let sectionChars = 0;
-         for (let j = i; j <= sectionEnd; j++) {
-             sectionChars += model.getLineContent(j).length + 1; // +1 for newline
-         }
-         const sectionTokens = Math.ceil(sectionChars / CHARS_PER_TOKEN);
-         lenses.push({
-             range: { startLineNumber: i, startColumn: 1, endLineNumber: i, endColumn: 1 },
-             command: {
-                 title: localize('promptQualityCodeLens.sectionTokens', "\u00A7 {0} \u2014 ~{1} tokens", sectionName, sectionTokens),
-                 id: '',
-             },
-         });
-     }
- }
+ type SectionInfo = { line: number; name: string; chars: number };
+ const sections: SectionInfo[] = [];
+ let currentSection: SectionInfo | undefined;
+ for (let i = 1; i <= lineCount; i++) {
+     const lineContent = model.getLineContent(i);
+     const headerMatch = lineContent.match(/^(#{1,6})\s+(.+)$/);
+     if (headerMatch) {
+         const sectionName = headerMatch[2];
+         currentSection = { line: i, name: sectionName, chars: 0 };
+         sections.push(currentSection);
+     }
+     if (currentSection) {
+         currentSection.chars += lineContent.length + 1; // +1 for newline
+     }
+ }
+ for (const section of sections) {
+     const sectionTokens = Math.ceil(section.chars / CHARS_PER_TOKEN);
+     lenses.push({
+         range: { startLineNumber: section.line, startColumn: 1, endLineNumber: section.line, endColumn: 1 },
+         command: {
+             title: localize('promptQualityCodeLens.sectionTokens', "\u00A7 {0} \u2014 ~{1} tokens", section.name, sectionTokens),
+             id: '',
+         },
+     });
+ }

Comment on lines +43 to +47
async provideCodeActions(model: ITextModel, range: Range | Selection, context: CodeActionContext, _token: CancellationToken): Promise<CodeActionList | undefined> {
    const markers = this.markerService.read({
        resource: model.uri,
        owner: QUALITY_MARKERS_OWNER_ID,
    });

Copilot AI Feb 12, 2026


The provider ignores context (e.g., context.only / requested kinds) and re-reads markers from the marker service every time. Prefer using context to filter early (and, if available in this API layer, context.markers instead of a separate marker-service read) to reduce work and align with how code action invocation scopes requests.
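
A sketch of the early-filter idea; the exact shape of context.only varies by API layer, so the kind comparison here is illustrative:

```ts
// Return false before any marker lookup when the caller requested only kinds this
// provider never produces (everything it offers is a quick fix).
function shouldProvideQuickFixes(requestedKind: string | undefined): boolean {
    if (!requestedKind) {
        return true; // no filter: provide everything
    }
    return 'quickfix'.startsWith(requestedKind) || requestedKind.startsWith('quickfix');
}
```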

@blizzy78

Should add don't -> do not as well. Gemini tells me the contraction is weaker, whereas do not has more authority.

@taylorarndt

This looks really useful! Having prompt quality analysis right in the editor is something I've been wanting. I'd love to try a sample or preview of this if there's a way to test it out.
Quick question though - what model is actually powering the LLM analysis? The PR just says "Copilot" but I'm wondering if there's visibility into which model is being used, or if there's any plan to let users choose? Different models have pretty different strengths for this kind of analysis.
Also curious about availability - will this work in the CLI version of VS Code, or is it editor-only? And does the LLM analysis layer count against Copilot quota or require a premium subscription? Just trying to understand the usage model.
This brings up something I've been thinking about too - different models have really different conventions for what makes a "good" prompt. Like Claude works best with XML-structured prompts and explicit instructions, GPT-5 has its own patterns, Gemini prefers different things. Right now the linter seems to assume one set of best practices, but what works for one model might not be optimal for another. For example, flagging missing XML tags would be super helpful for Claude prompts but not really relevant if you're writing for GPT-5.
Any thoughts on making this model-aware? Maybe through a setting where you specify your target model and the analysis adjusts accordingly? Or even a plugin system where people could add model-specific analyzers?
The separation of static vs LLM analysis is really smart, and I could see this evolving into something more sophisticated where different specialized models handle different aspects of quality. Would be amazing to see this become a cross-editor standard for prompt engineering if the architecture can support that kind of extensibility.
Excited to see where this goes!
