fix: grep false negatives, output mangling, and truncation annotations by BadassBison · Pull Request #791 · rtk-ai/rtk

BadassBison · 2026-03-23T15:50:47Z

Summary

Fixes three issues where RTK's output filtering causes AI agents (Claude Code) to burn extra tokens on retry loops, producing net-negative token impact during analysis-heavy workflows.

grep: add --no-ignore-vcs to rg — prevents false negatives in repos with .gitignore while still respecting .ignore and .rgignore
grep: passthrough for small results (<=50 matches) — preserves standard file:line:content format AI agents can parse
smart_truncate: clean truncation — removes synthetic // ... N lines omitted annotations that break AI parsing

Problem

Observed in a real session across a large Rails monorepo (~83K files, 1,633 RTK commands):

| Issue | Root Cause | Impact |
| --- | --- | --- |
| grep returns "0 matches" for existing files | `rg` respects `.gitignore` by default, `grep -r` doesn't | ~10 false negatives led to wrong analysis conclusions |
| grep output in `"217 matches in 1F:"` format | Always reformatted, even for 4 matches | AI agents can't parse it, retry 2-4 times each |
| `// ... 81 lines omitted` in file reads | `smart_truncate` inserts synthetic comment markers | AI treats annotations as code, retries with alternative commands |

Quantified impact: grep had the lowest savings rate (9.3%) but the highest retry cost. Estimated 200-500K tokens burned on retries across ~15 retry patterns, each requiring 2-4 extra tool calls.

Changes

1. `src/cmds/system/grep_cmd.rs` — `--no-ignore-vcs` flag

Added --no-ignore-vcs to the rg invocation so it doesn't skip files listed in .gitignore/.hgignore. This matches grep -r behavior and eliminates false negatives in repos where test files, build artifacts, or generated code live in gitignored directories. Using --no-ignore-vcs (not --no-ignore) so .ignore and .rgignore are still respected.
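As a rough illustration of the flag change described above, the sketch below builds an `rg` command line with `--no-ignore-vcs`. The helper name `build_rg_command` and the other flags are assumptions made for the example; this is not RTK's actual code in `grep_cmd.rs`.

```rust
use std::process::Command;

/// Build (but do not run) an `rg` invocation for `pattern` under `path`.
/// `build_rg_command` is a hypothetical helper, not RTK's real function.
fn build_rg_command(pattern: &str, path: &str) -> Command {
    let mut cmd = Command::new("rg");
    cmd.arg("--line-number")
        .arg("--no-heading")
        // Ignore .gitignore/.hgignore rules, but keep .ignore and .rgignore.
        .arg("--no-ignore-vcs")
        // End of flags, so a pattern starting with '-' is not read as an option.
        .arg("--")
        .arg(pattern)
        .arg(path);
    cmd
}
```

Calling `build_rg_command("fn run", "src/")` yields a command whose arguments include `--no-ignore-vcs` and never the broader `--no-ignore`.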

2. `src/cmds/system/grep_cmd.rs` — Passthrough for small results

Results with <=50 matches now output raw file:line:content format (standard grep output that AI agents already know how to parse). The grouped "X matches in Y files:" format is preserved only for >50 matches where token savings are meaningful.
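The threshold rule above might look roughly like the following sketch. The `Match` struct, the `render` function, and the constant name are hypothetical, and the grouped branch is reduced to its header line for brevity.

```rust
/// One grep hit; the field names are illustrative only.
struct Match {
    file: String,
    line: usize,
    content: String,
}

/// Hypothetical constant mirroring the "<=50 matches" rule.
const PASSTHROUGH_MAX: usize = 50;

fn render(matches: &[Match]) -> String {
    if matches.len() <= PASSTHROUGH_MAX {
        // Small result set: standard file:line:content lines.
        matches
            .iter()
            .map(|m| format!("{}:{}:{}", m.file, m.line, m.content))
            .collect::<Vec<_>>()
            .join("\n")
    } else {
        // Large result set: this sketch emits only the grouped header.
        let mut files: Vec<&str> = matches.iter().map(|m| m.file.as_str()).collect();
        files.sort_unstable();
        files.dedup();
        format!("{} matches in {} files:", matches.len(), files.len())
    }
}
```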

3. `src/core/filter.rs` — Clean truncation in `smart_truncate`

Replaced the "smart" truncation logic that scattered " // ... N lines omitted" markers throughout file content with clean first-N-lines truncation. A single [X more lines] marker appears at the end only.
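A minimal sketch of first-N-lines truncation with a single trailing marker, assuming a line-based limit. The function name `truncate_first_n` is made up for the example and is not the signature used in `src/core/filter.rs`.

```rust
/// Keep the first `max_lines` lines. If anything was dropped, append one
/// `[X more lines]` marker at the very end and nowhere else.
fn truncate_first_n(content: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = content.lines().collect();
    if lines.len() <= max_lines {
        // Content fits: return it unchanged, with no marker.
        return content.to_string();
    }
    let mut out = lines[..max_lines].join("\n");
    out.push_str(&format!("\n[{} more lines]", lines.len() - max_lines));
    out
}
```

With a five-line input and a limit of 2, the result is the first two lines followed by `[3 more lines]`.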

Tests

test_smart_truncate_no_annotations — verifies no // ... markers in output
test_smart_truncate_no_truncation_when_under_limit — no truncation when content fits
test_smart_truncate_exact_limit — edge case at exact line count
test_rg_no_ignore_vcs_flag_accepted — verifies rg accepts the new flag

Test plan

cargo fmt --all && cargo clippy --all-targets && cargo test --all
Manual: rtk grep "fn run" src/ with <=50 results outputs raw file:line:content format
Manual: rtk read src/main.rs --max-lines 5 shows clean truncation without // ... markers
Manual: verify grep finds files in .gitignored directories

CLAassistant · 2026-03-23T15:50:55Z

All committers have signed the CLA.

pszymkowiak · 2026-03-23T15:51:10Z

[w] wshm · Automated triage by AI

📊 Automated PR Analysis


🐛 Type	`bug-fix`
🟡 Risk	`medium`

Summary

Fixes three issues in grep and smart_truncate that caused AI agents to waste tokens on retry loops: adds --no-ignore to rg so gitignored files aren't silently skipped, passes through raw grep output for small result sets (<=50 matches) instead of a grouped format that confused AI parsers, and replaces synthetic '// ... N lines omitted' truncation markers with clean first-N-lines truncation plus a single '[X more lines]' suffix.

Review Checklist

Tests present
Breaking change
Docs updated

Analyzed automatically by wshm · This is an automated analysis, not a human review.

pszymkowiak

Thanks @BadassBison — good analysis on grep false negatives and AI retry loops.

Please retarget to develop — all PRs must target develop, not master.

Review notes:

--no-ignore is risky — this searches inside node_modules/, target/, etc. Consider --no-ignore-vcs instead (skips .gitignore but respects .ignore)
Passthrough <=50 — interesting idea but the threshold should be configurable, and it changes RTK's savings metrics
10 README files — doc changes should be separate from the code fix

Please retarget and address the --no-ignore concern. Thanks!

pszymkowiak · 2026-03-26T10:26:50Z

Hi! Two things needed before we can review:

Retarget to develop — this PR targets master, but all PRs should target develop. You can change the base branch in the PR settings (right sidebar).
Sign the CLA — if not already done, please sign at https://cla-assistant.io/rtk-ai/rtk

Thanks!

aeppling · 2026-03-26T18:39:13Z

Hey

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes src/ from a flat layout into subfolders.

No logic changes — only file moves and import path updates.

What you need to do

Rebase your branch on develop when receiving this comment:

git fetch origin && git rebase origin/develop

Git detects renames automatically. If you get import conflicts, update the paths:

use crate::git;        // now: use crate::cmds::git::git;
use crate::tracking;   // now: use crate::core::tracking;
use crate::config;     // now: use crate::core::config;
use crate::init;       // now: use crate::hooks::init;
use crate::gain;       // now: use crate::analytics::gain;

Need help rebasing? Tag @aeppling

BadassBison · 2026-03-26T21:55:11Z

@pszymkowiak @aeppling — addressed all feedback:

Retargeted to develop — base branch updated.
--no-ignore → --no-ignore-vcs — switched to the more targeted flag that only disables VCS ignore files (.gitignore/.hgignore) while still respecting .ignore and .rgignore. Updated the corresponding test.
Doc changes removed — the 10 README files have been dropped from this PR. The branch now contains only src/cmds/system/grep_cmd.rs and src/core/filter.rs.
Rebased on develop — applied changes to the new file paths after the #826 reorganization (feat(refacto-codebase-onboarding): partie 1 - folders and technical docs).
CLA is signed — already confirmed by the CLA assistant bot above.

Thanks for the thorough review!

Documents the changes from rtk-ai#791:
- grep now passes through raw output for <=50 matches (standard file:line:content)
- grep uses grouped format only for >50 matches where token savings are meaningful
- --no-ignore-vcs flag added to match grep -r behavior for .gitignore'd files
- savings range updated to 0-90% to reflect passthrough for small result sets

nicklloyd · 2026-03-26T22:13:02Z

also awaiting changes ;)

BadassBison · 2026-03-26T22:25:27Z

@nicklloyd — all changes have been addressed! Retargeted to develop, switched to --no-ignore-vcs, doc updates moved to a separate PR (#871), rebased on the new src/ structure, and the cargo fmt CI failure is fixed. Should be good for another look. 🙏

nicklloyd · 2026-03-26T22:41:25Z

@BadassBison - just following from the sidelines as this one is a blocker. Looking forward to being able to try it out 🤘🏻

BadassBison · 2026-03-27T12:10:01Z

@pszymkowiak,
Everything is updated and awaiting review. Seems like this work is blocking others, anything else you need from me?

aeppling · 2026-04-02T09:53:41Z

Hello, this looks fine, but the checks are not passing. Could you please check why?

Maybe you're missing platform tags to get clean checks for each platform.

pszymkowiak · 2026-04-02T20:21:27Z

@BadassBison — reviewed the CI failure. Single test failing on all 3 platforms:

core::filter::tests::test_smart_truncate_overflow_count_exact
Could not parse overflow count from: [180 more lines]

The test parser looks for a word that parses as usize, but the new format [180 more lines] has [180 (with bracket prefix) which fails parse::<usize>().

Fix the parsing in the test to strip brackets:

let reported_more: usize = overflow_line
    .split_whitespace()
    .find_map(|w| w.trim_matches(|c: char| !c.is_ascii_digit()).parse().ok())
    .unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));

Or simpler — since you control the format, just parse directly:

// Parse "[N more lines]"
let reported_more: usize = overflow_line
    .trim()
    .strip_prefix('[')
    .and_then(|s| s.split_whitespace().next())
    .and_then(|n| n.parse().ok())
    .unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));

Once that's fixed, CI should go green. The rest of the PR looks good — --no-ignore-vcs, passthrough for small results, and clean truncation are all solid changes.

BadassBison · 2026-04-03T15:28:31Z

Rebased on latest develop and fixed the failing test.

The test_smart_truncate_overflow_count_exact test was failing because the overflow format changed to [180 more lines] but the parser tried to parse [180 as usize (bracket prefix). Fixed the parser to strip the [ prefix before parsing:

// Parse "[N more lines]"
let reported_more: usize = overflow_line
    .trim()
    .strip_prefix('[')
    .and_then(|s| s.split_whitespace().next())
    .and_then(|n| n.parse().ok())
    .unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));

CI should go green now.

pszymkowiak · 2026-04-12T12:17:47Z

The --no-ignore-vcs fix is good — it solves a real problem (false negatives in repos with .gitignore'd files).

However the passthrough approach for <=50 results goes against RTK's design: RTK is a proxy — we always filter and compress output to save tokens. Passthrough = 0% savings = no reason for RTK to exist on that path.

The right fix for grep is to filter better, not to stop filtering:

Keep the file:line:content format (AI-parseable) but still deduplicate, truncate long lines, and cap results
The caps raised in #630 (fix: raise output caps for P0 bugs (#617, #618, #620)), from 10→25 per file and 50→200 global, were the right approach

Same concern for smart_truncate: removing the // ... markers is correct (they broke AI parsing), but replacing the smart logic with a simple first-N-lines truncation loses the value of keeping function signatures and imports visible.

Core principle: RTK is a proxy — we filter and compress output, but we never disable filtering or pass through raw output. See CONTRIBUTING.md for design philosophy.

- grep: use --no-ignore-vcs so .gitignore'd files aren't silently skipped (matches grep -r behavior, avoids false negatives in large monorepos)
- grep: passthrough raw output for <=50 matches so AI agents can parse standard file:line:content format without retry loops
- filter: replace smart_truncate heuristic with clean first-N-lines truncation and a single [X more lines] suffix (eliminates synthetic // ... markers that AI agents misread as code, causing parsing confusion and retries)

BadassBison · 2026-04-14T13:55:50Z

@pszymkowiak — addressed both concerns from the Apr 12 review. Rebased on latest develop.

grep: removed passthrough, unified to `file:line:content` format

Dropped the <= 50 passthrough / > 50 grouped split entirely. There is now a single output path that always filters:

Lines truncated via clean_line() (context-aware, pattern-centered)
Per-file cap (25) and global cap (200) from config applied to all result sets
Output format: file:line:content on every line — AI-parseable and standard
Header: N matches in M files: at top; [+N more] at end if capped
--no-ignore-vcs flag unchanged
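The single filtered output path listed above could be sketched as follows. This is an illustration under stated assumptions: `Match`, `shorten`, `render_capped`, and `MAX_LINE_CHARS` are hypothetical names, and `shorten` is a plain character-count stand-in for `clean_line()`, whose context-aware behavior is not reproduced here. The caps (25 per file, 200 global) are the values quoted in the thread.

```rust
use std::collections::HashMap;

/// One grep hit; the field names are illustrative only.
struct Match {
    file: String,
    line: usize,
    content: String,
}

const PER_FILE_CAP: usize = 25;
const GLOBAL_CAP: usize = 200;
const MAX_LINE_CHARS: usize = 120; // assumption, not taken from the PR

/// Simplified stand-in for clean_line(): trim, then cut by character count.
fn shorten(content: &str) -> String {
    let trimmed = content.trim();
    if trimmed.chars().count() <= MAX_LINE_CHARS {
        trimmed.to_string()
    } else {
        let head: String = trimmed.chars().take(MAX_LINE_CHARS).collect();
        format!("{}...", head)
    }
}

fn render_capped(matches: &[Match]) -> String {
    let mut files: Vec<&str> = matches.iter().map(|m| m.file.as_str()).collect();
    files.sort_unstable();
    files.dedup();
    // Header reports the totals before any capping.
    let mut out = vec![format!("{} matches in {} files:", matches.len(), files.len())];

    let mut per_file: HashMap<&str, usize> = HashMap::new();
    let mut shown = 0;
    for m in matches {
        if shown == GLOBAL_CAP {
            break;
        }
        let seen = per_file.entry(m.file.as_str()).or_insert(0);
        if *seen == PER_FILE_CAP {
            continue; // this file already hit its cap
        }
        *seen += 1;
        shown += 1;
        out.push(format!("{}:{}:{}", m.file, m.line, shorten(&m.content)));
    }
    if shown < matches.len() {
        // One trailing marker counting everything the caps dropped.
        out.push(format!("[+{} more]", matches.len() - shown));
    }
    out.join("\n")
}
```

For 30 matches in a single file, this prints the header, the first 25 hits as `file:line:content`, and a closing `[+5 more]`.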

smart_truncate: restored smart selection, removed inline markers

Brought back the priority logic (FUNC_SIGNATURE, IMPORT_PATTERN, pub, export, {, }) — the window stays useful for code files. Removed all // ... N lines omitted markers (they were the root cause of AI confusion). Replaced with a single [N more lines] at the end only.

Invariant preserved: kept_count + N == total_lines
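One way to read the restored behavior is sketched below: priority lines are picked first, leftover slots are filled from the top, the kept lines are printed in their original order, and a single marker closes the output. The prefix checks in `is_priority` are a crude stand-in for the FUNC_SIGNATURE and IMPORT_PATTERN matching mentioned above, and the selection order is an assumption; the real `smart_truncate` may choose lines differently.

```rust
/// Crude stand-in for the priority patterns named in the comment above.
fn is_priority(line: &str) -> bool {
    let t = line.trim();
    const PREFIXES: [&str; 5] = ["fn ", "pub ", "use ", "import ", "export "];
    t == "{" || t == "}" || PREFIXES.iter().any(|p| t.starts_with(p))
}

/// Hypothetical sketch; not the function in src/core/filter.rs.
fn smart_truncate_sketch(content: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = content.lines().collect();
    if lines.len() <= max_lines {
        return content.to_string();
    }
    // Pass 1 selects priority lines; pass 2 fills remaining slots from the top.
    let mut keep = vec![false; lines.len()];
    let mut kept = 0;
    for want_priority in [true, false] {
        for (i, line) in lines.iter().enumerate() {
            if kept == max_lines {
                break;
            }
            if is_priority(line) == want_priority {
                keep[i] = true;
                kept += 1;
            }
        }
    }
    // Emit kept lines in original order, with no inline markers.
    let mut out: Vec<String> = lines
        .iter()
        .zip(&keep)
        .filter(|(_, k)| **k)
        .map(|(l, _)| l.to_string())
        .collect();
    // Invariant: kept + N == lines.len().
    out.push(format!("[{} more lines]", lines.len() - kept));
    out.join("\n")
}
```

The trailing count is computed as `lines.len() - kept`, so the invariant holds by construction.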

Docs PR (#871)

Also rebased and updated: descriptions updated to reflect filter-always behavior, SYSTEM savings range restored to 50-90%, passthrough language removed.

aeppling · 2026-04-14T18:13:46Z

LGTM

pszymkowiak added the bug (Something isn't working), effort-medium (1-2 days, a few files), and filter-quality (Filter produces incorrect/truncated signal) labels Mar 23, 2026

BadassBison mentioned this pull request Mar 25, 2026

Output filtering causes AI agents (Claude Code) to burn extra tokens on retry loops #831

Open

pszymkowiak requested changes Mar 26, 2026

pszymkowiak added the awaiting-changes label Mar 26, 2026

BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from c6c979a to 2cc8a19 Compare March 26, 2026 21:52

BadassBison changed the base branch from master to develop March 26, 2026 21:52

BadassBison requested a review from pszymkowiak March 26, 2026 21:53

BadassBison mentioned this pull request Mar 26, 2026

docs: update grep descriptions for passthrough behavior #871

Open

BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from baddd42 to 36041c5 Compare March 26, 2026 22:23

aeppling self-assigned this Apr 2, 2026

BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from 36041c5 to 1e4a14d Compare April 3, 2026 15:27

BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from 1e4a14d to 8813737 Compare April 14, 2026 13:51

aeppling approved these changes Apr 14, 2026

6 participants
