Skip to content

fix: grep false negatives, output mangling, and truncation annotations#791

Open
BadassBison wants to merge 1 commit intortk-ai:developfrom
BadassBison:fix/grep-false-negatives-and-truncation-annotations
Open

fix: grep false negatives, output mangling, and truncation annotations#791
BadassBison wants to merge 1 commit intortk-ai:developfrom
BadassBison:fix/grep-false-negatives-and-truncation-annotations

Conversation

@BadassBison
Copy link
Copy Markdown

@BadassBison BadassBison commented Mar 23, 2026

Summary

Fixes three issues where RTK's output filtering causes AI agents (Claude Code) to burn extra tokens on retry loops, producing net-negative token impact during analysis-heavy workflows.

  • grep: add --no-ignore-vcs to rg — prevents false negatives in repos with .gitignore while still respecting .ignore and .rgignore
  • grep: passthrough for small results (<=50 matches) — preserves standard file:line:content format AI agents can parse
  • smart_truncate: clean truncation — removes synthetic // ... N lines omitted annotations that break AI parsing

Problem

Observed in a real session across a large Rails monorepo (~83K files, 1,633 RTK commands):

Issue Root Cause Impact
grep returns "0 matches" for existing files rg respects .gitignore by default, grep -r doesn't ~10 false negatives led to wrong analysis conclusions
grep output in "217 matches in 1F:" format Always reformatted, even for 4 matches AI agents can't parse it, retry 2-4 times each
// ... 81 lines omitted in file reads smart_truncate inserts synthetic comment markers AI treats annotations as code, retries with alternative commands

Quantified impact: grep had the lowest savings rate (9.3%) but the highest retry cost. Estimated 200-500K tokens burned on retries across ~15 retry patterns, each requiring 2-4 extra tool calls.

Changes

1. src/cmds/system/grep_cmd.rs--no-ignore-vcs flag

Added --no-ignore-vcs to the rg invocation so it doesn't skip files listed in .gitignore/.hgignore. This matches grep -r behavior and eliminates false negatives in repos where test files, build artifacts, or generated code live in gitignored directories. Using --no-ignore-vcs (not --no-ignore) so .ignore and .rgignore are still respected.

2. src/cmds/system/grep_cmd.rs — Passthrough for small results

Results with <=50 matches now output raw file:line:content format (standard grep output that AI agents already know how to parse). The grouped "X matches in Y files:" format is preserved only for >50 matches where token savings are meaningful.

3. src/core/filter.rs — Clean truncation in smart_truncate

Replaced the "smart" truncation logic that scattered " // ... N lines omitted" markers throughout file content with clean first-N-lines truncation. A single [X more lines] marker appears at the end only.

Tests

  • test_smart_truncate_no_annotations — verifies no // ... markers in output
  • test_smart_truncate_no_truncation_when_under_limit — no truncation when content fits
  • test_smart_truncate_exact_limit — edge case at exact line count
  • test_rg_no_ignore_vcs_flag_accepted — verifies rg accepts the new flag

Test plan

  • cargo fmt --all && cargo clippy --all-targets && cargo test --all
  • Manual: rtk grep "fn run" src/ with <=50 results outputs raw file:line:content format
  • Manual: rtk read src/main.rs --max-lines 5 shows clean truncation without // ... markers
  • Manual: verify grep finds files in .gitignored directories

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 23, 2026

CLA assistant check
All committers have signed the CLA.

@pszymkowiak pszymkowiak added bug Something isn't working effort-medium 1-2 jours, quelques fichiers filter-quality Filter produces incorrect/truncated signal labels Mar 23, 2026
@pszymkowiak
Copy link
Copy Markdown
Collaborator

[w] wshm · Automated triage by AI

📊 Automated PR Analysis

🐛 Type bug-fix
🟡 Risk medium

Summary

Fixes three issues in grep and smart_truncate that caused AI agents to waste tokens on retry loops: adds --no-ignore to rg so gitignored files aren't silently skipped, passes through raw grep output for small result sets (<=50 matches) instead of a grouped format that confused AI parsers, and replaces synthetic '// ... N lines omitted' truncation markers with clean first-N-lines truncation plus a single '[X more lines]' suffix.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Analyzed automatically by wshm · This is an automated analysis, not a human review.

Copy link
Copy Markdown
Collaborator

@pszymkowiak pszymkowiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BadassBison — good analysis on grep false negatives and AI retry loops.

Please retarget to develop — all PRs must target develop, not master.

Review notes:

  1. --no-ignore is risky — this searches inside node_modules/, target/, etc. Consider --no-ignore-vcs instead (skips .gitignore but respects .ignore)
  2. Passthrough <=50 — interesting idea but the threshold should be configurable, and it changes RTK's savings metrics
  3. 10 README files — doc changes should be separate from the code fix

Please retarget and address the --no-ignore concern. Thanks!

@pszymkowiak
Copy link
Copy Markdown
Collaborator

Hi! Two things needed before we can review:

  1. Retarget to develop — this PR targets master, but all PRs should target develop. You can change the base branch in the PR settings (right sidebar).
  2. Sign the CLA — if not already done, please sign at https://cla-assistant.io/rtk-ai/rtk

Thanks!

@aeppling
Copy link
Copy Markdown
Contributor

Hey

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes src/ from a flat layout into subfolders.

No logic changes — only file moves and import path updates.

What you need to do

Rebase your branch on develop when receiving this comment:

git fetch origin && git rebase origin/develop

Git detects renames automatically. If you get import conflicts, update the paths:

use crate::git;        // now: use crate::cmds::git::git;
use crate::tracking;   // now: use crate::core::tracking;
use crate::config;     // now: use crate::core::config;
use crate::init;       // now: use crate::hooks::init;
use crate::gain;       // now: use crate::analytics::gain;

Need help rebasing? Tag @aeppling

@BadassBison BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from c6c979a to 2cc8a19 Compare March 26, 2026 21:52
@BadassBison BadassBison changed the base branch from master to develop March 26, 2026 21:52
@BadassBison BadassBison requested a review from pszymkowiak March 26, 2026 21:53
@BadassBison
Copy link
Copy Markdown
Author

@pszymkowiak @aeppling — addressed all feedback:

  1. Retargeted to develop — base branch updated.
  2. --no-ignore--no-ignore-vcs — switched to the more targeted flag that only disables VCS ignore files (.gitignore/.hgignore) while still respecting .ignore and .rgignore. Updated the corresponding test.
  3. Doc changes removed — the 10 README files have been dropped from this PR. The branch now contains only src/cmds/system/grep_cmd.rs and src/core/filter.rs.
  4. Rebased on develop — applied changes to the new file paths after the PR feat(refacto-codebase-onboarding): partie 1 - folders and technical docs #826 reorganization.
  5. CLA is signed — already confirmed by the CLA assistant bot above.

Thanks for the thorough review!

BadassBison added a commit to BadassBison/rtk that referenced this pull request Mar 26, 2026
Documents the changes from rtk-ai#791:
- grep now passes through raw output for <=50 matches (standard file:line:content)
- grep uses grouped format only for >50 matches where token savings are meaningful
- --no-ignore-vcs flag added to match grep -r behavior for .gitignore'd files
- savings range updated to 0-90% to reflect passthrough for small result sets
@nicklloyd
Copy link
Copy Markdown

also awaiting changes ;)

@BadassBison BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from baddd42 to 36041c5 Compare March 26, 2026 22:23
@BadassBison
Copy link
Copy Markdown
Author

@nicklloyd — all changes have been addressed! Retargeted to develop, switched to --no-ignore-vcs, doc updates moved to a separate PR (#871), rebased on the new src/ structure, and the cargo fmt CI failure is fixed. Should be good for another look. 🙏

@nicklloyd
Copy link
Copy Markdown

@BadassBison - just following from the sidelines as this one is a blocker. Looking forward to being able to try it out 🤘🏻

@BadassBison
Copy link
Copy Markdown
Author

BadassBison commented Mar 27, 2026

@pszymkowiak,
Everything is updated and awaiting review. Seems like this work is blocking others, anything else you need from me?

@aeppling aeppling self-assigned this Apr 2, 2026
@aeppling
Copy link
Copy Markdown
Contributor

aeppling commented Apr 2, 2026

Hello, this look fine but checks are not passing could you please check why ?

Maybe you're missing platforme tags to have clean checks for each platform

@pszymkowiak
Copy link
Copy Markdown
Collaborator

@BadassBison — reviewed the CI failure. Single test failing on all 3 platforms:

core::filter::tests::test_smart_truncate_overflow_count_exact
Could not parse overflow count from: [180 more lines]

The test parser looks for a word that parses as usize, but the new format [180 more lines] has [180 (with bracket prefix) which fails parse::<usize>().

Fix the parsing in the test to strip brackets:

let reported_more: usize = overflow_line
    .split_whitespace()
    .find_map(|w| w.trim_matches(|c: char| !c.is_ascii_digit()).parse().ok())
    .unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));

Or simpler — since you control the format, just parse directly:

// Parse "[N more lines]"
let reported_more: usize = overflow_line
    .trim()
    .strip_prefix('[')
    .and_then(|s| s.split_whitespace().next())
    .and_then(|n| n.parse().ok())
    .unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));

Once that's fixed, CI should go green. The rest of the PR looks good — --no-ignore-vcs, passthrough for small results, and clean truncation are all solid changes.

@BadassBison BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from 36041c5 to 1e4a14d Compare April 3, 2026 15:27
@BadassBison
Copy link
Copy Markdown
Author

Rebased on latest develop and fixed the failing test.

The test_smart_truncate_overflow_count_exact test was failing because the overflow format changed to [180 more lines] but the parser tried to parse [180 as usize (bracket prefix). Fixed the parser to strip the [ prefix before parsing:

// Parse "[N more lines]"
let reported_more: usize = overflow_line
    .trim()
    .strip_prefix('[')
    .and_then(|s| s.split_whitespace().next())
    .and_then(|n| n.parse().ok())
    .unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));

CI should go green now.

@pszymkowiak
Copy link
Copy Markdown
Collaborator

The --no-ignore-vcs fix is good — it solves a real problem (false negatives in repos with .gitignore'd files).

However the passthrough approach for <=50 results goes against RTK's design: RTK is a proxy — we always filter and compress output to save tokens. Passthrough = 0% savings = no reason for RTK to exist on that path.

The right fix for grep is to filter better, not to stop filtering:

Same concern for smart_truncate: removing the // ... markers is correct (they broke AI parsing), but replacing the smart logic with a simple first-N-lines truncation loses the value of keeping function signatures and imports visible.

Core principle: RTK is a proxy — we filter and compress output, but we never disable filtering or pass through raw output. See CONTRIBUTING.md for design philosophy.

- grep: use --no-ignore-vcs so .gitignore'd files aren't silently skipped
  (matches grep -r behavior, avoids false negatives in large monorepos)
- grep: passthrough raw output for <=50 matches so AI agents can parse
  standard file:line:content format without retry loops
- filter: replace smart_truncate heuristic with clean first-N-lines truncation
  and a single [X more lines] suffix (eliminates synthetic // ... markers
  that AI agents misread as code, causing parsing confusion and retries)
@BadassBison BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from 1e4a14d to 8813737 Compare April 14, 2026 13:51
@BadassBison
Copy link
Copy Markdown
Author

@pszymkowiak — addressed both concerns from the Apr 12 review. Rebased on latest develop.

grep: removed passthrough, unified to file:line:content format

Dropped the <= 50 passthrough / > 50 grouped split entirely. There is now a single output path that always filters:

  • Lines truncated via clean_line() (context-aware, pattern-centered)
  • Per-file cap (25) and global cap (200) from config applied to all result sets
  • Output format: file:line:content on every line — AI-parseable and standard
  • Header: N matches in M files: at top; [+N more] at end if capped
  • --no-ignore-vcs flag unchanged

smart_truncate: restored smart selection, removed inline markers

Brought back the priority logic (FUNC_SIGNATURE, IMPORT_PATTERN, pub, export, {, }) — the window stays useful for code files. Removed all // ... N lines omitted markers (they were the root cause of AI confusion). Replaced with a single [N more lines] at the end only.

Invariant preserved: kept_count + N == total_lines

Docs PR (#871)

Also rebased and updated: descriptions updated to reflect filter-always behavior, SYSTEM savings range restored to 50-90%, passthrough language removed.

@aeppling
Copy link
Copy Markdown
Contributor

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting-changes bug Something isn't working effort-medium 1-2 jours, quelques fichiers filter-quality Filter produces incorrect/truncated signal

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants