Skip to content

Fix search performance regression and TUI race condition#256

Open
jesserobbins wants to merge 4 commits intowesm:mainfrom
jesserobbins:fix-search-perf-and-race
Open

Fix search performance regression and TUI race condition#256
jesserobbins wants to merge 4 commits intowesm:mainfrom
jesserobbins:fix-search-perf-and-race

Conversation

@jesserobbins
Copy link
Copy Markdown
Contributor

Tested and ran into a bunch of performance & context issues that surfaced in the TUI. Original approach using ILIKE is better it seems like for large archives.

Changes

  • Revert DuckDB fast search from regexp_matches back to ILIKE — on large Parquet archives the regex-per-row evaluation caused TUI fast search to hang. ILIKE is natively optimized by DuckDB for columnar Parquet scans. FTS5 deep search path is unchanged.
  • Fix TUI search race condition — when starting an inline search, invalidate in-flight loadMessages responses (from sort toggle or "all messages" view) so they can't overwrite search results after the search completes.
  • Clear m.messages immediately on search start — prevents stale pre-search messages from showing during the async search transition.
  • Remove dead codeescapeRegex and startsWithWordChar are no longer used.
  • Add regression tests — 4 new tests covering the race condition, stale-message clearing, and ILIKE path verification for both fast and aggregate search.

🤖 Generated with Claude Code

jesserobbins and others added 3 commits April 10, 2026 11:33
The regexp_matches approach caused significant performance regression
in TUI fast search (Parquet scans). ILIKE is natively optimized by
DuckDB for Parquet columnar scans. FTS5 deep search path remains
unchanged for body text accuracy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When starting an inline search, increment loadRequestID to prevent
in-flight loadMessages responses from overwriting search results.
Previously, pressing (a) to view all messages then quickly searching
could cause the stale loadMessages response to replace search results
with unfiltered messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TestStaleLoadMessagesDoesNotOverwriteSearch: verifies that in-flight
  loadMessages responses (from sort toggle or "all messages" view) are
  ignored after a search starts, preventing stale results from
  overwriting search results
- TestSearchClearsStaleMessages: verifies messages are cleared
  immediately when search starts so stale results never display
- TestBuildSearchConditions_UsesILIKENotRegex: verifies the fast search
  path uses ILIKE (fast on Parquet) instead of regexp_matches (slow)
- TestBuildAggregateSearchConditions_UsesILIKENotRegex: same check for
  the aggregate search path
- Remove dead code: escapeRegex and startsWithWordChar (no longer used
  after reverting to ILIKE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 10, 2026

roborev: Combined Review (d3811db)

Medium issue found; security review and one default review found no additional issues.

Medium

  • internal/query/duckdb.go:447 - buildAggregateSearchConditions now emits 4 arguments per text term with no key columns, but countArgsForTextTerms still skips only 3. buildStatsSearchConditions uses that helper when a stats search has both text terms and non-text filters, so it can append leftover text-term args without matching placeholders and fail aggregate stats queries.

    Suggested fix: Update countArgsForTextTerms to return n * 4 and add a regression test for a stats/aggregate query combining a text term with a non-text operator such as from:.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@jesserobbins
Copy link
Copy Markdown
Contributor Author

Missed a chance to make a joke that ILIKE Ducks or something but will keep the PR

@jesserobbins
Copy link
Copy Markdown
Contributor Author

roborev: Combined Review (d3811db)

Medium issue found; security review and one default review found no additional issues.

Medium

  • internal/query/duckdb.go:447 - buildAggregateSearchConditions now emits 4 arguments per text term with no key columns, but countArgsForTextTerms still skips only 3. buildStatsSearchConditions uses that helper when a stats search has both text terms and non-text filters, so it can append leftover text-term args without matching placeholders and fail aggregate stats queries.

    Suggested fix: Update countArgsForTextTerms to return n * 4 and add a regression test for a stats/aggregate query combining a text term with a non-text operator such as from:.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Interesting will look and fix

buildStatsSearchConditions used to call buildAggregateSearchConditions
and slice off the text-term prefix using a hand-tracked
countArgsForTextTerms helper. Any drift between the helper and the
actual per-term arg count (subject + snippet + 2 sender = 4) would
cause aggregate stats queries to fail with unmatched placeholders when
a search mixed text terms with non-text operators like from:.

Extract the non-text filter portion of buildAggregateSearchConditions
into a new buildNonTextSearchConditions helper and call it directly
from buildStatsSearchConditions, eliminating the arg-slicing workaround
and the brittle countArgsForTextTerms constant entirely.

Add regression tests that lock the invariant in place by counting "?"
placeholders in the emitted WHERE conditions and comparing to len(args)
across a matrix of query shapes (text only, text+from, text+to+subject,
label view, date/size filters, etc.) for both builders.
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 10, 2026

roborev: Combined Review (bf776fb)

Verdict: Changes are mostly sound, but there are 2 medium-severity behavioral regressions that should be addressed before merge.

Medium

  • Transient false "No results found" state
    Location: internal/tui/model.go:1190
    Clearing m.messages as soon as a debounced search starts creates a misleading empty state. The list view interprets len(m.messages) == 0 && !m.loading as "No results found", but this path only sets inlineSearchLoading, not loading, so users can briefly see a false zero-results message while the search is still in flight.
    Suggested fix: Keep the previous rows visible until new results arrive, or suppress the empty-state message while inlineSearchLoading is true.

  • Search semantics broadened from word-start matching to arbitrary substring matching
    Location: internal/query/duckdb.go:438, internal/query/duckdb.go:2373
    Replacing the previous word-boundary regex search with %term% ILIKE materially changes free-text matching behavior. Queries that previously matched term starts now also match arbitrary substrings inside words, names, domains, and phone numbers, which is likely to produce noisier results. Current tests only validate SQL shape and escaping, so this behavior change is not covered as an intentional regression.
    Suggested fix: Preserve the prior word-start semantics, or add explicit behavioral tests and documentation if the broader substring matching is intentional.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@jesserobbins
Copy link
Copy Markdown
Contributor Author

jesserobbins commented Apr 10, 2026

Hi! I disagree with these recommendations which is why I am proposing reverting to the prior search behavior in this PR.

  1. A user initiating a search expects to see only the results of that search. Retaining and presenting cached and irrelevant search is confusing and counterproductive. Additionally the TUI is inconsistent in how it updates and displays this as results come in. The TUI should display the search response or nothing.

  2. I agree that the regex based deep search is better. However it is unacceptably slower in my own testing.

So recommend proceeding with the current solution, and in future make TUI search flows use same path as the search function at the cli.

Thanks!

@jesserobbins jesserobbins marked this pull request as ready for review April 10, 2026 21:16
@jesserobbins
Copy link
Copy Markdown
Contributor Author

(Also @wesm roborev is pretty cool!)

jesserobbins pushed a commit to jesserobbins/msgvault that referenced this pull request Apr 10, 2026
Temporary tracking branch for testing the deduplicate feature together
with PR wesm#256 (search perf + race fix). Delete once wesm#256 is merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants