Skip to content

feat(chunker): add .svelte support with two-phase TypeScript injection#128

Open
AutumnsGrove wants to merge 4 commits intoory:mainfrom
AutumnsGrove:feat/svelte-support
Open

feat(chunker): add .svelte support with two-phase TypeScript injection#128
AutumnsGrove wants to merge 4 commits intoory:mainfrom
AutumnsGrove:feat/svelte-support

Conversation

@AutumnsGrove
Copy link
Copy Markdown
Contributor

Closes #126

Summary

  • Adds .svelte to supportedExtensions and DefaultLanguages so Svelte files are no longer silently skipped by the merkle walker. On large SvelteKit monorepos this can recover 40%+ of previously invisible source files.
  • Implements SvelteChunker (new internal/chunker/svelte.go) using a two-phase injection pattern: the outer tree-sitter Svelte grammar locates <script> elements; each block's raw_text is re-parsed with the existing TypeScript TreeSitterChunker to extract named symbols. Line numbers are adjusted to be file-relative so search results point to the correct lines.
  • Svelte runes ($state(), $derived()) parse cleanly as TypeScript call-expression initializers — no special handling needed.
  • Registers text-embedding-voyage-4-nano in KnownModels (LM Studio backend, 1024 dims, 2048 ctx).
  • Fixes three cmd hook tests that failed when ~/.config/lumen/config.yaml set a non-default model: adds XDG_CONFIG_HOME isolation so both writeHookTestDB and the hook under test resolve the same default model, preventing DB-path hash mismatches and dimension-mismatch schema resets that silently wiped last_indexed_at.

Files changed

File Change
internal/chunker/svelte.go New — SvelteChunker with two-phase parse
internal/chunker/svelte_test.go New — script symbol extraction, empty script, no-script cases
internal/chunker/languages.go Register .svelte in supportedExtensions + DefaultLanguages
internal/chunker/treesitter_test.go Add .svelte fixture to trivialSources
internal/models/models.go Add text-embedding-voyage-4-nano to KnownModels
internal/models/models_test.go Update expected count + add voyage-4-nano entry
cmd/hook_test.go Set XDG_CONFIG_HOME in three tests to isolate from user config
go.mod / go.sum Add github.com/alexaandru/go-sitter-forest/svelte v1.9.2

Test plan

  • go test ./... — all 12 packages pass
  • TestSvelteChunker_ScriptSymbols — verifies function/interface/class extraction and file-relative line numbers
  • TestSvelteChunker_EmptyScript / TestSvelteChunker_NoScript — edge cases return zero chunks without error
  • TestDefaultLanguages_AllExtensionsPresent.svelte fixture added to trivialSources
  • All three previously-failing hook tests now pass with XDG_CONFIG_HOME isolation

🤖 Generated with Claude Code

Index .svelte files by parsing the outer Svelte grammar to locate
<script> blocks, then re-parsing each block's raw_text with the
TypeScript chunker to extract named symbols (functions, classes,
interfaces, etc.). Line numbers are adjusted to be file-relative so
search results point to the correct lines in the original .svelte file.

Template syntax ({#if}, {#each}, bind:) and Svelte rune calls
($state(), $derived()) are handled transparently — runes parse as
ordinary TypeScript call-expression initializers.

Also registers text-embedding-voyage-4-nano in KnownModels (LM Studio,
1024 dims, 2048 ctx).

Fixes three hook tests that failed when a user config file at
~/.config/lumen/config.yaml set a non-default embedding model: the
tests now set XDG_CONFIG_HOME to a temp dir so both writeHookTestDB
and the hook use the same default model, preventing DB-path hash
mismatches and dimension-mismatch schema resets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented Apr 13, 2026

I see this has a new chunker for svelte - in that case I think we need a bench-swe suite that has an appropriate test. It sometimes takes a few attempts to find a good test case, because not all test cases can be solved well by claude. If lumen is not performing, then there can be multiple reasons:

  • the LLM can one shot the problem and does not need a lot of tool calls
  • the issue references the problem directly (verbatim code) because then obviously grep is faster
  • the chunker or tree sitter is broken for svelte

You can use claude to analyze the raw json files. It takes a few attempts to get this working well, but once it does, it actually helps a lot!

@AutumnsGrove
Copy link
Copy Markdown
Contributor Author

AutumnsGrove commented Apr 13, 2026

@aeneasr I can do that. I just added the basic svelte support - I wasn't aware of your swe bench suites. I'll take a look at it and add to this PR.

The issue I experienced is svelte wasn't indexed at all - this Pr attempts to implement that. When I tested it locally via my built mcp server, it properly picked up the svelte files and worked flawlessly. I tried to build on your existing parsing tooling.

AutumnsGrove and others added 3 commits April 13, 2026 18:47
This model is LM Studio-only (served via Voyage AI's local inference) and
not available in Ollama. The project defaults to Ollama and runs e2e tests
against it, so registering an LM Studio-exclusive model here is misleading.
Users who want voyage-4-nano can still configure it manually via
LUMEN_EMBED_MODEL — it just won't have pre-registered dims/ctx/min-score.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Vendor 9 real-world .svelte components from huggingface/chat-ui
  (Apache 2.0, commit c0cfbdf) into testdata/fixtures/svelte/
- Add testdata/sample-project/Dashboard.svelte as the E2E fixture
  component with ActivityCache class, loadUserActivity, and
  handleRefresh symbols for semantic search verification
- Add TestE2E_SvelteIndexing: asserts .svelte files are indexed,
  file-relative line numbers are correct, and symbols from the
  script block surface in semantic_search results
- Update all hardcoded file-count assertions (5 → 6, and 6 → 7
  for the incremental test) to account for Dashboard.svelte
- Isolate E2E subprocess config via XDG_CONFIG_HOME so tests are
  hermetic regardless of local ~/.config/lumen/config.yaml
- Add all-minilm model preflight check in TestMain with a clear
  error message if the model is not installed in Ollama

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace hand-authored Dashboard.svelte with mcp-server-card.svelte
from huggingface/chat-ui (already vendored in testdata/fixtures/svelte/).
All sample project files now originate from established open-source repos.

Update TestE2E_SvelteIndexing to search for symbols from the real file
(setEnabled, handleHealthCheck, handleDelete).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AutumnsGrove
Copy link
Copy Markdown
Contributor Author

@aeneasr
Updated the branch with the testing you mentioned. Here's what was added:

  • Svelte fixture files — vendored 9 real .svelte components from huggingface/chat-ui (Apache 2.0) into
    testdata/fixtures/svelte/, same pattern as the other languages
  • E2E test — TestE2E_SvelteIndexing runs the full MCP pipeline against a real chat-ui component
    (mcp-server-card.svelte), asserts it gets indexed, symbols surface in semantic_search results, and line numbers
    are file-relative (not script-block-relative)
  • E2E hermetic config — added XDG_CONFIG_HOME isolation to the test subprocess so local
    ~/.config/lumen/config.yaml can't bleed into test runs and cause model mismatches
  • all-minilm preflight — TestMain now checks the model is actually installed in Ollama and exits with a clear
    message if not, rather than failing deep in sqlite-vec with a dimension mismatch error

One thing to flag — TestLang_Python/HTTP_route_handler_decorator is failing locally with a snapshot drift
(check+RoutePattern kind flipping between type and function). Confirmed it's pre-existing on this branch before
any of my changes. Looks like it may be related to the broader snapshot regeneration happening in #116?

@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented Apr 15, 2026

7979d7f is passing so the snapshot drift should be from your work, not broken on master

The Kotlin PR is pretty messy still.

@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented Apr 15, 2026

Could you please run the benchmark also? :) So we know if Lumen is properly indexing svelte! And commit the result benchmark (like we have for the other languages)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for svelte

2 participants