feat(chunker): add .svelte support with two-phase TypeScript injection #128
AutumnsGrove wants to merge 4 commits into ory:main
Index .svelte files by parsing the outer Svelte grammar to locate
<script> blocks, then re-parsing each block's raw_text with the
TypeScript chunker to extract named symbols (functions, classes,
interfaces, etc.). Line numbers are adjusted to be file-relative so
search results point to the correct lines in the original .svelte file.
Template syntax ({#if}, {#each}, bind:) and Svelte rune calls
($state(), $derived()) are handled transparently — runes parse as
ordinary TypeScript call-expression initializers.
Also registers text-embedding-voyage-4-nano in KnownModels (LM Studio,
1024 dims, 2048 ctx).
Fixes three hook tests that failed when a user config file at
~/.config/lumen/config.yaml set a non-default embedding model: the
tests now set XDG_CONFIG_HOME to a temp dir so both writeHookTestDB
and the hook use the same default model, preventing DB-path hash
mismatches and dimension-mismatch schema resets.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I see this has a new chunker for svelte - in that case I think we need a bench-swe suite that has an appropriate test. It sometimes takes a few attempts to find a good test case, because not all test cases can be solved well by claude. If lumen is not performing, then there can be multiple reasons:
You can use Claude to analyze the raw JSON files. It takes a few attempts to get this working well, but once it does, it actually helps a lot!
@aeneasr I can do that. I just added the basic Svelte support; I wasn't aware of your SWE bench suites. I'll take a look and add one to this PR. The issue I experienced is that Svelte wasn't indexed at all, and this PR attempts to implement that. When I tested it locally via my built MCP server, it properly picked up the Svelte files and worked flawlessly. I tried to build on your existing parsing tooling.
This model is LM Studio-only (served via Voyage AI's local inference) and not available in Ollama. The project defaults to Ollama and runs e2e tests against it, so registering an LM Studio-exclusive model here is misleading. Users who want voyage-4-nano can still configure it manually via LUMEN_EMBED_MODEL; it just won't have pre-registered dims/ctx/min-score.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Vendor 9 real-world .svelte components from huggingface/chat-ui (Apache 2.0, commit c0cfbdf) into testdata/fixtures/svelte/
- Add testdata/sample-project/Dashboard.svelte as the E2E fixture component with ActivityCache class, loadUserActivity, and handleRefresh symbols for semantic search verification
- Add TestE2E_SvelteIndexing: asserts .svelte files are indexed, file-relative line numbers are correct, and symbols from the script block surface in semantic_search results
- Update all hardcoded file-count assertions (5 → 6, and 6 → 7 for the incremental test) to account for Dashboard.svelte
- Isolate E2E subprocess config via XDG_CONFIG_HOME so tests are hermetic regardless of local ~/.config/lumen/config.yaml
- Add all-minilm model preflight check in TestMain with a clear error message if the model is not installed in Ollama
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace hand-authored Dashboard.svelte with mcp-server-card.svelte from huggingface/chat-ui (already vendored in testdata/fixtures/svelte/). All sample project files now originate from established open-source repos. Update TestE2E_SvelteIndexing to search for symbols from the real file (setEnabled, handleHealthCheck, handleDelete).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aeneasr
One thing to flag: TestLang_Python/HTTP_route_handler_decorator is failing locally with a snapshot drift.
7979d7f is passing, so the snapshot drift should be from your work, not broken on master. The Kotlin PR is pretty messy still.
Could you please run the benchmark also? :) So we know if Lumen is properly indexing svelte! And commit the result benchmark (like we have for the other languages) |
Closes #126
Summary

- Adds `.svelte` to `supportedExtensions` and `DefaultLanguages` so Svelte files are no longer silently skipped by the merkle walker. On large SvelteKit monorepos this can recover 40%+ of previously invisible source files.
- Adds `SvelteChunker` (new `internal/chunker/svelte.go`) using a two-phase injection pattern: the outer tree-sitter Svelte grammar locates `<script>` elements; each block's `raw_text` is re-parsed with the existing TypeScript `TreeSitterChunker` to extract named symbols. Line numbers are adjusted to be file-relative so search results point to the correct lines.
- Svelte rune calls (`$state()`, `$derived()`) parse cleanly as TypeScript call-expression initializers; no special handling is needed.
- Registers `text-embedding-voyage-4-nano` in `KnownModels` (LM Studio backend, 1024 dims, 2048 ctx).
- Fixes three `cmd` hook tests that failed when `~/.config/lumen/config.yaml` set a non-default model: adds `XDG_CONFIG_HOME` isolation so both `writeHookTestDB` and the hook under test resolve the same default model, preventing DB-path hash mismatches and dimension-mismatch schema resets that silently wiped `last_indexed_at`.

Files changed

| File | Change |
| --- | --- |
| `internal/chunker/svelte.go` | `SvelteChunker` with two-phase parse |
| `internal/chunker/svelte_test.go` | |
| `internal/chunker/languages.go` | `.svelte` in `supportedExtensions` + `DefaultLanguages` |
| `internal/chunker/treesitter_test.go` | Adds `.svelte` fixture to `trivialSources` |
| `internal/models/models.go` | Adds `text-embedding-voyage-4-nano` to `KnownModels` |
| `internal/models/models_test.go` | |
| `cmd/hook_test.go` | Sets `XDG_CONFIG_HOME` in three tests to isolate from user config |
| `go.mod` / `go.sum` | Adds `github.com/alexaandru/go-sitter-forest/svelte v1.9.2` |

Test plan

- `go test ./...`: all 12 packages pass
- `TestSvelteChunker_ScriptSymbols`: verifies function/interface/class extraction and file-relative line numbers
- `TestSvelteChunker_EmptyScript` / `TestSvelteChunker_NoScript`: edge cases return zero chunks without error
- `TestDefaultLanguages_AllExtensionsPresent`: `.svelte` fixture added to `trivialSources`
- `XDG_CONFIG_HOME` isolation for the three hook tests

🤖 Generated with Claude Code
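Why an unregistered extension leads to files being silently skipped can be illustrated with a small sketch. The names `languageByExt`, `languageFor`, and `indexable` are hypothetical and the map layout is an assumption; the PR's actual `supportedExtensions` and `DefaultLanguages` may be structured differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// languageByExt is a hypothetical extension registry. A file walker that
// consults a table like this ignores every extension that is absent.
var languageByExt = map[string]string{
	".go":     "go",
	".ts":     "typescript",
	".svelte": "svelte", // without this entry, .svelte files are skipped
}

// languageFor reports the language registered for a path's extension.
func languageFor(path string) (string, bool) {
	lang, ok := languageByExt[strings.ToLower(filepath.Ext(path))]
	return lang, ok
}

// indexable keeps only the paths whose extension is registered.
func indexable(paths []string) []string {
	var out []string
	for _, p := range paths {
		if _, ok := languageFor(p); ok {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	paths := []string{"main.go", "src/routes/+page.svelte", "README.md"}
	fmt.Println(indexable(paths)) // prints "[main.go src/routes/+page.svelte]"
}
```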