fix(server): wrap sync blocking calls in asyncio.to_thread for search/recall path by mobilebarn · Pull Request #1068 · volcengine/OpenViking

mobilebarn · 2026-03-29T04:06:02Z

Problem

Under single-worker uvicorn, the OpenViking server becomes unresponsive (TCP accepts, HTTP never responds) within 10-40 minutes of normal operation. This happens when auto-recall search and auto-capture commit operations overlap.

Root Cause

Several synchronous blocking calls are made from inside async def handlers:

embedder.embed() in hierarchical_retriever.py — synchronous HTTP call to OpenAI embedding API
_adapter.query() in viking_vector_index_backend.py — synchronous storage query
rerank_batch() in hierarchical_retriever.py — synchronous HTTP call via requests.request()
agfs.stat/read in viking_fs.py — synchronous file I/O in abstract(), overview(), _read_relation_table()

Each call blocks the event loop for 100-500ms+. Under concurrent load, the health endpoint never gets a timeslot and the server appears hung.

Fix

Wrap all sync blocking calls in asyncio.to_thread() so they run in the default thread pool executor without blocking the event loop.

Testing

Server previously hung within 10-40 minutes under normal auto-recall + auto-capture load
With patches applied, server remains responsive under sustained load
Diagnostic identified by SENTINEL agent (Paperclip QA team) via systematic code-path audit

Files Changed

openviking/retrieve/hierarchical_retriever.py — embed + rerank → to_thread
openviking/storage/viking_vector_index_backend.py — query → to_thread
openviking/storage/viking_fs.py — agfs.stat/read → to_thread

…/recall path Under single-worker uvicorn, synchronous blocking calls in async handlers starve the event loop and cause the server to become unresponsive (TCP accepts but HTTP never responds). Changes: - retrieve/hierarchical_retriever.py: Wrap embedder.embed() and rerank_batch() in asyncio.to_thread(); convert _rerank_scores to async - storage/viking_vector_index_backend.py: Wrap _adapter.query() in asyncio.to_thread() - storage/viking_fs.py: Wrap agfs.stat/read calls in abstract(), overview(), and _read_relation_table() with asyncio.to_thread() These calls make synchronous HTTP requests (OpenAI embedding API), file I/O (AGFS), and database queries that block the event loop for 100-500ms+ per call. Under concurrent auto-recall + auto-capture load, this reliably deadlocks the server within 10-40 minutes. Tested: Server remains responsive under sustained auto-recall load with these patches applied (previously hung within 10-40 minutes). Co-Authored-By: Paperclip <noreply@paperclip.ing>

github-actions · 2026-03-29T04:06:47Z

Failed to generate code suggestions for PR

qin-ctx

Thanks for digging into this. I agree with the core diagnosis: the current async search path does call several synchronous network / filesystem / storage APIs directly, so moving those calls off the event loop is the right direction.

I found two blocking correctness issues in the current patch, though:

openviking/storage/viking_vector_index_backend.py now calls asyncio.to_thread(...), but the file still does not import asyncio.
openviking/storage/viking_fs.py now passes new_parent_uri=..., but update_uri_mapping() in openviking/storage/viking_vector_index_backend.py still does not accept that argument.

One more thing to revisit before calling the recall path fixed end-to-end: /search with a session_id still goes through Session.get_context_for_search(), and that path reads archive files via VikingFS.read_file(). read_file() still performs synchronous agfs.stat/read, so the session-backed recall path is not fully covered by this PR yet.

qin-ctx · 2026-03-30T06:52:08Z

openviking/storage/viking_vector_index_backend.py

                )

-            return self._adapter.query(
+            return await asyncio.to_thread(


[Bug] (blocking) This now calls asyncio.to_thread(...), but this file still does not import asyncio. The first query/search call on this branch will raise NameError: name asyncio is not defined, so the fix will fail before it can offload anything.

qin-ctx · 2026-03-30T06:52:08Z

openviking/storage/viking_fs.py

                    ctx=self._ctx_or_default(ctx),
                    uri=uri,
                    new_uri=new_uri,
+                    new_parent_uri=new_parent_uri,


[Bug] (blocking) update_uri_mapping() in openviking/storage/viking_vector_index_backend.py still has the signature (ctx, uri, new_uri, levels=None). Passing new_parent_uri= here will raise TypeError as soon as the mv/rename path hits this branch. If this parent-uri rewrite is needed, the callee and its tests need to be updated in the same PR; otherwise this hunk should be removed from the async-blocking fix.

github-project-automation bot added this to OpenViking project Mar 29, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 29, 2026

plhys mentioned this pull request Mar 29, 2026

[Bug] RocksDB lock contention: multiple _SingleAccountBackend instances open same DB path (local backend) #1072

Closed

Merge branch 'main' into fix/async-blocking-in-search-path

0742e3e

qin-ctx requested changes Mar 30, 2026

View reviewed changes

qin-ctx self-assigned this Apr 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(server): wrap sync blocking calls in asyncio.to_thread for search/recall path#1068

fix(server): wrap sync blocking calls in asyncio.to_thread for search/recall path#1068
mobilebarn wants to merge 2 commits intovolcengine:mainfrom
mobilebarn:fix/async-blocking-in-search-path

mobilebarn commented Mar 29, 2026

Uh oh!

github-actions bot commented Mar 29, 2026

Uh oh!

qin-ctx left a comment

Uh oh!

qin-ctx Mar 30, 2026

Uh oh!

qin-ctx Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mobilebarn commented Mar 29, 2026

Problem

Root Cause

Fix

Testing

Files Changed

Uh oh!

github-actions bot commented Mar 29, 2026

Uh oh!

qin-ctx left a comment

Choose a reason for hiding this comment

Uh oh!

qin-ctx Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

qin-ctx Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants