Pre-submission checklist | 提交前检查
Bug Description | 问题描述
The search pipeline creates a new ContextThreadPoolExecutor on every request in four methods within searcher.py:
- _retrieve_paths (line 357, max_workers=5)
- _retrieve_from_long_term_and_user (line 638, max_workers=3)
- _retrieve_from_tool_memory (line 791, max_workers=2)
- _deduplicate_rawfile_results (line 1153, up to max_workers=10)
Each executor is used within a with block, which calls shutdown(wait=True) on exit. If any submitted task hangs — e.g., a slow Neo4j query, an unresponsive embedding API, or a network timeout — the shutdown call blocks indefinitely, the worker threads stay alive, and the next request creates yet another pool.
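A minimal sketch of the leak, using the stdlib ThreadPoolExecutor as a stand-in for ContextThreadPoolExecutor. An Event simulates the hung downstream call; five "requests" each create a pool whose single stuck task keeps one worker thread alive:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

release = threading.Event()

def slow_query():
    # Stand-in for a stuck Neo4j query or embedding API call
    release.wait()

before = threading.active_count()
pools = []
for _ in range(5):               # five "requests", each creating its own pool
    ex = ThreadPoolExecutor(max_workers=2)
    ex.submit(slow_query)
    pools.append(ex)             # a with-block here would block in shutdown(wait=True)

leaked = threading.active_count() - before
print(leaked)                    # 5: one live worker per abandoned pool

release.set()                    # release the hung tasks so the demo exits cleanly
for ex in pools:
    ex.shutdown(wait=True)
```

In production the pattern is the same, except the hung tasks never release, so the count only grows with request volume.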
Over time, this causes unbounded thread accumulation. In our deployment we observed 8,744 threads in the memos-api container, at which point:
- /search returns HTTP 200 with empty results (the can't start new thread error is caught silently)
- /chat returns HTTP 503 (it calls search internally but doesn't handle the thread error gracefully)
- Even docker exec fails — OpenBLAS cannot create pthreads
Note: Adding timeout to future.result() alone does not fix this. The timeout only skips waiting for the result — the thread itself keeps running, and shutdown(wait=True) still blocks until it finishes. The threads still accumulate.
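This can be verified directly: future.result(timeout=...) raises TimeoutError in the caller, but the worker thread keeps executing the task. A small sketch with a deliberately hung task:

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

done = threading.Event()
ex = ThreadPoolExecutor(max_workers=1)
future = ex.submit(done.wait)      # a task that hangs until released

timed_out = False
try:
    future.result(timeout=0.1)     # the caller gives up waiting after 100 ms...
except FutureTimeout:
    timed_out = True

still_running = future.running()   # ...but the worker thread is still busy
print(timed_out, still_running)    # True True

done.set()                         # release the task; only now can shutdown return
ex.shutdown(wait=True)
```

The timeout protects the caller, not the pool: shutdown(wait=True) at the end of the with block still has to wait for the stuck task.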
How to Reproduce | 如何重现
- Deploy MemOS with Neo4j backend in Docker
- Send sustained /search traffic over hours/days
- If any downstream dependency (Neo4j, embedding API) experiences intermittent slowness, threads accumulate
- Monitor with:
docker exec <container> cat /proc/1/status | grep Threads
Environment | 环境信息
- Python version: 3.11
- Operating System: Linux (Raspberry Pi / aarch64)
- MemOS version: v2.0.8 (also present in v2.0.9 — searcher.py unchanged)
- Backend: Neo4j
- Deployment: Docker
Additional Context | 其他信息
Suggested Fix:
Use a shared, class-level ContextThreadPoolExecutor instead of creating a new one per request. The Searcher class already follows this pattern with _usage_executor (line 73). Adding a second shared pool for search operations would bound thread count regardless of request volume or downstream latency:
# In __init__:
self._search_executor = ContextThreadPoolExecutor(max_workers=10, thread_name_prefix="search")
# In each method, replace:
# with ContextThreadPoolExecutor(max_workers=N) as executor:
# with:
# executor = self._search_executor
All future.result() calls should also include a timeout (e.g., 30s) as a safety measure.
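Putting both pieces together, a hedged sketch of what one method could look like with the shared pool and per-task timeouts. The names _run_query and the method body are hypothetical stand-ins, not the actual searcher.py code:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

class Searcher:
    def __init__(self):
        # One bounded pool for the life of the process, mirroring _usage_executor;
        # thread count stays capped no matter how many requests arrive.
        self._search_executor = ThreadPoolExecutor(
            max_workers=10, thread_name_prefix="search"
        )

    def _run_query(self, q):
        # Hypothetical stand-in for a Neo4j or embedding API call
        return q.upper()

    def _retrieve_paths(self, queries):
        futures = [self._search_executor.submit(self._run_query, q) for q in queries]
        results = []
        for f in futures:
            try:
                results.append(f.result(timeout=30))  # safety timeout per task
            except FutureTimeout:
                pass  # a stuck task is skipped; it no longer pins a fresh pool
        return results

s = Searcher()
out = s._retrieve_paths(["neo4j", "memos"])
print(out)  # ['NEO4J', 'MEMOS']
s._search_executor.shutdown(wait=True)
```

With this shape, a hung downstream call can still occupy a worker for a while, but the damage is bounded by max_workers rather than growing with every request.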
Workarounds:
- Set pids_limit in docker-compose.yml to fail fast instead of consuming all system resources
- Set OPENBLAS_NUM_THREADS=1 to reduce per-thread overhead from numpy
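Both workarounds can be expressed in one docker-compose fragment; the service name memos-api and the limit of 512 are assumptions to adapt to your deployment:

```yaml
services:
  memos-api:            # hypothetical service name
    pids_limit: 512     # fail fast once leaked threads exceed this bound
    environment:
      - OPENBLAS_NUM_THREADS=1   # shrink numpy/OpenBLAS per-thread overhead
```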
Willingness to Implement | 实现意愿