Add a rate limiter for litellm by faysou · Pull Request #125 · cocoindex-io/cocoindex-code

faysou · 2026-03-25T21:55:06Z

Summary

Adds PacedLiteLLMEmbedder to serialize LiteLLM embedding requests, apply configurable minimum request spacing, and retry rate-limited calls with bounded backoff.
Extends user embedding settings with min_interval_ms and persists that field through settings serialization and deserialization.
Wires LiteLLM embedder creation to use a default 5ms pacing interval when the setting is omitted, and documents the new configuration in the user-facing markdown references.
Adds direct unit coverage for the new LiteLLM retry and pacing paths using mocked litellm.aembedding calls.
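
For readers unfamiliar with the pattern, the three behaviours listed above (serialized requests, minimum spacing, bounded-backoff retry) can be sketched roughly as follows. This is not the code from this PR: PacedCaller, RateLimitedError, and every parameter except min_interval_ms are hypothetical names chosen for illustration, and the real embedder may structure things differently.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class RateLimitedError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


class PacedCaller:
    """Serializes calls, spaces them out, and retries rate-limited calls."""

    def __init__(
        self,
        min_interval_ms: float = 5.0,
        max_retries: int = 3,
        base_delay_s: float = 0.01,
        max_delay_s: float = 0.1,
    ) -> None:
        self._min_interval_s = min_interval_ms / 1000.0
        self._max_retries = max_retries
        self._base_delay_s = base_delay_s
        self._max_delay_s = max_delay_s
        self._lock = asyncio.Lock()  # only one request in flight at a time
        self._last_start: Optional[float] = None

    async def _wait_for_slot(self) -> None:
        # Sleep until at least min_interval has passed since the last request started.
        if self._last_start is not None:
            remaining = self._min_interval_s - (time.monotonic() - self._last_start)
            if remaining > 0:
                await asyncio.sleep(remaining)
        self._last_start = time.monotonic()

    async def call(self, fn: Callable[..., Awaitable[Any]], *args: Any, **kwargs: Any) -> Any:
        async with self._lock:
            for attempt in range(self._max_retries + 1):
                await self._wait_for_slot()
                try:
                    return await fn(*args, **kwargs)
                except RateLimitedError:
                    if attempt == self._max_retries:
                        raise  # retries exhausted, surface the error
                    # Exponential backoff, capped at max_delay_s.
                    delay = min(self._base_delay_s * 2**attempt, self._max_delay_s)
                    await asyncio.sleep(delay)
```

In this sketch the lock stays held during backoff, so other callers wait while one request is retried. That is one possible design choice, not necessarily the one taken in the PR.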

Validation

uv run pytest tests/test_litellm_embedder.py -v: 2 passed in 1.60s.
uv run pytest tests/ -v: 108 passed in 159.00s.
uv run pre-commit run --all-files passed.
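
As an illustration of what testing a retry path with a mocked async embedding call can look like, here is a minimal sketch using only the standard library. It does not reproduce tests/test_litellm_embedder.py: embed_with_retry and FakeRateLimitError are hypothetical, and the embedding callable is passed in as a parameter so the example runs without litellm installed. A real test would more likely replace litellm.aembedding with unittest.mock.patch.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for the provider's rate-limit exception."""


async def embed_with_retry(aembedding, texts, max_retries=2):
    """Call the injected async embedding function, retrying on rate limits."""
    for attempt in range(max_retries + 1):
        try:
            response = await aembedding(input=texts)
            # Same response shape as in the diff: items indexed by "embedding".
            return [item["embedding"] for item in response.data]
        except FakeRateLimitError:
            if attempt == max_retries:
                raise
            await asyncio.sleep(0)  # no real delay needed in a unit test


def test_retries_once_then_succeeds():
    ok = SimpleNamespace(data=[{"embedding": [0.1, 0.2]}])
    # First await raises, second await returns the fake response.
    mock = AsyncMock(side_effect=[FakeRateLimitError(), ok])
    result = asyncio.run(embed_with_retry(mock, ["hello"]))
    assert result == [[0.1, 0.2]]
    assert mock.await_count == 2
    mock.assert_awaited_with(input=["hello"])
```

Because AsyncMock raises any exception instance found in its side_effect sequence, a "fail once, then succeed" scenario needs no network access or timing assumptions.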

georgeh0

Thanks for the PR!

There are a number of CI errors from mypy. Please fix them, and run uv run mypy . locally to check.

georgeh0 · 2026-04-05T17:19:28Z

src/cocoindex_code/litellm_embedder.py

+class _PacedEmbedderInstance:
+    """Inner batched embedder for a specific input_type."""
+
+    def __init__(self, embedder: PacedLiteLLMEmbedder, input_type: str | None) -> None:
+        self._embedder = embedder
+        self._input_type = input_type
+
+    @coco.fn.as_async(batching=True, max_batch_size=64)
+    async def embed(self, texts: list[str]) -> list[NDArray[np.float32]]:
+        kwargs = dict(self._embedder._kwargs)
+        if self._input_type is not None:
+            kwargs["input_type"] = self._input_type
+        response = await self._embedder.run_embedding_request(input=texts, **kwargs)
+        return [np.array(item["embedding"], dtype=np.float32) for item in response.data]


We won't need a separate instance. We can just add an additional argument such as input_type directly to the PacedLiteLLMEmbedder.embed() method, like this:

https://github.com/cocoindex-io/cocoindex/blob/310045e05d97829bfb54bc881ba794fc9b0990cf/python/cocoindex/ops/sentence_transformers.py#L94-L107
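
A rough sketch of the shape this suggestion points at: a single class whose embed() takes input_type as an optional argument, with no inner per-input_type instance. PacedEmbedderSketch and request_fn are hypothetical names, the batching decorator from the diff is omitted, and how batching interacts with the extra argument is not covered here, so this may differ from both the linked code and the merged change.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Awaitable, Callable, Optional


class PacedEmbedderSketch:
    """Hypothetical single-class embedder; input_type is passed per call."""

    def __init__(self, request_fn: Callable[..., Awaitable[Any]], **kwargs: Any) -> None:
        self._request_fn = request_fn  # stands in for the paced request method
        self._kwargs = kwargs

    async def embed(
        self, texts: list[str], input_type: Optional[str] = None
    ) -> list[list[float]]:
        kwargs = dict(self._kwargs)  # copy so per-call changes do not leak
        if input_type is not None:
            kwargs["input_type"] = input_type
        response = await self._request_fn(input=texts, **kwargs)
        return [item["embedding"] for item in response.data]


async def fake_request(**kwargs: Any) -> SimpleNamespace:
    """Fake backend that echoes one fixed vector per input text."""
    return SimpleNamespace(data=[{"embedding": [1.0, 2.0]} for _ in kwargs["input"]])


embedder = PacedEmbedderSketch(fake_request, model="example-model")
vectors = asyncio.run(embedder.embed(["a", "b"], input_type="query"))
```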

Thank you, this should now be fixed, including the pre-commit checks.

faysou mentioned this pull request Apr 5, 2026

Cloud model and rate limit #123

Closed

georgeh0 reviewed Apr 5, 2026


faysou force-pushed the feature/configurable-litellm-pacing branch from 1cca4c3 to 724afe2 Compare April 5, 2026 17:22

Add a rate limiter for litellm

4959ed0

faysou force-pushed the feature/configurable-litellm-pacing branch from 724afe2 to 4959ed0 Compare April 5, 2026 17:28

georgeh0 merged commit 34565b4 into cocoindex-io:main Apr 6, 2026
5 checks passed

georgeh0 merged 1 commit into cocoindex-io:main from faysou:feature/configurable-litellm-pacing
