Add a rate limiter for litellm #125

Merged
georgeh0 merged 1 commit into cocoindex-io:main from faysou:feature/configurable-litellm-pacing on Apr 6, 2026

@faysou (Contributor) commented Mar 25, 2026

Summary

  • Adds PacedLiteLLMEmbedder to serialize LiteLLM embedding requests, apply configurable minimum request spacing, and retry rate-limited calls with bounded backoff.
  • Extends user embedding settings with min_interval_ms and persists that field through settings serialization and deserialization.
  • Wires LiteLLM embedder creation to use a default 5ms pacing interval when the setting is omitted, and documents the new configuration in the user-facing markdown references.
  • Adds direct unit coverage for the new LiteLLM retry and pacing paths using mocked litellm.aembedding calls.
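The first bullet's combination of serialization, minimum spacing, and bounded retry can be sketched roughly as follows. This is not the code from the PR: `PacedCaller`, `RateLimited`, and the retry parameters (`max_retries`, `base_delay_s`, `max_delay_s`) are hypothetical names and values chosen for illustration, and the wrapped `call` stands in for whatever coroutine actually issues the embedding request. Only the 5 ms default for `min_interval_ms` comes from the summary above.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class RateLimited(Exception):
    """Hypothetical stand-in for the provider's rate-limit error."""


class PacedCaller:
    """Runs one request at a time, spaces requests out, retries rate limits."""

    def __init__(
        self,
        call: Callable[..., Awaitable[Any]],
        min_interval_ms: float = 5.0,  # default mentioned in the PR summary
        max_retries: int = 3,          # assumed value
        base_delay_s: float = 0.01,    # assumed value
        max_delay_s: float = 1.0,      # assumed cap that keeps backoff bounded
    ) -> None:
        self._call = call
        self._min_interval_s = min_interval_ms / 1000.0
        self._max_retries = max_retries
        self._base_delay_s = base_delay_s
        self._max_delay_s = max_delay_s
        self._lock = asyncio.Lock()
        self._last_start: Optional[float] = None

    async def run(self, **kwargs: Any) -> Any:
        # Holding the lock for the whole call serializes all requests.
        async with self._lock:
            for attempt in range(self._max_retries + 1):
                await self._wait_for_slot()
                try:
                    return await self._call(**kwargs)
                except RateLimited:
                    if attempt == self._max_retries:
                        raise
                    # Exponential backoff, capped at max_delay_s.
                    delay = min(self._max_delay_s, self._base_delay_s * 2**attempt)
                    await asyncio.sleep(delay)

    async def _wait_for_slot(self) -> None:
        # Sleep until at least min_interval has passed since the last start.
        if self._last_start is not None:
            remaining = self._min_interval_s - (time.monotonic() - self._last_start)
            if remaining > 0:
                await asyncio.sleep(remaining)
        self._last_start = time.monotonic()
```

In this sketch the spacing is measured between request start times, so a retry also waits out the minimum interval before it is sent. Whether the PR measures from start or from completion is not stated in this thread.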

Validation

  • uv run pytest tests/test_litellm_embedder.py -v passed (2 passed in 1.60s).
  • uv run pytest tests/ -v passed (108 passed in 159.00s).
  • uv run pre-commit run --all-files passed.

@faysou mentioned this pull request Apr 5, 2026

@georgeh0 (Member) left a comment

Thanks for the PR!

There are a number of CI errors from mypy. Please fix them, and also run uv run mypy . locally to check.

Comment on lines +38 to +51
class _PacedEmbedderInstance:
    """Inner batched embedder for a specific input_type."""

    def __init__(self, embedder: PacedLiteLLMEmbedder, input_type: str | None) -> None:
        self._embedder = embedder
        self._input_type = input_type

    @coco.fn.as_async(batching=True, max_batch_size=64)
    async def embed(self, texts: list[str]) -> list[NDArray[np.float32]]:
        kwargs = dict(self._embedder._kwargs)
        if self._input_type is not None:
            kwargs["input_type"] = self._input_type
        response = await self._embedder.run_embedding_request(input=texts, **kwargs)
        return [np.array(item["embedding"], dtype=np.float32) for item in response.data]

We won't need a separate instance. We can just add an additional argument like input_type directly to the PacedLiteLLMEmbedder.embed() method, like this:

https://github.com/cocoindex-io/cocoindex/blob/310045e05d97829bfb54bc881ba794fc9b0990cf/python/cocoindex/ops/sentence_transformers.py#L94-L107
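The shape this suggestion points toward might look like the sketch below, where `input_type` is an optional argument to `embed()` rather than state held on a per-type inner instance. This is an illustration under assumptions, not the linked cocoindex code: `SingleEmbedder` and the injected `request` callable are hypothetical, the batching decorator is left out because this thread does not show how it treats extra arguments, and the response is modeled as a plain dict with plain lists instead of NumPy arrays.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class SingleEmbedder:
    """Hypothetical embedder that takes input_type per call, not per instance."""

    def __init__(self, request: Callable[..., Awaitable[Any]], **kwargs: Any) -> None:
        self._request = request  # stand-in for run_embedding_request
        self._kwargs = kwargs    # shared request options, e.g. the model name

    async def embed(
        self, texts: list[str], input_type: Optional[str] = None
    ) -> list[list[float]]:
        kwargs = dict(self._kwargs)  # copy so per-call options don't leak
        if input_type is not None:
            kwargs["input_type"] = input_type
        response = await self._request(input=texts, **kwargs)
        return [list(item["embedding"]) for item in response["data"]]


async def _fake_request(**kwargs: Any) -> dict:
    # Illustrative backend: returns one fixed vector per input text.
    return {"data": [{"embedding": [1.0, 2.0]} for _ in kwargs["input"]]}


if __name__ == "__main__":
    embedder = SingleEmbedder(_fake_request, model="example-model")
    print(asyncio.run(embedder.embed(["hello"], input_type="query")))
```

With this layout a caller chooses the input type at the call site, so the helper class from the diff is no longer needed.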

@faysou (Contributor, Author)

Thank you, this should now be fixed, including the pre-commit checks.

@faysou force-pushed the feature/configurable-litellm-pacing branch from 1cca4c3 to 724afe2 on April 5, 2026 17:22
@faysou force-pushed the feature/configurable-litellm-pacing branch from 724afe2 to 4959ed0 on April 5, 2026 17:28
@georgeh0 merged commit 34565b4 into cocoindex-io:main on Apr 6, 2026
5 checks passed

2 participants