feat(metrics): add comprehensive STT token usage tracking #4542
Conversation
📝 Walkthrough
Adds STT token-usage fields and threads token usage from STT plugins (OpenAI) through SpeechEvent -> STTMetrics -> UsageCollector -> UsageSummary; also schedules Azure recognition-usage emission and tweaks FakeAudioOutput playback-duration computation.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant OpenAI as OpenAI API
    participant OpenAIPlugin as OpenAI STT Plugin
    participant STT as STT Engine
    participant Collector as UsageCollector
    participant Summary as UsageSummary
    OpenAI->>OpenAIPlugin: response (transcript + usage)
    OpenAIPlugin->>OpenAIPlugin: extract token_usage (input/output/total/audio/text)
    OpenAIPlugin->>STT: emit SpeechEvent (with token_usage)
    STT->>STT: construct STTMetrics (include token fields)
    STT->>Collector: emit STTMetrics
    Collector->>Collector: aggregate stt_*_tokens
    Collector->>Summary: update UsageSummary token fields
```
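The flow in the diagram can be sketched end to end. This is an illustration only: the classes below are simplified stand-ins for the framework's `STTMetrics`, `UsageSummary`, and `UsageCollector` (field names follow this PR), not the actual LiveKit code.

```python
from dataclasses import dataclass


@dataclass
class STTMetrics:
    """Simplified stand-in for livekit.agents.metrics.STTMetrics (illustrative)."""

    audio_duration: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    audio_tokens: int = 0
    text_tokens: int = 0


@dataclass
class UsageSummary:
    stt_audio_duration: float = 0.0
    stt_input_tokens: int = 0
    stt_output_tokens: int = 0
    stt_total_tokens: int = 0
    stt_audio_tokens: int = 0
    stt_text_tokens: int = 0


class UsageCollector:
    def __init__(self) -> None:
        self._summary = UsageSummary()

    def collect(self, m: STTMetrics) -> None:
        # Each emitted STTMetrics is folded into the running totals.
        s = self._summary
        s.stt_audio_duration += m.audio_duration
        s.stt_input_tokens += m.input_tokens
        s.stt_output_tokens += m.output_tokens
        s.stt_total_tokens += m.total_tokens
        s.stt_audio_tokens += m.audio_tokens
        s.stt_text_tokens += m.text_tokens

    def get_summary(self) -> UsageSummary:
        return self._summary


collector = UsageCollector()
# Two final transcripts, as an OpenAI-style STT plugin might report them.
collector.collect(STTMetrics(audio_duration=3.2, input_tokens=120, output_tokens=40,
                             total_tokens=160, audio_tokens=110, text_tokens=10))
collector.collect(STTMetrics(audio_duration=1.8, input_tokens=60, output_tokens=20,
                             total_tokens=80, audio_tokens=55, text_tokens=5))
summary = collector.get_summary()
print(summary.stt_total_tokens)  # 240
print(summary.stt_audio_tokens)  # 165
```

The real collector subscribes to metrics events rather than being called directly, but the accumulation shape is the same.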
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/metrics/base.py`:
- Around line 41-53: Remove the trailing whitespace on the empty line preceding
the token fields to satisfy the linter, and delete the temporary review comment
"# NEW: Token usage fields" (or replace it with a concise docstring/header)
since each field already has a docstring; update the block with the attributes
input_tokens, output_tokens, total_tokens, audio_tokens, and text_tokens in
livekit.agents.metrics.base (the variables named input_tokens, output_tokens,
total_tokens, audio_tokens, text_tokens) so only the documented fields remain
and no trailing spaces exist.
In `@livekit-agents/livekit/agents/metrics/usage_collector.py`:
- Around line 25-32: Remove the trailing whitespace on the blank line following
the stt_text_tokens field in the UsageCollector dataclass (the lines defining
stt_input_tokens, stt_output_tokens, stt_total_tokens, stt_audio_tokens,
stt_text_tokens); edit the file to delete the trailing space characters at the
end of that line (or remove the empty line entirely) and re-run the linter to
ensure W293 is resolved.
In `@livekit-agents/livekit/agents/stt/stt.py`:
- Around line 166-182: The blank lines surrounding the token-extraction block
contain trailing whitespace; remove trailing spaces on the empty lines around
the code that handles event._token_usage (the block that sets input_tokens,
output_tokens, total_tokens, audio_tokens, text_tokens) so there are truly blank
lines without trailing whitespace and ruff W293 is resolved.
In `@livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py`:
- Around line 439-471: Remove the trailing whitespace on the blank line (fix the
ruff lint) and stop assigning a dynamic attribute `_token_usage` to SpeechEvent;
instead add an optional, typed field to the SpeechEvent dataclass (e.g.,
token_usage: Optional[RecognitionUsage] or a new dataclass with
input_tokens/output_tokens/total_tokens/audio_tokens/text_tokens) or extend the
existing RecognitionUsage type to include audio_tokens/text_tokens, then set
that typed field when constructing stt.SpeechEvent (the constructed symbol is
stt.SpeechEvent with type stt.SpeechEventType.FINAL_TRANSCRIPT and alternatives
[sd]) so mypy strict mode no longer reports attr-defined errors.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
livekit-agents/livekit/agents/metrics/base.py
livekit-agents/livekit/agents/metrics/usage_collector.py
livekit-agents/livekit/agents/stt/stt.py
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-agents/livekit/agents/stt/stt.py
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py
livekit-agents/livekit/agents/metrics/usage_collector.py
livekit-agents/livekit/agents/metrics/base.py
🧬 Code graph analysis (3)
livekit-agents/livekit/agents/stt/stt.py (1)
livekit-agents/livekit/agents/metrics/base.py (1)
STTMetrics(30-54)
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py (2)
- livekit-agents/livekit/agents/voice/agent_activity.py (1): stt (2773-2774)
- livekit-agents/livekit/agents/voice/agent.py (1): stt (508-518)
livekit-agents/livekit/agents/metrics/usage_collector.py (1)
livekit-agents/livekit/agents/telemetry/http_server.py (1)
metrics(18-35)
🪛 GitHub Check: ruff
livekit-agents/livekit/agents/stt/stt.py
[failure] 182-182: Ruff (W293)
livekit-agents/livekit/agents/stt/stt.py:182:1: W293 Blank line contains whitespace
[failure] 173-173: Ruff (W293)
livekit-agents/livekit/agents/stt/stt.py:173:1: W293 Blank line contains whitespace
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py
[failure] 450-450: Ruff (W293)
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py:450:1: W293 Blank line contains whitespace
livekit-agents/livekit/agents/metrics/usage_collector.py
[failure] 32-32: Ruff (W293)
livekit-agents/livekit/agents/metrics/usage_collector.py:32:1: W293 Blank line contains whitespace
livekit-agents/livekit/agents/metrics/base.py
[failure] 41-41: Ruff (W293)
livekit-agents/livekit/agents/metrics/base.py:41:1: W293 Blank line contains whitespace
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: type-check (3.13)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.9)
🔇 Additional comments (5)
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py (1)
458-461: LGTM - SpeechEvent creation logic is correct. The event creation preserves the existing behavior for FINAL_TRANSCRIPT events while adding token usage metadata.
livekit-agents/livekit/agents/stt/stt.py (2)
183-199: LGTM - STTMetrics construction correctly includes token fields. The token fields are properly extracted and passed to the `STTMetrics` constructor, maintaining consistency with the field definitions in `base.py`.
390-402: Streaming metrics don't include token usage. The `_metrics_monitor_task` creates `STTMetrics` for streaming recognition without extracting token usage. While the fields default to 0, this creates an inconsistency where batch `recognize()` reports tokens but streaming doesn't. If the realtime API doesn't provide token data, this is expected behavior. Otherwise, consider extracting token usage from `RECOGNITION_USAGE` events similar to the batch path.
livekit-agents/livekit/agents/metrics/usage_collector.py (1)
96-102: LGTM - STT token aggregation logic is correct. The collection pattern correctly mirrors the existing `LLMMetrics` accumulation, properly aggregating all five token fields from `STTMetrics`.
livekit-agents/livekit/agents/metrics/base.py (1)
43-52: LGTM - Token fields are well-defined with appropriate defaults. The token usage fields are correctly typed with sensible defaults (0), ensuring backward compatibility. The docstrings clearly explain the purpose of each field, distinguishing between total tokens, audio tokens, and text tokens.
```python
# NEW: Token usage fields
input_tokens: int = 0
"""Total input tokens used (audio + text tokens)."""
output_tokens: int = 0
"""Total output tokens generated."""
total_tokens: int = 0
"""Total tokens used (input + output)."""
audio_tokens: int = 0
"""Number of audio tokens in input."""
text_tokens: int = 0
"""Number of text tokens in input (e.g., from prompt)."""
```
Fix trailing whitespace and consider removing temporary comment.
- Linting error (line 41): blank line contains trailing whitespace, flagged by ruff.
- Code hygiene: the `# NEW: Token usage fields` comment is useful during review but could be removed before merge, since the docstrings adequately document each field's purpose.
Proposed fix
```diff
 """Whether the STT is streaming (e.g using websocket)."""
-
-    # NEW: Token usage fields
+
     input_tokens: int = 0
```
🧰 Tools
🪛 GitHub Check: ruff
[failure] 41-41: Ruff (W293)
livekit-agents/livekit/agents/metrics/base.py:41:1: W293 Blank line contains whitespace
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py`:
- Around line 439-471: The current construction of stt.SpeechEvent sets
token_usage to None when input/output/total tokens are zero, which drops
audio_tokens/text_tokens if only detailed counts exist; update the logic in the
block that builds token_usage (around resp/usage handling and the
stt.SpeechEvent creation) so you always populate the token_usage dict with
input_tokens, output_tokens, total_tokens, audio_tokens, and text_tokens and
then set token_usage to that dict if any of those five values is non-zero (e.g.,
use a any(...) check on the dict values) instead of checking only
input/output/total; reference the resp/usage extraction and the
stt.SpeechEvent(...) call to locate where to change the condition.
♻️ Duplicate comments (2)
livekit-agents/livekit/agents/metrics/base.py (1)
41-52: Remove the temporary comment and trailing whitespace. The inline note is no longer needed, and the blank line appears to include whitespace (ruff W293).
🧹 Suggested cleanup
```diff
-
-    # NEW: Token usage fields
+
     input_tokens: int = 0
```
livekit-agents/livekit/agents/metrics/usage_collector.py (1)
25-31: Remove trailing whitespace after the STT token fields. The blank line after `stt_text_tokens` appears to contain whitespace (ruff W293).
🧹 Suggested cleanup
```diff
     stt_audio_tokens: int = 0
     stt_text_tokens: int = 0
-    
+
     # properties for naming consistency: prompt = input, completion = output
```
🔇 Additional comments (1)
livekit-agents/livekit/agents/metrics/usage_collector.py (1)
95-101: Aggregation looks correct. STT token fields are accumulated consistently with the new metrics.
```python
# Extract token usage if available
input_tokens = 0
output_tokens = 0
total_tokens = 0
audio_tokens = 0
text_tokens = 0
if hasattr(resp, "usage") and resp.usage:
    usage = resp.usage
    input_tokens = getattr(usage, "input_tokens", 0)
    output_tokens = getattr(usage, "output_tokens", 0)
    total_tokens = getattr(usage, "total_tokens", 0)

    # Extract detailed token breakdown
    if hasattr(usage, "input_token_details") and usage.input_token_details:
        details = usage.input_token_details
        audio_tokens = getattr(details, "audio_tokens", 0)
        text_tokens = getattr(details, "text_tokens", 0)

# Create the speech event with token usage
speech_event = stt.SpeechEvent(
    type=stt.SpeechEventType.FINAL_TRANSCRIPT,
    alternatives=[sd],
    token_usage={
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
        "audio_tokens": audio_tokens,
        "text_tokens": text_tokens,
    }
    if (input_tokens > 0 or output_tokens > 0 or total_tokens > 0)
    else None,
)
return speech_event
```
Don’t drop audio/text usage when totals are missing.
If only detailed tokens are present, token_usage becomes None and metrics lose audio/text counts.
✅ Suggested fix
```diff
-        speech_event = stt.SpeechEvent(
+        has_usage = any(
+            token > 0
+            for token in (input_tokens, output_tokens, total_tokens, audio_tokens, text_tokens)
+        )
+        speech_event = stt.SpeechEvent(
             type=stt.SpeechEventType.FINAL_TRANSCRIPT,
             alternatives=[sd],
             token_usage={
                 "input_tokens": input_tokens,
                 "output_tokens": output_tokens,
                 "total_tokens": total_tokens,
                 "audio_tokens": audio_tokens,
                 "text_tokens": text_tokens,
             }
-            if (input_tokens > 0 or output_tokens > 0 or total_tokens > 0)
-            else None,
+            if has_usage
+            else None,
         )
```
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/fake_io.py (1)
88-99: Duplicate `on_playback_finished` call and `_pushed_duration` reset. The method calls `on_playback_finished` twice with identical parameters and resets `_pushed_duration = 0.0` twice. Per the base class implementation in `io.py`, calling `on_playback_finished` more than expected triggers a warning: "playback_finished called more times than playback segments were captured". The second call (lines 94-98) and second reset (line 99) appear to be accidental duplication.
🐛 Proposed fix to remove duplicate code
```diff
         self.on_playback_finished(
             playback_position=played_duration,
             interrupted=True,
             synchronized_transcript=None,
         )
         self._pushed_duration = 0.0
-        self.on_playback_finished(
-            playback_position=played_duration,
-            interrupted=True,
-            synchronized_transcript=None,
-        )
-        self._pushed_duration = 0.0
```
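The warning quoted above comes from segment bookkeeping in the base output class. A minimal sketch of that mechanism (a simplified model for illustration, not the actual `io.py` implementation):

```python
import logging

logger = logging.getLogger("playback")


class AudioOutputBase:
    """Simplified model of detecting extra playback_finished calls."""

    def __init__(self) -> None:
        self._captured_segments = 0
        self._finished_calls = 0

    def capture_segment(self) -> None:
        # Called once per audio segment pushed for playback.
        self._captured_segments += 1

    def on_playback_finished(self, playback_position: float, interrupted: bool) -> None:
        self._finished_calls += 1
        if self._finished_calls > self._captured_segments:
            logger.warning(
                "playback_finished called more times than playback segments were captured"
            )


out = AudioOutputBase()
out.capture_segment()
out.on_playback_finished(playback_position=1.5, interrupted=True)  # ok: 1 call, 1 segment
out.on_playback_finished(playback_position=1.5, interrupted=True)  # duplicate: logs warning
```

This is why the duplicated block in `tests/fake_io.py` produces a warning rather than an error: the count simply exceeds the captured segments.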
📜 Review details: 📒 Files selected for processing (1): tests/fake_io.py
🔇 Additional comments (1)
tests/fake_io.py (1)
83-87: LGTM! The explicit calculation with clamping between `[0, _pushed_duration]` ensures valid playback position bounds, and the comments clearly explain the rationale.
Excited to see this PR, as I noticed this week my STT was missing the data required to track costs. After this PR, any idea if ElevenLabs and Azure STT will have token counting? Otherwise I guess I need to make a PR to implement these, as we are experimenting with those.
For Azure STT (Azure Speech Studio), billing is based on audio duration, not on tokens. I just finished a fix for Azure STT tracing: it was capturing the duration but not emitting it back to the metrics. I'm not sure about ElevenLabs yet, I haven't checked it. For Azure OpenAI the existing PR will work, since it also provides inference for GPT-4o Transcribe and Whisper-like models. GPT-4o Transcribe billing is based on three things: input audio tokens, input text tokens, and output tokens (text by default). Whisper billing is based on duration only.
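For duration-billed providers like Azure Speech, cost tracking reduces to accumulating `audio_duration`. A small sketch; the hourly rate is a placeholder, not a real price:

```python
def stt_cost_from_duration(audio_seconds: float, price_per_hour: float) -> float:
    """Estimate duration-billed STT cost (e.g. Azure Speech).

    The rate is a hypothetical placeholder; check the provider's pricing page.
    """
    # Convert seconds to hours, then apply the hourly rate.
    return (audio_seconds / 3600.0) * price_per_hour


# Example: 90 seconds of audio at a hypothetical $1.00/hour rate.
cost = stt_cost_from_duration(90.0, price_per_hour=1.00)
print(round(cost, 4))  # 0.025
```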
…f wall-clock time on interruption (force-pushed ed74c13 to 917cfd1)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/fake_io.py (1)
88-99: Critical: Duplicate `on_playback_finished` call will emit the event twice. Lines 88-92 and 94-98 both call `on_playback_finished` with identical parameters, and `_pushed_duration` is reset twice (lines 93 and 99). This appears to be a merge/copy-paste error that will cause duplicate `playback_finished` events to be emitted. Looking at the `on_playback_finished` implementation in `io.py`, it tracks segment counts and will log a warning for the extra call: "playback_finished called more times than playback segments were captured".
🐛 Proposed fix: Remove the duplicate call
```diff
         self._flush_handle = None

         # Calculate played duration based on real elapsed time, capped at pushed duration
         # This matches the behavior of ConsoleAudioOutput and accounts for speed_factor
         # in tests (check_timestamp multiplies by speed_factor to convert to test time)
         played_duration = time.time() - self._start_time
         played_duration = min(max(0, played_duration), self._pushed_duration)
         self.on_playback_finished(
             playback_position=played_duration,
             interrupted=True,
             synchronized_transcript=None,
         )
         self._pushed_duration = 0.0
-        self.on_playback_finished(
-            playback_position=played_duration,
-            interrupted=True,
-            synchronized_transcript=None,
-        )
-        self._pushed_duration = 0.0
```
🤖 Fix all issues with AI agents
In `@livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py`:
- Around line 303-323: The nested call_soon_threadsafe is redundant:
_emit_recognition_usage is already scheduled via self._loop.call_soon_threadsafe
(call site that passes evt.result.result_id, audio_duration), so remove the
inner self._loop.call_soon_threadsafe inside _emit_recognition_usage and
directly call self._event_ch.send_nowait(...) (wrapped in the existing
contextlib.suppress), keeping the SpeechEvent construction and
stt.RecognitionUsage unchanged; this simplifies _emit_recognition_usage and
avoids double-scheduling.
♻️ Duplicate comments (2)
livekit-agents/livekit/agents/metrics/base.py (1)
41-53: Remove the temporary review comment before merging. The `# NEW: Token usage fields` comment on line 42 is a development marker that should be removed before merge. The docstrings already document each field's purpose.
♻️ Proposed fix
```diff
 """Whether the STT is streaming (e.g using websocket)."""

-    # NEW: Token usage fields
     input_tokens: int = 0
```
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py (1)
457-470: Token usage condition doesn't include `audio_tokens` or `text_tokens`. The condition on line 468 only checks `input_tokens`, `output_tokens`, and `total_tokens`. If only `audio_tokens` or `text_tokens` are non-zero (with the others at 0), `token_usage` will incorrectly be set to `None`, losing the detailed token breakdown.
🐛 Proposed fix
```diff
         speech_event = stt.SpeechEvent(
             type=stt.SpeechEventType.FINAL_TRANSCRIPT,
             alternatives=[sd],
             token_usage={
                 "input_tokens": input_tokens,
                 "output_tokens": output_tokens,
                 "total_tokens": total_tokens,
                 "audio_tokens": audio_tokens,
                 "text_tokens": text_tokens,
             }
-            if (input_tokens > 0 or output_tokens > 0 or total_tokens > 0)
+            if any((input_tokens, output_tokens, total_tokens, audio_tokens, text_tokens))
             else None,
         )
```
📜 Review details: 📒 Files selected for processing (6)
- livekit-agents/livekit/agents/metrics/base.py
- livekit-agents/livekit/agents/metrics/usage_collector.py
- livekit-agents/livekit/agents/stt/stt.py
- livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py
- livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py
- tests/fake_io.py
🔇 Additional comments (4)
livekit-agents/livekit/agents/stt/stt.py (2)
70-86: Well-structured TypedDict for optional token usage. Using `TypedDict` with `total=False` correctly models the optional nature of token fields across different STT providers. The docstrings clearly document each field's purpose.
187-214: Token extraction and metrics emission implemented correctly. The token extraction safely handles `None` with `.get()` and defaults to 0, maintaining backward compatibility. The token fields are properly propagated to `STTMetrics`.
livekit-agents/livekit/agents/metrics/usage_collector.py (2)
25-31: STT token tracking fields follow existing conventions. The new fields are consistently named with the `stt_` prefix and default to 0 for backward compatibility.
95-101: STT token accumulation correctly integrated. The accumulation logic properly extends the existing `STTMetrics` handling to include the new token fields, following the same pattern used for LLM and TTS metrics.
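The `total=False` pattern the reviewer highlights can be shown in isolation. A sketch using the field names from this PR; the actual framework definition may differ:

```python
from typing import Optional, TypedDict


class STTTokenUsage(TypedDict, total=False):
    """All keys optional: providers report only what they have."""

    input_tokens: int
    output_tokens: int
    total_tokens: int
    audio_tokens: int
    text_tokens: int


def total_tokens_of(usage: Optional[STTTokenUsage]) -> int:
    # .get() with a default keeps consumers safe when a provider omits a key.
    if usage is None:
        return 0
    return usage.get("total_tokens", 0)


full: STTTokenUsage = {"input_tokens": 120, "output_tokens": 40, "total_tokens": 160}
partial: STTTokenUsage = {"audio_tokens": 110}  # valid: keys are optional

print(total_tokens_of(full))     # 160
print(total_tokens_of(partial))  # 0
print(total_tokens_of(None))     # 0
```

Because every key is optional, duration-billed providers can emit an empty or partial mapping without failing mypy strict mode, while token-billed providers fill in all five fields.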
Realtime STT Token Usage Tracking for OpenAI and Duration-Based Providers
This PR implements a previously missing feature: comprehensive STT token usage tracking across LiveKit agents for both token-based and duration-based providers.
Our team was working on tracking STT usage for cost and metrics analysis, but we discovered that GPT-4o Transcribe did not populate token counts in STTMetrics, even though the API does provide them. After investigating, I implemented this missing feature.
Reference: OpenAI GPT-4o Transcribe documentation
Problem Statement
The LiveKit Agents framework previously did not track STT token usage consistently:
Solution
This PR introduces comprehensive STT metrics tracking across the agents framework:
- `input_tokens`, `output_tokens`, `total_tokens`, `audio_tokens`, and `text_tokens` fields with default `0` values.
- `recognize()` method to extract and emit token usage from `SpeechEvent`.
- Duration-based providers: token counts default to `0`, but `audio_duration` is captured.
Core Architecture Changes
…`STTMetrics`.
Example OpenAI API Response (generic text)
Key Changes
- `request_id` linking metrics to the transcription.
- Defaults of `0`, no breaking changes to existing code.
Benefits
Notes
- …`UsageDuration`) → token counts are `0`.
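Once token counts are aggregated, estimating cost is a simple weighted sum over the token buckets. The per-million-token rates below are hypothetical placeholders; check the provider's current pricing:

```python
def stt_cost_from_tokens(
    audio_tokens: int,
    text_tokens: int,
    output_tokens: int,
    audio_rate_per_m: float,
    text_rate_per_m: float,
    output_rate_per_m: float,
) -> float:
    """Token-billed STT cost: per-million-token rates applied to each bucket."""
    return (
        audio_tokens * audio_rate_per_m
        + text_tokens * text_rate_per_m
        + output_tokens * output_rate_per_m
    ) / 1_000_000


# Hypothetical rates (USD per 1M tokens), purely for illustration.
cost = stt_cost_from_tokens(
    audio_tokens=10_000, text_tokens=500, output_tokens=2_000,
    audio_rate_per_m=6.0, text_rate_per_m=2.5, output_rate_per_m=10.0,
)
print(round(cost, 6))  # 0.08125
```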