azure-ai-evaluation: maximum recursion depth exceeded #44816

@Wixee

Description

  • Package Name: azure.ai.evaluation
  • Package Version: 1.13.7
  • Operating System: Windows 11
  • Python Version: 3.11.11

Describe the bug
Please find the details in #43040

In a scheduled job / long-running environment, azure-ai-evaluation can fail with a RecursionError that is triggered by the SDK’s stdout/stderr interception logic. This ultimately makes evaluate(...) fail (surfaced as an internal evaluation error).

Expected behavior

evaluate(...) completes and returns results/metrics reliably in batch/concurrent runs.

Actual behavior

Evaluation occasionally fails due to RecursionError originating from the SDK’s stdout/stderr wrapper, which recursively forwards write() calls through a deep chain of wrapped streams.

Suspected root cause

The legacy logging layer replaces sys.stdout and sys.stderr with NodeLogWriter via NodeLogManager. NodeLogWriter.write() scrubs credentials and then, when no node context is present, forwards the output to the “previous” stream by calling self._prev_out.write(s).

Under concurrent / multi-threaded execution, sys.stdout may be wrapped multiple times (wrapper → wrapper → wrapper …), because each NodeLogManager captures whatever sys.stdout currently is as its prev_stdout, even if that is already a wrapper. When output is written later (e.g., run summary / streaming output), the chain of self._prev_out.write(...) calls becomes deep enough to exceed Python's recursion limit.
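The suspected mechanism can be illustrated outside the SDK with a minimal sketch. ForwardingWriter, wrap_repeatedly and write_overflows are hypothetical names invented for this example; ForwardingWriter only mimics the forwarding behaviour described above and is not the SDK's NodeLogWriter. The sketch assumes each wrapper forwards write() to the stream it captured at construction time.

```python
import io
import sys


class ForwardingWriter:
    """Hypothetical stand-in for a stream wrapper that forwards to the stream it replaced."""

    def __init__(self, prev_out):
        self._prev_out = prev_out

    def write(self, s):
        # Every wrapper layer adds one Python stack frame before the real stream is reached.
        return self._prev_out.write(s)

    def flush(self):
        self._prev_out.flush()


def wrap_repeatedly(base, layers):
    """Wrap `base` `layers` times, as if each manager captured the already-wrapped stream."""
    stream = base
    for _ in range(layers):
        stream = ForwardingWriter(stream)
    return stream


def write_overflows(layers):
    """Return True if a single write() through `layers` wrappers raises RecursionError."""
    sink = io.StringIO()
    stream = wrap_repeatedly(sink, layers)
    try:
        stream.write("run summary\n")
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print(write_overflows(10))
    print(write_overflows(sys.getrecursionlimit() + 100))
```

With a handful of layers the write succeeds; once the number of layers exceeds the interpreter's recursion limit, a single write() fails, which would match the intermittent failures seen in long-running jobs if wrappers accumulate over time.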

Relevant code locations:

  • stdout/stderr replacement: _logging.py:162-188
  • recursive forwarding in write(): _logging.py:245-256
  • run streaming writes to sys.stdout: _run_submitter.py:204-263
  • NodeLogManager used per-line (concurrent): _engine.py:443
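One possible direction for a fix, shown only as a sketch and not as the SDK's actual code, is to make the stream replacement idempotent: install a single wrapper around the innermost real stream and count active users under a lock. ScrubbingWriter, unwrap and GuardedLogManager are hypothetical names for this example; the real NodeLogManager and NodeLogWriter have more responsibilities (credential scrubbing, node context) that are omitted here.

```python
import io
import sys
import threading


class ScrubbingWriter:
    """Hypothetical wrapper; stands in for the SDK's writer, scrubbing omitted."""

    def __init__(self, prev_out):
        self._prev_out = prev_out

    def write(self, s):
        return self._prev_out.write(s)

    def flush(self):
        self._prev_out.flush()


def unwrap(stream):
    """Follow _prev_out links iteratively until a non-wrapper stream is reached."""
    while isinstance(stream, ScrubbingWriter):
        stream = stream._prev_out
    return stream


class GuardedLogManager:
    """Hypothetical context manager that never stacks more than one wrapper on sys.stdout."""

    _lock = threading.Lock()
    _active = 0
    _original = None

    def __enter__(self):
        cls = GuardedLogManager
        with cls._lock:
            if cls._active == 0:
                # Only the first entrant replaces sys.stdout, wrapping the innermost stream.
                cls._original = sys.stdout
                sys.stdout = ScrubbingWriter(unwrap(sys.stdout))
            cls._active += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        cls = GuardedLogManager
        with cls._lock:
            cls._active -= 1
            if cls._active == 0:
                # The last one out restores whatever was installed before the first entry.
                sys.stdout = cls._original
                cls._original = None
        return False
```

Under these assumptions, nested or concurrent `with GuardedLogManager():` blocks share one wrapper, so the depth of the write() call chain stays constant regardless of how many lines are evaluated in parallel.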

Why this matters

This can make evaluation flaky/non-deterministic in production scheduled jobs, especially when concurrency is enabled.

Metadata

Labels

  • Evaluation
  • Service Attention
  • customer-reported
  • needs-team-attention
  • question
