
Difference in behavior between public_eval and master branch #87

@smirnp

Description


Hi, thanks for your great work!

I'm running some benchmarks and noticed that on the master branch the system sometimes enters an almost infinite continue_chaining loop, performing "search_in_memory" tool calls with slightly different inputs derived from the originally requested user content (it appends a user_message with type=continue_chaining after each iteration of the loop).

In the "public_evaluation" branch, the "search_in_memory" tool is usually called only once, and it is followed by a user_message of type=heartbeat (I guess that is what leads to the final send_message tool call).

Is there any way to affect the behavior of the master branch, so that it finishes the benchmarks with a limited number of tool calls? The code in public_evaluation seems to be quite obsolete.

Thank you!

UPD: the problem was spotted with the GPT-5-nano and GPT-5-mini models.
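As a workaround idea, the loop described above could be capped on the caller's side. The sketch below is purely illustrative and does not use the repository's actual API: `ToyAgent`, `StepResult`, `run_turn`, and the message shapes are all hypothetical stand-ins for the real agent loop. It shows the general pattern: bound the number of chained tool calls per turn and re-inject a heartbeat-style message instead of continue_chaining, so the model is nudged toward a final send_message.

```python
from dataclasses import dataclass

# Hypothetical sketch, NOT the repository's actual API: a hard cap on
# chained tool calls so a benchmark turn cannot loop indefinitely.

@dataclass
class StepResult:
    tool_name: str
    content: str

class ToyAgent:
    """Stand-in for the real agent: it keeps calling search_in_memory
    until it sees a heartbeat, mimicking the continue_chaining loop."""
    def step(self, message):
        if message.get("type") == "heartbeat":
            # A heartbeat nudges the agent to answer instead of chaining.
            return StepResult("send_message", "final answer")
        return StepResult("search_in_memory", "partial result")

def run_turn(agent, user_message, max_tool_calls=5):
    """Drive the agent loop, but stop chaining after max_tool_calls."""
    message = user_message
    calls = 0
    for _ in range(max_tool_calls):
        result = agent.step(message)
        calls += 1
        if result.tool_name == "send_message":
            return result, calls
        # Replace endless continue_chaining with a heartbeat nudge.
        message = {"type": "heartbeat"}
    # Budget exhausted: force a final answer instead of looping further.
    return agent.step({"type": "heartbeat"}), calls

result, calls = run_turn(ToyAgent(), {"type": "user_message", "content": "query"})
print(result.tool_name, calls)  # send_message 2
```

With the toy agent, the first step triggers search_in_memory, the injected heartbeat then produces send_message, so the turn ends after two tool calls instead of looping.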
