Skip to content

Python: [Feature]: Don't send repeated system messages with OpenAIResponses #3498

@cecheta

Description

@cecheta

Description

Currently, when using OpenAIResponses, the agent instructions are sent as a system message on each agent invocation. However, because the Responses API maintains state on the server, the system message only needs to be sent once, not on every invocation. For larger agent instructions, this can greatly increase the number of input tokens to the model.

In the code sample below, when running agent.run("Tell me a joke", thread=thread), ideally only a single user message would be sent to the model, however two messages are sent - a system message and a user message. This results in the following thread messages:

System: Reply in uppercase.
User: Hello
Assistant: HELLO. HOW CAN I HELP YOU TODAY?
System: Reply in uppercase.  <- Not needed
User: Tell me a joke
Assistant: WHY DID THE SCARECROW WIN AN AWARD?  BECAUSE HE WAS OUTSTANDING IN HIS FIELD.

Code Sample

from agent_framework.azure import AzureOpenAIResponsesClient
from azure.identity import DefaultAzureCredential


async def main():
    client = AzureOpenAIResponsesClient(credential=DefaultAzureCredential())
    agent = client.as_agent(instructions="Reply in uppercase.")
    thread = agent.get_new_thread()

    response = await agent.run("Hello", thread=thread)
    print(response)

    response = await agent.run("Tell me a joke", thread=thread)
    print(response)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Language/SDK

Python

Metadata

Metadata

Labels

model clientsIssues related to the model client implementationspythonv1.0Features being tracked for the version 1.0 GA

Projects

Status

No status

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions