-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Open
Open
Copy link
Labels
model clientsIssues related to the model client implementationsIssues related to the model client implementationspythonv1.0Features being tracked for the version 1.0 GAFeatures being tracked for the version 1.0 GA
Description
Description
Currently, when using OpenAIResponses, the agent instructions are sent as a system message on each agent invocation. However, because the Responses API maintains state on the server, the system message only needs to be sent once, not on every invocation. For larger agent instructions, this can greatly increase the number of input tokens to the model.
In the code sample below, when running agent.run("Tell me a joke", thread=thread), ideally only a single user message would be sent to the model, however two messages are sent - a system message and a user message. This results in the following thread messages:
System: Reply in uppercase.
User: Hello
Assistant: HELLO. HOW CAN I HELP YOU TODAY?
System: Reply in uppercase. <- Not needed
User: Tell me a joke
Assistant: WHY DID THE SCARECROW WIN AN AWARD? BECAUSE HE WAS OUTSTANDING IN HIS FIELD.
Code Sample
from agent_framework.azure import AzureOpenAIResponsesClient
from azure.identity import DefaultAzureCredential
async def main():
client = AzureOpenAIResponsesClient(credential=DefaultAzureCredential())
agent = client.as_agent(instructions="Reply in uppercase.")
thread = agent.get_new_thread()
response = await agent.run("Hello", thread=thread)
print(response)
response = await agent.run("Tell me a joke", thread=thread)
print(response)
if __name__ == "__main__":
import asyncio
asyncio.run(main())Language/SDK
Python
Metadata
Metadata
Assignees
Labels
model clientsIssues related to the model client implementationsIssues related to the model client implementationspythonv1.0Features being tracked for the version 1.0 GAFeatures being tracked for the version 1.0 GA
Type
Projects
Status
No status