[GRPOTrainer]: Agent Training Supports Async Tool Calls #4742
FYI: I had to hack the
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…modith/trl into pramodith/async_tool_calling
Yes, I have noticed that some recent changes have caused CI to fail, but I haven't taken the time to look into this yet.
Pull request overview
This PR adds support for asynchronous tool calls in GRPOTrainer, extending agent training capabilities to handle both synchronous and asynchronous tool functions. The implementation consolidates the async event loop infrastructure to support both async reward functions and async tools on a shared daemon thread.
Key changes:
- Separated tool dictionaries into `_sync_tool_dict` and `_async_tool_dict` to handle both synchronous and asynchronous tools
- Consolidated async event loop management from separate reward and tool loops into a single `async_loop` shared by both
- Updated `_tool_call_loop` method to execute async tools using `asyncio.gather` with proper exception handling
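The first and third changes can be illustrated with a minimal sketch. It is not the code from `trl/trainer/grpo_trainer.py`: the helpers `split_tools` and `run_async_calls`, the example tools, and the `{"error": ...}` result shape are all hypothetical and only show the general pattern of partitioning tools by kind and gathering async calls without letting one failure abort the rest.

```python
import asyncio
import inspect


def multiply(a: int, b: int) -> int:
    """Synchronous example tool (hypothetical)."""
    return a * b


async def async_add(a: int, b: int) -> int:
    """Asynchronous example tool (hypothetical)."""
    await asyncio.sleep(0)  # stand-in for real async I/O
    return a + b


def split_tools(tools):
    """Partition tools by name into a sync dict and an async dict."""
    sync_tools, async_tools = {}, {}
    for tool in tools:
        target = async_tools if inspect.iscoroutinefunction(tool) else sync_tools
        target[tool.__name__] = tool
    return sync_tools, async_tools


async def run_async_calls(async_tools, calls):
    """Run async tool calls concurrently; a failed call becomes an error entry."""
    coros = [async_tools[name](**kwargs) for name, kwargs in calls]
    # return_exceptions=True keeps one failing tool from cancelling the others.
    results = await asyncio.gather(*coros, return_exceptions=True)
    return [
        {"error": str(r)} if isinstance(r, Exception) else r
        for r in results
    ]


sync_tools, async_tools = split_tools([multiply, async_add])
calls = [
    ("async_add", {"a": 1, "b": 2}),
    ("async_add", {"a": 1, "b": "x"}),  # raises TypeError inside the tool
]
results = asyncio.run(run_async_calls(async_tools, calls))
```

Here the second call fails inside the tool, yet the first result is still returned, which is the behavior one would want when a model emits a malformed tool call during a rollout.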
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| trl/trainer/grpo_trainer.py | Added async tool support by splitting tool dictionaries, consolidating event loop management, and implementing async tool execution in _tool_call_loop |
| tests/test_grpo_trainer.py | Added async_multiply_tool test function and parametrized test_training_with_tools to test both sync and async tools separately |
| docs/source/grpo_trainer.md | Updated documentation to show that tools can be sync or async, added example async_add function |
    self._has_async_reward_funcs = any(asyncio.iscoroutinefunction(func) for func in self.reward_funcs)
    if self._has_async_reward_funcs:
        self.async_reward_loop_thread, self.async_reward_loop, self.async_reward_loop_ready_event = (
            start_event_loop_in_daemon(name="GRPOTrainer-AsyncRewardLoop")
        )
        # wait until the event loop is running in the daemon thread
        self.async_reward_loop_ready_event.wait()
        atexit.register(shutdown_event_loop_in_daemon, self.async_reward_loop_thread, self.async_reward_loop)
we don't need this anymore?
We do; I just moved it down so that it runs after we have checked all the tools for async functions too.
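For readers following the thread, the ordering described in this reply can be sketched roughly as follows: check both the reward functions and the tools for coroutine functions first, then start a single event loop on a daemon thread only if at least one is async. This is an assumption-laden illustration, not the PR's code. `start_loop_in_daemon` and `stop_loop` are hypothetical stand-ins for the `start_event_loop_in_daemon` and `shutdown_event_loop_in_daemon` helpers named in the diff, whose implementations are not shown here.

```python
import asyncio
import inspect
import threading


def start_loop_in_daemon(name):
    """Start an event loop on a daemon thread and block until it is running."""
    loop = asyncio.new_event_loop()
    ready = threading.Event()

    def run():
        asyncio.set_event_loop(loop)
        loop.call_soon(ready.set)  # fires once the loop is actually running
        loop.run_forever()

    thread = threading.Thread(target=run, name=name, daemon=True)
    thread.start()
    ready.wait()
    return thread, loop


def stop_loop(thread, loop):
    """Stop the loop from the calling thread and release its resources."""
    loop.call_soon_threadsafe(loop.stop)
    thread.join(timeout=5)
    loop.close()


def sync_reward(completion):
    return 1.0


async def async_reward(completion):
    await asyncio.sleep(0)  # stand-in for real async work
    return float(len(completion))


async def async_add(a, b):
    await asyncio.sleep(0)
    return a + b


reward_funcs = [sync_reward, async_reward]
tools = [async_add]

# Examine reward functions AND tools before deciding whether a loop is needed.
needs_loop = any(inspect.iscoroutinefunction(f) for f in reward_funcs + tools)
if needs_loop:
    thread, loop = start_loop_in_daemon("shared-async-loop")
    # Both kinds of coroutine are submitted to the same loop.
    reward = asyncio.run_coroutine_threadsafe(async_reward("abc"), loop).result(timeout=5)
    total = asyncio.run_coroutine_threadsafe(async_add(2, 3), loop).result(timeout=5)
    stop_loop(thread, loop)
```

Doing the check after the tools have been inspected means a trainer with only sync reward functions but async tools still gets a loop, and a fully synchronous setup starts no extra thread.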
What does this PR do?
Add support for async tool calls in GRPOTrainer.
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.