Required prerequisites
Motivation
More and more model providers are releasing computer-use models that click and interact with pages purely through pixel coordinates. We should add support for this mode of operation, in which the model relies only on visual input and targets UI elements by their pixel coordinates.
Solution
No response
Alternatives
No response
Additional context
No response