Skip to content

Conversation

@han032206
Copy link

No description provided.

Copy link
Collaborator

@gasse gasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @han032206 for this first draft! I have one main concern about validation, which I think can be done much more robustly using task-specific JS event handlers (see my detailed comments). Can you investigate this?

Copy link
Collaborator

@gasse gasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not needed for the environment to pass the last_action to the task. Also there is no need to patch the BrowserEnv class, everything can be done from within the WebCanvasTask class. See my detailed comments.

"browsergym_page_activated", lambda source: self._activate_page_from_js(source["page"])
)

self.context.expose_binding(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to modify the BrowserEnv class. This can be done within task.setup(page), using

page.context.expose_binding()

This one could also be useful, so that your JS code is executed on all pages visited by the agent (not just the page available at the time of setup())

page.context.add_init_script()


if hasattr(self.task, "webcanvas"):
logger.debug(f"Initiating webcanvas task event listen")
self._event_listener()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this can be done from within the GenericWebCanvasTask class, during task.setup(page).


return obs

def _event_listener(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, should be moved to GenericWebCanvasTask

return True

def validate(
self, page: playwright.sync_api.Page, chat_messages: list[str], action: str = ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for action here.

reward, done, user_message, info = self.task.validate(self.page, self.chat.messages)

reward, done, user_message, info = self.task.validate(
self.page, self.chat.messages, self.last_action
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for this. The idea of using Javascript to register actions executed in the browser is so that there is no need any more for the environment to give the executed last_action to the task. The BrowserGym last action could be bid-based or coord-based, which does not match the XPath-based validation of WebCanvas, as far as I understand. By using an event listener + playwright callback instead, the task can keep track of all the JS events and validate that they correspond to the task's objectives.

Suggested change
self.page, self.chat.messages, self.last_action
self.page, self.chat.messages

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants