-
Notifications
You must be signed in to change notification settings - Fork 37.1k
Add timeout for content extraction in fetch tool #285102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a timeout mechanism to prevent content extraction from hanging indefinitely in the web page loader used by the fetch tool. The change addresses issue #280736 by wrapping the content extraction logic (both accessibility tree and DOM extraction) in a raceTimeout call with a 2-second timeout.
Key changes:
- Adds
EXTRACT_CONTENT_TIMEOUTconstant (2 seconds) to limit content extraction duration - Wraps the content extraction logic in
raceTimeoutto handle hanging extractions - Adds a test case to verify timeout behavior when accessibility tree extraction hangs
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/vs/platform/webContentExtractor/electron-main/webPageLoader.ts | Introduces the EXTRACT_CONTENT_TIMEOUT constant and wraps the content extraction logic in raceTimeout to prevent indefinite hangs during accessibility tree or DOM extraction |
| src/vs/platform/webContentExtractor/test/electron-main/webPageLoader.test.ts | Adds test case to verify that timeout errors are properly returned when accessibility tree extraction hangs indefinitely |
| }); | ||
|
|
||
| const loadPromise = loader.load(); | ||
| window.webContents.emit('did-start-loading'); |
Copilot
AI
Dec 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test only emits 'did-start-loading' but not 'did-finish-load'. While this works because the timeout will eventually trigger content extraction, it would be more realistic and explicit to emit 'did-finish-load' as well, similar to other tests in this file. This would also make the test's intent clearer and ensure it's testing the timeout during extraction rather than relying on the page load timeout.
| window.webContents.emit('did-start-loading'); | |
| window.webContents.emit('did-start-loading'); | |
| window.webContents.emit('did-finish-load'); |
| test('returns error when accessibility tree extraction hangs', () => runWithFakedTimers({ useFakeTimers: true }, async () => { | ||
| const uri = URI.parse('https://example.com/page'); | ||
| const loader = createWebPageLoader(uri); | ||
|
|
||
| window.webContents.debugger.sendCommand.callsFake((command: string) => { | ||
| switch (command) { | ||
| case 'Network.enable': | ||
| return Promise.resolve(); | ||
| case 'Accessibility.getFullAXTree': | ||
| // Return a promise that never resolves to simulate hanging | ||
| return new Promise(() => { }); | ||
| default: | ||
| assert.fail(`Unexpected command: ${command}`); | ||
| } | ||
| }); | ||
|
|
||
| const loadPromise = loader.load(); | ||
| window.webContents.emit('did-start-loading'); | ||
| const result = await loadPromise; | ||
|
|
||
| assert.strictEqual(result.status, 'error'); | ||
| if (result.status === 'error') { | ||
| assert.ok(result.error.includes('Failed to extract meaningful content')); | ||
| } | ||
| // Verify executeJavaScript was NOT called for DOM extraction | ||
| assert.ok(!window.webContents.executeJavaScript.called); | ||
| })); |
Copilot
AI
Dec 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a test case where accessibility tree extraction returns insufficient content quickly, but then DOM extraction hangs. This would provide more comprehensive coverage of the timeout behavior during the fallback extraction phase. Currently, the test only covers the case where accessibility tree extraction itself hangs.
| await raceTimeout((async () => { | ||
| result = await this.extractAccessibilityTreeContent() ?? ''; | ||
| if (result.length < WebPageLoader.MIN_CONTENT_LENGTH) { | ||
| this.trace(`Accessibility tree extraction yielded insufficient content, trying main DOM element extraction`); | ||
| const domContent = await this.extractMainDomElementContent() ?? ''; | ||
| result = domContent.length > result.length ? domContent : result; | ||
| } | ||
| })(), WebPageLoader.EXTRACT_CONTENT_TIMEOUT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some cancellation? Like we shouldn't do the second async call if we've timed out.
Fixes #280736