Skip to content

Conversation

@dmitrivMS
Copy link
Contributor

Fixes #280736

Copilot AI review requested due to automatic review settings December 26, 2025 12:13
@dmitrivMS dmitrivMS added the fetch-tool Issues concerning the fetch internal tool and web extracting service label Dec 26, 2025
@dmitrivMS dmitrivMS self-assigned this Dec 26, 2025
@dmitrivMS dmitrivMS enabled auto-merge December 26, 2025 12:13
@vs-code-engineering vs-code-engineering bot added this to the December / January 2026 milestone Dec 26, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a timeout mechanism to prevent content extraction from hanging indefinitely in the web page loader used by the fetch tool. The change addresses issue #280736 by wrapping the content extraction logic (both accessibility tree and DOM extraction) in a raceTimeout call with a 2-second timeout.

Key changes:

  • Adds EXTRACT_CONTENT_TIMEOUT constant (2 seconds) to limit content extraction duration
  • Wraps the content extraction logic in raceTimeout to handle hanging extractions
  • Adds a test case to verify timeout behavior when accessibility tree extraction hangs

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/vs/platform/webContentExtractor/electron-main/webPageLoader.ts Introduces the EXTRACT_CONTENT_TIMEOUT constant and wraps the content extraction logic in raceTimeout to prevent indefinite hangs during accessibility tree or DOM extraction
src/vs/platform/webContentExtractor/test/electron-main/webPageLoader.test.ts Adds test case to verify that timeout errors are properly returned when accessibility tree extraction hangs indefinitely

});

const loadPromise = loader.load();
window.webContents.emit('did-start-loading');
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test only emits 'did-start-loading' but not 'did-finish-load'. While this works because the timeout will eventually trigger content extraction, it would be more realistic and explicit to emit 'did-finish-load' as well, similar to other tests in this file. This would also make the test's intent clearer and ensure it's testing the timeout during extraction rather than relying on the page load timeout.

Suggested change
window.webContents.emit('did-start-loading');
window.webContents.emit('did-start-loading');
window.webContents.emit('did-finish-load');

Copilot uses AI. Check for mistakes.
Comment on lines +818 to +844
test('returns error when accessibility tree extraction hangs', () => runWithFakedTimers({ useFakeTimers: true }, async () => {
const uri = URI.parse('https://example.com/page');
const loader = createWebPageLoader(uri);

window.webContents.debugger.sendCommand.callsFake((command: string) => {
switch (command) {
case 'Network.enable':
return Promise.resolve();
case 'Accessibility.getFullAXTree':
// Return a promise that never resolves to simulate hanging
return new Promise(() => { });
default:
assert.fail(`Unexpected command: ${command}`);
}
});

const loadPromise = loader.load();
window.webContents.emit('did-start-loading');
const result = await loadPromise;

assert.strictEqual(result.status, 'error');
if (result.status === 'error') {
assert.ok(result.error.includes('Failed to extract meaningful content'));
}
// Verify executeJavaScript was NOT called for DOM extraction
assert.ok(!window.webContents.executeJavaScript.called);
}));
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test case where accessibility tree extraction returns insufficient content quickly, but then DOM extraction hangs. This would provide more comprehensive coverage of the timeout behavior during the fallback extraction phase. Currently, the test only covers the case where accessibility tree extraction itself hangs.

Copilot uses AI. Check for mistakes.
Comment on lines +334 to +341
await raceTimeout((async () => {
result = await this.extractAccessibilityTreeContent() ?? '';
if (result.length < WebPageLoader.MIN_CONTENT_LENGTH) {
this.trace(`Accessibility tree extraction yielded insufficient content, trying main DOM element extraction`);
const domContent = await this.extractMainDomElementContent() ?? '';
result = domContent.length > result.length ? domContent : result;
}
})(), WebPageLoader.EXTRACT_CONTENT_TIMEOUT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some cancellation? Like we shouldn't do the second async call if we've timed out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fetch-tool Issues concerning the fetch internal tool and web extracting service

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fetch tool never times out

3 participants