Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 10 additions & 6 deletions src/vs/platform/webContentExtractor/electron-main/webPageLoader.ts
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ export class WebPageLoader extends Disposable {
private static readonly TIMEOUT = 30000; // 30 seconds
private static readonly POST_LOAD_TIMEOUT = 5000; // 5 seconds - increased for dynamic content
private static readonly FRAME_TIMEOUT = 500; // 0.5 seconds
private static readonly EXTRACT_CONTENT_TIMEOUT = 2000; // 2 seconds
private static readonly IDLE_DEBOUNCE_TIME = 500; // 0.5 seconds - wait after last network request
private static readonly MIN_CONTENT_LENGTH = 100; // Minimum content length to consider extraction successful

Expand Down Expand Up @@ -329,12 +330,15 @@ export class WebPageLoader extends Disposable {
try {
const title = this._window.webContents.getTitle();

let result = await this.extractAccessibilityTreeContent() ?? '';
if (result.length < WebPageLoader.MIN_CONTENT_LENGTH) {
this.trace(`Accessibility tree extraction yielded insufficient content, trying main DOM element extraction`);
const domContent = await this.extractMainDomElementContent() ?? '';
result = domContent.length > result.length ? domContent : result;
}
let result = '';
await raceTimeout((async () => {
result = await this.extractAccessibilityTreeContent() ?? '';
if (result.length < WebPageLoader.MIN_CONTENT_LENGTH) {
this.trace(`Accessibility tree extraction yielded insufficient content, trying main DOM element extraction`);
const domContent = await this.extractMainDomElementContent() ?? '';
result = domContent.length > result.length ? domContent : result;
}
})(), WebPageLoader.EXTRACT_CONTENT_TIMEOUT);
Comment on lines +334 to +341
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some cancellation? Like we shouldn't do the second async call if we've timed out.


if (result.length === 0) {
this._onResult({ status: 'error', error: 'Failed to extract meaningful content from the web page' });
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -815,6 +815,34 @@ suite('WebPageLoader', () => {
assert.ok(window.webContents.executeJavaScript.called);
}));

test('returns error when accessibility tree extraction hangs', () => runWithFakedTimers({ useFakeTimers: true }, async () => {
const uri = URI.parse('https://example.com/page');
const loader = createWebPageLoader(uri);

window.webContents.debugger.sendCommand.callsFake((command: string) => {
switch (command) {
case 'Network.enable':
return Promise.resolve();
case 'Accessibility.getFullAXTree':
// Return a promise that never resolves to simulate hanging
return new Promise(() => { });
default:
assert.fail(`Unexpected command: ${command}`);
}
});

const loadPromise = loader.load();
window.webContents.emit('did-start-loading');
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test only emits 'did-start-loading' but not 'did-finish-load'. While this works because the timeout will eventually trigger content extraction, it would be more realistic and explicit to emit 'did-finish-load' as well, similar to other tests in this file. This would also make the test's intent clearer and ensure it's testing the timeout during extraction rather than relying on the page load timeout.

Suggested change
window.webContents.emit('did-start-loading');
window.webContents.emit('did-start-loading');
window.webContents.emit('did-finish-load');

Copilot uses AI. Check for mistakes.
const result = await loadPromise;

assert.strictEqual(result.status, 'error');
if (result.status === 'error') {
assert.ok(result.error.includes('Failed to extract meaningful content'));
}
// Verify executeJavaScript was NOT called for DOM extraction
assert.ok(!window.webContents.executeJavaScript.called);
}));
Comment on lines +818 to +844
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test case where accessibility tree extraction returns insufficient content quickly, but then DOM extraction hangs. This would provide more comprehensive coverage of the timeout behavior during the fallback extraction phase. Currently, the test only covers the case where accessibility tree extraction itself hangs.

Copilot uses AI. Check for mistakes.

test('returns error when both accessibility tree and DOM extraction yield no content', () => runWithFakedTimers({ useFakeTimers: true }, async () => {
const uri = URI.parse('https://example.com/empty-page');

Expand Down
Loading