Optional Selenium/Playwright extractor for JS-heavy pages #25

@pokymono

Why

The architecture mentions an extractor for JS-heavy pages, but it has not been implemented. Many modern websites render their content client-side, so that content cannot be extracted without executing JavaScript.

Tasks

  • Introduce an optional dependency path (e.g., extras: web-render).
  • Implement an extractor that uses Selenium or Playwright to handle JS-heavy pages with a hard timeout.
  • Document the extractor's heavy footprint and recommend skipping it in CI environments.
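
To make the second task concrete, here is a minimal sketch of what such an extractor could look like, assuming Playwright's sync API is the chosen backend. The names `extract_rendered`, `render_with_playwright`, and `RenderTimeout` are hypothetical and not part of the existing codebase. The import is deferred so the base install keeps working without the extra, and the outer deadline is enforced with a worker thread, which abandons a stuck render rather than killing it.

```python
import concurrent.futures
from typing import Callable


class RenderTimeout(Exception):
    """Raised when rendering does not finish before the deadline (hypothetical name)."""


def render_with_playwright(url: str, timeout_s: float) -> str:
    """Return the rendered HTML of `url` using Playwright's sync API."""
    # Imported lazily so the package still imports when the extra is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Playwright expects timeouts in milliseconds.
            page.goto(url, timeout=timeout_s * 1000, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def extract_rendered(
    url: str,
    timeout_s: float = 15.0,
    render: Callable[[str, float], str] = render_with_playwright,
) -> str:
    """Run `render` in a worker thread and stop waiting after `timeout_s` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(render, url, timeout_s)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise RenderTimeout(f"rendering {url} exceeded {timeout_s}s") from None
    finally:
        # Do not block on a stuck worker. The thread is abandoned, not killed;
        # a subprocess would be needed for a truly forcible stop.
        pool.shutdown(wait=False)
```

The `render` parameter exists so tests can substitute a fake renderer and avoid launching a browser, which is one way to keep CI runs light.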

Acceptance Criteria

  • There is a config key (e.g., selenium) that resolves when the extra dependency is installed and fails cleanly when it is not.
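
One way this criterion could be met is sketched below, assuming a small registry that maps config keys to the module they need. `OPTIONAL_EXTRACTORS`, `MissingExtraError`, and `resolve_extractor` are hypothetical names, and `your-package` in the error message is a placeholder for the real distribution name. The probe uses `importlib.util.find_spec`, which checks for a module without importing it.

```python
import importlib.util
from typing import Callable


class MissingExtraError(RuntimeError):
    """Raised when a config key needs an optional dependency that is not installed."""


# Hypothetical registry: config key -> (module to probe, extra that provides it).
OPTIONAL_EXTRACTORS = {
    "selenium": ("selenium", "web-render"),
    "playwright": ("playwright", "web-render"),
}


def _is_installed(module: str) -> bool:
    # find_spec returns None for a missing top-level module and does not import it.
    return importlib.util.find_spec(module) is not None


def resolve_extractor(key: str, is_installed: Callable[[str], bool] = _is_installed) -> str:
    """Return the backing module name for `key`, or fail with an actionable error."""
    if key not in OPTIONAL_EXTRACTORS:
        raise KeyError(f"unknown extractor key: {key!r}")
    module, extra = OPTIONAL_EXTRACTORS[key]
    if not is_installed(module):
        raise MissingExtraError(
            f"extractor {key!r} requires the optional {extra!r} extra; "
            f"install it with: pip install 'your-package[{extra}]'"
        )
    return module
```

Raising a dedicated exception with the install hint, rather than letting an `ImportError` surface from deep inside the extractor, is what "fails cleanly" is taken to mean here.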
