Why
The architecture describes an extractor for JS-heavy pages, but it has not been implemented yet. Many modern websites render their content client-side, so that content can only be extracted after the page's JavaScript has run.
Tasks
- Introduce an optional dependency path (e.g., a web-render extra) so the base install does not pull in browser tooling.
- Implement an extractor that uses Selenium or Playwright to render JS-heavy pages, with a hard timeout on page loads.
- Document its heavy footprint (browser drivers or binaries, memory use) and recommend skipping it in CI environments.
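A minimal sketch of such an extractor, assuming Playwright's synchronous Python API is the chosen backend; the function name render_page and its parameters are hypothetical. Playwright is imported inside the function so the module still loads when the extra is absent. Note that Playwright's timeout applies per page operation (such as navigation) rather than as one wall-clock budget; a single deadline covering browser launch as well would require running the render in a separate process that can be killed. With Selenium, driver.set_page_load_timeout plays a similar role.

```python
from __future__ import annotations


def render_page(url: str, timeout_s: float = 30.0) -> str:
    """Return the rendered HTML of ``url`` after its JavaScript has run.

    Hypothetical helper. Raises ValueError for a non-positive timeout and the
    built-in TimeoutError if navigation does not finish within ``timeout_s``.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")

    # Lazy import: only needed when the optional web-render extra is installed.
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
    from playwright.sync_api import sync_playwright

    timeout_ms = timeout_s * 1000  # Playwright timeouts are in milliseconds
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Apply the deadline to every page operation, not just goto().
            page.set_default_timeout(timeout_ms)
            try:
                # "networkidle" is a heuristic for "client-side rendering is done";
                # it is an assumption here, and "load" is a cheaper alternative.
                page.goto(url, wait_until="networkidle")
            except PlaywrightTimeoutError as exc:
                raise TimeoutError(f"rendering {url} exceeded {timeout_s}s") from exc
            return page.content()
        finally:
            # Always release the browser process, even on timeout.
            browser.close()
```

Usage: html = render_page("https://example.com", timeout_s=15). Besides the Python package, Playwright needs a downloaded browser (e.g. playwright install chromium), which is a large part of the footprint mentioned above. Mapping Playwright's timeout to the built-in TimeoutError keeps Playwright types out of callers that never installed the extra.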
Acceptance Criteria
- There is a config key (e.g., selenium) that resolves to the extractor when the extra dependency is installed and fails cleanly, with an actionable error, when it is not.
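One way to meet this criterion, sketched under assumptions: a registry maps each config key to the top-level module it needs and the extra that provides it, and importlib.util.find_spec checks availability without importing anything. The names ExtractorSpec, EXTRACTORS, resolve_extractor, MissingExtraError, and the <package> placeholder in the install hint are all hypothetical.

```python
from __future__ import annotations

import importlib
import importlib.util
from dataclasses import dataclass
from types import ModuleType


class MissingExtraError(ImportError):
    """Raised when a config key needs an optional extra that is not installed."""


@dataclass(frozen=True)
class ExtractorSpec:
    module: str  # top-level module the extractor imports (must not be dotted)
    extra: str   # pip extra that provides that module


# Hypothetical registry; the key and extra names follow the examples in this issue.
EXTRACTORS: dict[str, ExtractorSpec] = {
    "selenium": ExtractorSpec(module="selenium", extra="web-render"),
}


def resolve_extractor(
    key: str, registry: dict[str, ExtractorSpec] = EXTRACTORS
) -> ModuleType:
    """Import the module behind ``key`` or fail with an install hint."""
    try:
        spec = registry[key]
    except KeyError:
        raise ValueError(
            f"unknown extractor key {key!r}; expected one of {sorted(registry)}"
        ) from None
    # find_spec returns None for a missing top-level module without importing it.
    if importlib.util.find_spec(spec.module) is None:
        raise MissingExtraError(
            f"extractor {key!r} requires the optional {spec.extra!r} extra; "
            f"install it with: pip install '<package>[{spec.extra}]'"
        )
    return importlib.import_module(spec.module)
```

Subclassing ImportError lets callers that already handle import failures catch the missing-extra case without knowing about the new exception type.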