@YxmMyth YxmMyth commented Jan 10, 2026

This commit adds support for extracting CSS background images during crawling,
addressing issue unclecode#1691 where background images were being skipped.

## Changes

### New Files
- crawl4ai/js_snippet/extract_css_backgrounds.js: JavaScript snippet that extracts
  background images from computed styles in the browser

### Modified Files
- crawl4ai/models.py:
  - Added `css_images` field to Media class
  - Added `css_images_data` field to AsyncCrawlResponse

- crawl4ai/async_configs.py:
  - Added CSS background image configuration parameters to CrawlerRunConfig:
    - extract_css_images (bool, default False)
    - css_image_min_width (int, default 100)
    - css_image_min_height (int, default 100)
    - css_image_score_threshold (int, default 2)
    - css_exclude_repeating (bool, default True)

- crawl4ai/content_scraping_strategy.py:
  - Added process_css_background_images() method
  - Integrated CSS image extraction into _process_element()
  - Added css_images to media dictionary

- crawl4ai/async_crawler_strategy.py:
  - Added JavaScript execution in _crawl_web() to extract CSS backgrounds
  - Included css_images_data in AsyncCrawlResponse

- crawl4ai/async_webcrawler.py:
  - Modified aprocess_html() to accept and pass css_images_data
  - Added Dict type import
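
To illustrate how the five new options could interact, here is a minimal sketch. The field names and defaults are the ones listed above. Everything else is an assumption made for illustration: the `CssImageOptions` stand-in class, the shape of each candidate record (`url`, `width`, `height`, `repeat`, `size`), and the scoring rule. It is not the code in `process_css_background_images()`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CssImageOptions:
    """Hypothetical stand-in for the five new CrawlerRunConfig fields."""
    extract_css_images: bool = False
    css_image_min_width: int = 100
    css_image_min_height: int = 100
    css_image_score_threshold: int = 2
    css_exclude_repeating: bool = True


def is_repeating(item: Dict[str, Any]) -> bool:
    # "repeat", "repeat-x" and "repeat-y" all tile the image; "no-repeat" does not.
    return str(item.get("repeat", "repeat")).startswith("repeat")


def score_candidate(item: Dict[str, Any]) -> int:
    """Illustrative scoring rule (an assumption, not the PR's formula)."""
    score = 0
    if item["width"] * item["height"] >= 200 * 200:
        score += 2  # large elements are more likely to carry content images
    if item.get("size") in ("cover", "contain"):
        score += 1  # cover/contain sizing suggests a hero or banner image
    if not is_repeating(item):
        score += 1  # a non-tiled background is less likely to be a texture
    return score


def filter_css_images(
    items: List[Dict[str, Any]], opts: CssImageOptions
) -> List[Dict[str, Any]]:
    """Apply the size, repeat, and score filters to raw candidates."""
    if not opts.extract_css_images:
        return []  # feature is opt-in: nothing is extracted unless enabled
    kept = []
    for item in items:
        if item["width"] < opts.css_image_min_width:
            continue
        if item["height"] < opts.css_image_min_height:
            continue
        if opts.css_exclude_repeating and is_repeating(item):
            continue
        score = score_candidate(item)
        if score >= opts.css_image_score_threshold:
            kept.append({**item, "score": score})
    return kept
```

With these assumed rules, a 1200x600 `no-repeat` hero image passes, a 16x16 icon is dropped by the size check, and a tiled texture is dropped unless `css_exclude_repeating` is set to `False`.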

## Features
- Extracts background images from both inline styles and stylesheets
- Uses window.getComputedStyle() for accurate extraction
- Filters out small elements and repeating patterns
- Scoring system based on element size and properties
- Opt-in (disabled by default) for backward compatibility
- Separate storage in media.css_images
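
A computed `background-image` value can hold several comma-separated layers, mixing `url(...)` entries with gradients, or it can simply be `none`. The PR does its extraction in JavaScript inside the browser. The Python sketch below only illustrates the parsing step under that assumption; the function name `extract_background_urls` and the decision to skip `data:` URIs are hypothetical, not taken from the PR.

```python
import re
from typing import List
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def extract_background_urls(computed_value: str, base_url: str = "") -> List[str]:
    """Return the image URLs found in a CSS background-image value.

    Gradients and `none` contain no url(...) token, so they yield nothing.
    Inline data: URIs are skipped (an illustrative choice).
    """
    urls = []
    for _quote, raw in _URL_RE.findall(computed_value):
        if not raw or raw.startswith("data:"):
            continue
        # Computed styles usually hold absolute URLs already; base_url only
        # matters when parsing a raw inline style attribute.
        urls.append(urljoin(base_url, raw) if base_url else raw)
    return urls
```

For example, passing `'url("https://example.com/a.png"), linear-gradient(red, blue)'` returns only the first layer's URL, because the gradient layer has no `url(...)` token.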

## Usage
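```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://example.com",
            extract_css_images=True,
            css_image_min_width=100,
            css_image_min_height=100,
        )
        # CSS background images are stored separately from regular <img> media
        css_images = result.media.get('css_images', [])

asyncio.run(main())
```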

Closes unclecode#1691

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Development

Successfully merging this pull request may close these issues.

[Bug]: background-images seems to be skipped