A proof of concept for a multimodal approach to generating accessible webpages for use with screen readers when accessibility markers are not present.
Segment a webpage screenshot into components, map them to DOM elements, and generate an annotated, accessible view.
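
The mapping step can be illustrated with a small sketch. It assumes that each segment and each DOM element has already been reduced to a bounding box in screenshot pixels, and it pairs them by intersection over union (IoU). The names `Box`, `iou`, and `match_segments_to_dom`, the string keys, and the 0.5 threshold are all hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in screenshot pixel coordinates."""
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    overlap_w = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    overlap_h = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    intersection = overlap_w * overlap_h
    return intersection / (a.area + b.area - intersection)


def match_segments_to_dom(segments, dom_boxes, threshold=0.5):
    """Map each segment index to the DOM key with the highest IoU.

    segments: list of Box, one per segmented component.
    dom_boxes: dict mapping a DOM identifier (hypothetical, e.g. a
        CSS selector) to the Box of that element.
    Segments whose best IoU falls below threshold map to None.
    """
    matches = {}
    for index, segment in enumerate(segments):
        best_key, best_score = None, 0.0
        for key, box in dom_boxes.items():
            score = iou(segment, box)
            if score > best_score:
                best_key, best_score = key, score
        matches[index] = best_key if best_score >= threshold else None
    return matches


# Illustrative data only: three segments and two DOM elements.
segments = [Box(0, 0, 100, 50), Box(0, 60, 100, 200), Box(500, 500, 10, 10)]
dom_boxes = {"header": Box(0, 0, 100, 48), "main": Box(0, 60, 100, 210)}
print(match_segments_to_dom(segments, dom_boxes))
# → {0: 'header', 1: 'main', 2: None}
```

IoU is only one possible matching criterion. Segments that cover several elements, or elements split across segments, would need a different strategy.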
The project is at the PoC stage. Contributions are welcome to help bring it to life as a viable software package!
- Generate annotations for segmented content
- Link existing webpage elements to an accessibility tree
- Create a web extension to automate the screen capture and UI
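
As a starting point for the accessibility tree item, the sketch below nests elements purely by geometric containment: each element's parent is the smallest other box that fully encloses it. This is an assumption about one possible approach, not the project's design. A real accessibility tree also carries roles, names, and states, which this sketch does not attempt. The element ids and the helpers `contains` and `build_tree` are hypothetical.

```python
def contains(outer, inner):
    """True when box `outer` fully encloses box `inner`.

    Boxes are (x, y, width, height) tuples in page pixels.
    """
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def build_tree(elements):
    """Nest elements by containment.

    elements: dict mapping a hypothetical element id to its box.
    Returns a dict mapping each id to its parent id (None for roots).
    The parent is the smallest strictly larger box that encloses it.
    """
    def area(box):
        return box[2] * box[3]

    parents = {}
    for name, box in elements.items():
        enclosing = [
            other
            for other, other_box in elements.items()
            if other != name
            and contains(other_box, box)
            and area(other_box) > area(box)  # strict, so identical boxes cannot form a cycle
        ]
        parents[name] = min(
            enclosing, key=lambda other: area(elements[other]), default=None
        )
    return parents


# Illustrative layout only.
layout = {
    "page": (0, 0, 1000, 2000),
    "nav": (0, 0, 1000, 80),
    "link": (10, 20, 100, 40),
    "footer": (0, 1900, 1000, 100),
}
print(build_tree(layout))
# → {'page': None, 'nav': 'page', 'link': 'nav', 'footer': 'page'}
```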
- Clone the repository:

      git clone https://github.com/rawaha-e/a11y-vision

- Install the dependencies:

      pip install -r requirements.txt

- Download SAM2 checkpoint:

      mkdir checkpoints
      wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt -O checkpoints/sam2.1_hiera_large.pt

- Place `screenshot.png` in the project directory.

- Run the inference:

      python3 segment_webpage.py
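
Before running the inference, it may help to confirm that the files from the steps above are in place. The following is a minimal sketch of such a check. The helper `missing_inputs` is hypothetical and not part of this repository; the two paths are the ones used in the setup steps.

```python
from pathlib import Path

# Paths taken from the setup steps above, relative to the project directory.
REQUIRED_FILES = (
    Path("checkpoints/sam2.1_hiera_large.pt"),
    Path("screenshot.png"),
)


def missing_inputs(project_dir="."):
    """Return the required files that are absent from project_dir."""
    root = Path(project_dir)
    return [path for path in REQUIRED_FILES if not (root / path).is_file()]
```

Calling `missing_inputs()` from the project directory returns an empty list when both the checkpoint and the screenshot are present, and otherwise lists the relative paths that still need to be created.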