# Local-VLM Screenshot Helper

This project gives you a simple Chrome extension that lets you take a screenshot of your current webpage and ask questions about it directly to a local multimodal AI model (like Qwen2-VL or LLaVA) running in LM Studio.

It brings the power of advanced visual AI browsers (like the OpenAI Atlas browser for Mac) to any user, on any platform, with 100% privacy and local control.
## ⚙️ How It Works (The 3 Components)

1. **LM Studio (The "Brain"):** Runs your local multimodal model and exposes it through an API.
2. **Python Backend (The "Bridge"):** A lightweight FastAPI server that receives each request from Chrome and passes it to LM Studio (a minimal sketch follows this list).
3. **Chrome Extension (The "Button"):** The UI you click in your browser. It takes the screenshot, gets your question, and sends both to the Python backend.
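To make the data flow concrete, here is a minimal sketch of what the Bridge does. This is not the project's actual server.py: the /ask route, field names, and model name are illustrative assumptions; the one fixed point is LM Studio's OpenAI-compatible API, which listens on http://localhost:1234/v1 by default.

```python
# Minimal sketch of the "Bridge" -- illustrative, not the project's server.py.
# Assumes LM Studio's OpenAI-compatible server on its default port (1234).
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
lm_studio = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

class AskRequest(BaseModel):
    question: str        # the user's question, typed into the popup
    image_data_url: str  # the screenshot as a base64 "data:image/..." URL

@app.post("/ask")  # hypothetical route name
def ask(req: AskRequest):
    # Forward the question and screenshot to the loaded multimodal model.
    response = lm_studio.chat.completions.create(
        model="local-model",  # LM Studio uses whichever model is loaded
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": req.question},
                {"type": "image_url", "image_url": {"url": req.image_data_url}},
            ],
        }],
    )
    return {"answer": response.choices[0].message.content}
```

The extension's "Take Screenshot & Ask" button simply POSTs to an endpoint like this and displays the returned answer.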
## 🛠️ Prerequisites

Before you begin, you must have the following software installed:

- Python (version 3.8 or newer)
- LM Studio
- Google Chrome
## 🚀 Installation & Setup Guide

Follow these steps in order to get the tool running. First, download the project files and arrange them into two folders: an ai-screenshot-backend folder containing all of the files (including server.py), and a chrome-screenshot-extension folder containing every file except server.py.
### Step 1: Configure LM Studio (The "Brain")

1. **Download a Model:** Open LM Studio. In the "Search" tab (🔍), download a multimodal ("VL", or vision-language) model.
   - Recommended: Qwen3-VL 30B (or a similar size)
   - Lighter alternatives: LLaVA 1.6 7B, Qwen3-VL 7B
2. **Start the Server:** Go to the "Server" tab (<->).
   - At the top, select the model you just downloaded.
   - Click "Start Server" and leave LM Studio running in the background.
### Step 2: Set Up the Python Backend (The "Bridge")

1. **Open Your Terminal:** Navigate to the folder where you downloaded the ai-screenshot-backend files (the one containing server.py).
2. **Install Dependencies:** Run the following command to install the necessary Python libraries:

   ```bash
   pip install "fastapi[all]" uvicorn openai
   ```

   (Do not run the server yet!)
### Step 3: Load the Chrome Extension (The "Button")

1. **Open Chrome Extensions:** Open Google Chrome, type chrome://extensions in the address bar, and press Enter.
2. **Enable Developer Mode:** Find the "Developer mode" toggle in the top-right corner and turn it ON.
3. **Load the Extension:**
   - Click the "Load unpacked" button.
   - In the file dialog that opens, navigate to and select the chrome-screenshot-extension folder.
4. **Copy the Extension ID:**
   - Your new "LM Studio Screenshot Helper" extension will appear in the list.
   - Find its ID (a long string of random-looking lowercase letters) and copy it.
### Step 4: Connect the Backend to the Extension

1. **Edit server.py:** Open the server.py file (from Step 2) in a text editor.
2. **Update the allowed origin:** Find the allow_origins line (around line 16); a sketch after this step shows where it typically lives:

   ```python
   allow_origins=["chrome-extension://ffibdanlkjnilbljnmmjhlmobjiilbkg"],
   ```

   Replace the placeholder ID with the ID you just copied from Chrome, for example:

   ```python
   allow_origins=["chrome-extension://abcdefghijklmnopabcdefghijklmnop"],
   ```

3. **Save the server.py file.**
4. **Run the Backend:** Now go back to your terminal (from Step 2) and run the server:

   ```bash
   python server.py
   ```

   Leave this terminal running in the background.
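For orientation, the two pieces Step 4 relies on usually look like the skeleton below: FastAPI's standard CORS middleware (home of the allow_origins list you just edited) and a __main__ block that lets `python server.py` start uvicorn directly. Treat this as a hedged sketch of a typical layout, not a copy of the project's file; the port is an assumption.

```python
# Typical skeleton around the line you edited -- illustrative, not verbatim.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    # Only your extension's origin may call this backend:
    allow_origins=["chrome-extension://<your-extension-id>"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# ... route definitions ...

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)  # port is an assumption
```

If the script starts cleanly, FastAPI also serves interactive docs at http://127.0.0.1:8000/docs by default (adjust the port to match your server.py), which is a quick way to confirm the backend is listening.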
## ✨ How to Use

You are all set up!

1. **Pin the Extension:** In Chrome, click the puzzle-piece (🧩) icon and click the pin (📌) next to "LM Studio Screenshot Helper" to add it to your toolbar.
2. **Go to Any Webpage:** Navigate to a page you want to ask a question about.
3. **Click the Icon:** Click the extension's icon in your toolbar.
4. **Ask Your Question:** Type a question about the page (e.g., "Summarize this article" or "What is this chart showing?") into the text box.
5. Click **"Take Screenshot & Ask"** and wait a few seconds for your local AI to respond!
## ☕ Support My Work

If you find this tool useful and want to support my journey from medic to maker, please consider buying me a coffee!