AI-powered vision assistance system for visually impaired users featuring real-time scene analysis, configurable prompts, and multi-platform support.
A comprehensive vision assistance system featuring configurable system prompts and session-based differential descriptions. The system adapts to various applications (navigation, safety, text reading, public transport) through external prompt files.
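For example, a navigation-oriented application might ship as a plain text file under `src/data/prompts/` and be selected by name. The sketch below is illustrative only: the file name and the `load_prompt` helper are assumptions, not the project's actual API.

```python
# Illustrative sketch: load an application-specific system prompt from a text file.
# The directory src/data/prompts/ matches the project layout below; the file names
# and this helper are assumptions for illustration.
from pathlib import Path

PROMPT_DIR = Path("src/data/prompts")

def load_prompt(application: str) -> str:
    """Read the system prompt for a given application (e.g. 'navigation')."""
    return (PROMPT_DIR / f"{application}.txt").read_text(encoding="utf-8")

# Switching applications is just a matter of pointing at a different file:
system_prompt = load_prompt("navigation")
```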
Supported AI Models:
- Ollama Vision Models (Local: Qwen 2.5 VL, Gemma 3) - Default
- GPT-4 Vision (OpenAI)
- Florence2 (Microsoft)
- CogVLM2, MoeLLaVA (Open source)
- 9 pre-built applications (navigation, safety, text reading, public transport, etc.)
- Create custom applications via text files
- Instant switching without code changes
- Per-user memory with differential descriptions
- Silent mode when the scene is unchanged (80%+ cognitive relief); see the conceptual sketch after this list
- Automatic context reset on prompt/model changes
- Complete TalkBack support with custom actions
- Gesture-based camera controls (single/double tap)
- Hardware button integration (volume keys, Bluetooth)
- Multi-language support (English, Italian, Spanish, French)
- Continuous capture with configurable intervals
- GPS location and orientation tracking
- Cloud data synchronization via UDP API
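The differential-description and silent-mode behaviour can be pictured with the following conceptual sketch. The session store and the comparison rule are assumptions for illustration, not the project's actual implementation.

```python
# Conceptual sketch of per-user differential descriptions and silent mode.
# The in-memory session store and equality check are illustrative assumptions.
from typing import Dict, Optional

_last_description: Dict[str, str] = {}  # user/session id -> last description

def describe(user_id: str, new_description: str) -> Optional[str]:
    """Return text to speak, or None to stay silent when the scene is unchanged."""
    previous = _last_description.get(user_id)
    _last_description[user_id] = new_description
    if previous is None:
        return new_description              # first capture: full description
    if new_description == previous:
        return None                         # unchanged scene: stay silent
    return "Change: " + new_description     # changed scene: differential update
```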
```bash
# Clone repository
git clone https://github.com/your-username/blindapplication.git
cd blindapplication

# Build and run with Docker
docker-compose up --build

# Server runs on https://localhost:8085
```

```bash
# Install dependencies
pip install -r requirements.txt

# Configure Ollama (optional, for local AI)
./configure_ollama.sh

# Run server
python multi_server.py
```
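Once the server is running (via Docker or manually), a quick way to confirm it is reachable is to poll the `/health` endpoint listed in the API table further down. A minimal sketch, assuming only that a healthy server answers with HTTP 200 (`verify=False` accounts for a self-signed local certificate):

```python
# Illustrative health check against the default local server address.
import requests

resp = requests.get("https://localhost:8085/health", verify=False, timeout=5)
print("Server healthy" if resp.ok else f"Server returned {resp.status_code}")
```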
```bash
cd vision_application

# Build APK
./gradlew assembleDebug

# Install to connected device
adb install -r app/build/outputs/apk/debug/app-debug.apk
```

```
blindapplication/
├── src/
│   ├── server/             # Flask vision server
│   ├── models/             # AI model backends
│   ├── client/             # Python client libraries
│   └── data/prompts/       # Configurable prompt files
├── vision_application/     # Android app (Kotlin)
├── doc/                    # Comprehensive documentation
├── cane/                   # Smart cane hardware (ESP32/Arduino)
├── tools/                  # Deployment and utility scripts
└── templates/              # Web interface templates
```
| Document | Description |
|---|---|
| Server Documentation | API, deployment, AI model integration |
| App Documentation | Android app user guide and development |
| Visual Cane Documentation | Hardware build guide |
```bash
# UDP API Authentication (required for path recording)
export UDP_API_KEY=your_api_key_here

# OpenAI (optional, for GPT-4 Vision)
export OPENAI_API_KEY=your_openai_key
```
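On the server side these keys would typically be read from the environment. The sketch below is illustrative; only the variable names come from the commands above.

```python
# Illustrative only: read the API keys exported above from the environment.
import os

UDP_API_KEY = os.environ.get("UDP_API_KEY")        # required for path recording
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")  # optional, GPT-4 Vision only

if UDP_API_KEY is None:
    raise RuntimeError("UDP_API_KEY is not set; path recording will not work")
```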
The server supports multiple AI backends, configured via command line or environment:

```bash
# Run with Ollama (default)
python multi_server.py --model ollama

# Run with GPT-4 Vision
python multi_server.py --model gpt4 --api-key $OPENAI_API_KEY
```

| Endpoint | Method | Description |
|---|---|---|
| `/analyze` | POST | Analyze image with AI vision |
| `/prompts` | GET | List available prompt types |
| `/models` | GET | List available AI models |
| `/health` | GET | Server health check |
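As an illustration of calling the `/analyze` endpoint, the sketch below sends a base64-encoded image. The field names (`image`, `prompt_type`) and the response shape are assumptions for the example, not the documented request schema.

```python
# Illustrative client request to POST /analyze with a base64-encoded image.
import base64
import requests

SERVER = "https://localhost:8085"

with open("snapshot.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {"image": image_b64, "prompt_type": "navigation"}  # assumed field names
resp = requests.post(f"{SERVER}/analyze", json=payload, verify=False, timeout=30)
resp.raise_for_status()
print(resp.json())
```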
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- RAISE Project for accessibility research support
- CNR-IMATI for UDP infrastructure
- Ollama team for local AI model support