AI-powered vision assistance system for visually impaired users featuring real-time scene analysis, configurable prompts, and multi-platform support.
A comprehensive vision assistance system featuring configurable system prompts and session-based differential descriptions. The system adapts to various applications (navigation, safety, text reading, public transport) through external prompt files.
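For example, a navigation-oriented application might ship as a plain text file under `src/data/prompts/` and be selected by name. The sketch below is illustrative only: the file name and the `load_prompt` helper are assumptions, not the project's actual API.

```python
# Illustrative sketch: load an application-specific system prompt from a text file.
# The directory src/data/prompts/ matches the project layout below; the file names
# and this helper are assumptions for illustration.
from pathlib import Path

PROMPT_DIR = Path("src/data/prompts")

def load_prompt(application: str) -> str:
    """Read the system prompt for a given application (e.g. 'navigation')."""
    return (PROMPT_DIR / f"{application}.txt").read_text(encoding="utf-8")

# Switching applications is just a matter of pointing at a different file:
system_prompt = load_prompt("navigation")
```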
Supported AI Models:
- Ollama Vision Models (Local: Qwen 2.5 VL, Gemma 3) - Default
- GPT-4 Vision (OpenAI)
- Florence2 (Microsoft)
- CogVLM2, MoeLLaVA (Open source)
- 9 pre-built applications (navigation, safety, text reading, public transport, etc.)
- Create custom applications via text files
- Instant switching without code changes
- Per-user memory with differential descriptions
- Silent mode when the scene is unchanged (80%+ cognitive relief); see the conceptual sketch after this list
- Automatic context reset on prompt/model changes
- Complete TalkBack support with custom actions
- Gesture-based camera controls (single/double tap)
- Hardware button integration (volume keys, Bluetooth)
- Multi-language support (English, Italian, Spanish, French)
- Continuous capture with configurable intervals
- GPS location and orientation tracking
- Cloud data synchronization via UDP API
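The differential-description and silent-mode behaviour can be pictured with the following conceptual sketch. The session store and the comparison rule are assumptions for illustration, not the project's actual implementation.

```python
# Conceptual sketch of per-user differential descriptions and silent mode.
# The in-memory session store and equality check are illustrative assumptions.
from typing import Dict, Optional

_last_description: Dict[str, str] = {}  # user/session id -> last description

def describe(user_id: str, new_description: str) -> Optional[str]:
    """Return text to speak, or None to stay silent when the scene is unchanged."""
    previous = _last_description.get(user_id)
    _last_description[user_id] = new_description
    if previous is None:
        return new_description              # first capture: full description
    if new_description == previous:
        return None                         # unchanged scene: stay silent
    return "Change: " + new_description     # changed scene: differential update
```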
```bash
# Clone repository
git clone https://github.com/your-username/blindapplication.git
cd blindapplication

# Build and run with Docker
docker-compose up --build

# Server runs on https://localhost:8085
```

```bash
# Install dependencies
pip install -r requirements.txt

# Configure Ollama (optional, for local AI)
./configure_ollama.sh

# Run server
python multi_server.py
```
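Once the server is running (via Docker or manually), a quick way to confirm it is reachable is to poll the `/health` endpoint listed in the API table further down. A minimal sketch, assuming only that a healthy server answers with HTTP 200 (`verify=False` accounts for a self-signed local certificate):

```python
# Illustrative health check against the default local server address.
import requests

resp = requests.get("https://localhost:8085/health", verify=False, timeout=5)
print("Server healthy" if resp.ok else f"Server returned {resp.status_code}")
```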
```bash
cd vision_application

# Build APK
./gradlew assembleDebug

# Install to connected device
adb install -r app/build/outputs/apk/debug/app-debug.apk
```

```
blindapplication/
├── src/
│   ├── server/             # Flask vision server
│   ├── models/             # AI model backends
│   ├── client/             # Python client libraries
│   └── data/prompts/       # Configurable prompt files
├── vision_application/     # Android app (Kotlin)
├── doc/                    # Comprehensive documentation
├── cane/                   # Smart cane hardware (ESP32/Arduino)
├── tools/                  # Deployment and utility scripts
└── templates/              # Web interface templates
```
| Document | Description |
|---|---|
| Server Documentation | API, deployment, AI model integration |
| App Documentation | Android app user guide and development |
| Visual Cane Documentation | Hardware build guide |
```bash
# UDP API Authentication (required for path recording)
export UDP_API_KEY=your_api_key_here

# OpenAI (optional, for GPT-4 Vision)
export OPENAI_API_KEY=your_openai_key
```
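On the server side these keys would typically be read from the environment. The sketch below is illustrative; only the variable names come from the commands above.

```python
# Illustrative only: read the API keys exported above from the environment.
import os

UDP_API_KEY = os.environ.get("UDP_API_KEY")        # required for path recording
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")  # optional, GPT-4 Vision only

if UDP_API_KEY is None:
    raise RuntimeError("UDP_API_KEY is not set; path recording will not work")
```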
The server supports multiple AI backends, configured via command line or environment:

```bash
# Run with Ollama (default)
python multi_server.py --model ollama

# Run with GPT-4 Vision
python multi_server.py --model gpt4 --api-key $OPENAI_API_KEY
```

| Endpoint | Method | Description |
|---|---|---|
| `/analyze` | POST | Analyze image with AI vision |
| `/prompts` | GET | List available prompt types |
| `/models` | GET | List available AI models |
| `/health` | GET | Server health check |
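As an illustration of calling the `/analyze` endpoint, the sketch below sends a base64-encoded image. The field names (`image`, `prompt_type`) and the response shape are assumptions for the example, not the documented request schema.

```python
# Illustrative client request to POST /analyze with a base64-encoded image.
import base64
import requests

SERVER = "https://localhost:8085"

with open("snapshot.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {"image": image_b64, "prompt_type": "navigation"}  # assumed field names
resp = requests.post(f"{SERVER}/analyze", json=payload, verify=False, timeout=30)
resp.raise_for_status()
print(resp.json())
```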
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- RAISE Project for accessibility research support
- CNR-IMATI for UDP infrastructure
- Ollama team for local AI model support