Vision Assistance System

License: MIT · Python 3.8+ · Android API 24+ · Docker

AI-powered vision assistance system for visually impaired users, featuring real-time scene analysis, configurable prompts, and multi-platform support.

Overview

A comprehensive vision assistance system featuring configurable system prompts and session-based differential descriptions. The system adapts to various applications (navigation, safety, text reading, public transport) through external prompt files.

Supported AI Models:

  • Ollama Vision Models (Local: Qwen 2.5 VL, Gemma 3) - Default
  • GPT-4 Vision (OpenAI)
  • Florence2 (Microsoft)
  • CogVLM2, MoeLLaVA (Open source)

Key Features

Configurable Prompt System

  • 9 pre-built applications (navigation, safety, text reading, public transport, etc.)
  • Create custom applications via plain-text prompt files (see the example below)
  • Instant switching without code changes
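
As a hypothetical example, a custom application could be added by dropping a plain-text prompt file into the prompts directory. The file name and wording below are assumptions; check src/data/prompts/ for the project's actual conventions:

# Hypothetical file: src/data/prompts/grocery_shopping.txt
You are a vision assistant helping a blind user shop for groceries.
Describe shelf items from left to right and read any visible price
labels aloud. Mention brand names only when clearly legible.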

Session-Based Intelligence

  • Per-user memory with differential descriptions (sketched below)
  • Silent mode when the scene is unchanged, skipping redundant speech (80%+ cognitive relief)
  • Automatic context reset on prompt or model changes
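
A minimal sketch of the differential-description idea follows. This is an illustration only, not the project's implementation: the class and method names are invented, and the real system may compare scenes at the image level rather than on generated text.

# Sketch of a per-user session with differential output (hypothetical names).
from typing import Optional

class VisionSession:
    def __init__(self) -> None:
        self.last_description: Optional[str] = None

    def respond(self, description: str, threshold: float = 0.9) -> Optional[str]:
        """Return text to speak, or None (silent mode) if the scene is unchanged."""
        if self.last_description is not None:
            if self._similarity(description, self.last_description) >= threshold:
                return None  # scene unchanged: stay silent to reduce cognitive load
        self.last_description = description
        return description

    def reset(self) -> None:
        """Called whenever the prompt type or AI model changes."""
        self.last_description = None

    @staticmethod
    def _similarity(a: str, b: str) -> float:
        # Token-overlap (Jaccard) similarity as a crude stand-in for scene comparison.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0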

Full Accessibility

  • Complete TalkBack support with custom actions
  • Gesture-based camera controls (single/double tap)
  • Hardware button integration (volume keys, Bluetooth)
  • Multi-language support (English, Italian, Spanish, French)

Path Recording

  • Continuous capture with configurable intervals
  • GPS location and orientation tracking
  • Cloud data synchronization via the UDP API (see the sketch below)
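
A minimal sketch of what a path-recording sync packet might look like. The endpoint, port, and payload fields here are all assumptions; the actual protocol is defined by the CNR-IMATI UDP API:

# Hypothetical UDP sync client; the packet schema below is an assumption.
import json
import os
import socket
import time

UDP_HOST = "udp.example.org"  # placeholder endpoint
UDP_PORT = 9999               # placeholder port

def send_path_sample(lat: float, lon: float, heading: float) -> None:
    packet = {
        "api_key": os.environ["UDP_API_KEY"],  # required, see Configuration
        "timestamp": time.time(),
        "lat": lat,
        "lon": lon,
        "heading": heading,  # device orientation in degrees
    }
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(json.dumps(packet).encode("utf-8"), (UDP_HOST, UDP_PORT))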

Quick Start

Server Setup (Docker)

# Clone repository
git clone https://github.com/your-username/blindapplication.git
cd blindapplication

# Build and run with Docker
docker-compose up --build

# Server runs on https://localhost:8085

Server Setup (Manual)

# Install dependencies
pip install -r requirements.txt

# Configure Ollama (optional, for local AI)
./configure_ollama.sh

# Run server
python multi_server.py

Android App

cd vision_application

# Build APK
./gradlew assembleDebug

# Install to connected device
adb install -r app/build/outputs/apk/debug/app-debug.apk

Project Structure

blindapplication/
├── src/
│   ├── server/          # Flask vision server
│   ├── models/          # AI model backends
│   ├── client/          # Python client libraries
│   └── data/prompts/    # Configurable prompt files
├── vision_application/  # Android app (Kotlin)
├── doc/                 # Comprehensive documentation
├── cane/                # Smart cane hardware (ESP32/Arduino)
├── tools/               # Deployment and utility scripts
└── templates/           # Web interface templates

Documentation

Document                     Description
Server Documentation         API, deployment, AI model integration
App Documentation            Android app user guide and development
Visual Cane Documentation    Hardware build guide

Configuration

Environment Variables

# UDP API Authentication (required for path recording)
export UDP_API_KEY=your_api_key_here

# OpenAI (optional, for GPT-4 Vision)
export OPENAI_API_KEY=your_openai_key

Server Configuration

The server supports multiple AI backends configured via command line or environment:

# Run with Ollama (default)
python multi_server.py --model ollama

# Run with GPT-4 Vision
python multi_server.py --model gpt4 --api-key $OPENAI_API_KEY

API Endpoints

Endpoint    Method    Description
/analyze    POST      Analyze image with AI vision
/prompts    GET       List available prompt types
/models     GET       List available AI models
/health     GET       Server health check
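
For illustration, an /analyze call could look like the following. The multipart field names and response shape are assumptions rather than a documented contract:

# Hypothetical client call to the /analyze endpoint.
import requests

with open("street.jpg", "rb") as f:
    resp = requests.post(
        "https://localhost:8085/analyze",
        files={"image": f},                  # field name is an assumption
        data={"prompt_type": "navigation"},  # one of the pre-built prompt types
        verify=False,                        # local self-signed certificate
    )
resp.raise_for_status()
print(resp.json())  # expected to contain the generated scene description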

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • RAISE Project for accessibility research support
  • CNR-IMATI for UDP infrastructure
  • Ollama team for local AI model support
