A minimal, modular AI command-line interface that connects to Ollama services (local or remote) for interactive AI conversations.
AI-Assisted Development: This project leverages AI coding assistants (GitHub Copilot, Claude) to accelerate development iterations while maintaining human oversight for architecture decisions, code review, and quality control. AI assists with implementation details; humans drive the vision.
- Connect to local or remote Ollama services
- Interactive chat with AI models
- @ Prefix Autocomplete - TAB completion for files/directories with automatic context injection
- RAG Context System - Redis-backed vector embeddings for semantic search
- Directory Tree Visualization - ASCII tree structure for directory context
- Code Generation - Write Python/R code to files automatically
- MCP Tool System - Model Context Protocol for extensible operations
- Session Management - Context-persistent conversations with history injection
- Code Execution - Run Python/R code with automatic output capture
- Dynamic Model Management - Add, remove, and switch models at runtime (no restart needed)
- Embedding Service Abstraction - External embedding services with automatic fallback
- /code Command - Unified interface for complex code task orchestration
- /make Command - Execute Makefile targets using natural language (`:` shortcut)
- $ Prefix (NEW) - Direct MCP tool execution with interactive dropdowns
- Configurable via YAML file
- Streaming and non-streaming response modes
- OpenWebUI Compatible - Drop-in Ollama replacement API
- OpenAI API Compatibility - Works with standard OpenAI clients
- 11 Built-in MCP Tools - Code execution, file operations, RAG tools
- Intelligent Tool Matching - Semantic search finds the right tool automatically
- File Upload Support - Upload and reference files in conversations
- Multi-step Orchestration - Break down complex tasks automatically
- Auto-save Sessions - Conversations saved to Redis automatically
- Restore Sessions - Resume any previous conversation by ID
- List Sessions - View all saved sessions with metadata
- Session Management - Clear specific or all sessions
- Docker Compose with Redis, PostgreSQL, and Transformer services
- Easy setup with automated scripts and Makefile
- Sentry integration for error tracking
# Automated setup (recommended)
make setup
# Build and start all services (Ollama, Redis, Transformer, PostgreSQL)
make build-all-services
make up-all
# Run the CLI
make run
# Or: ./start.sh

# Start all services including the Ollama API
docker compose --profile app --profile api up -d
# API available at http://localhost:8080
# Works with OpenWebUI, standard Ollama clients, and OpenAI API clients

Using with OpenWebUI:
- Set the Ollama API URL in OpenWebUI settings to `http://host.docker.internal:8080`
- All MCP tools and RAG features are automatically available (a Python example using a standard OpenAI client is shown below)
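For programmatic access, any standard OpenAI client should also work. The following is a minimal sketch, assuming the compatibility layer is exposed under /v1 on port 8080 and that the tinyllama model is available; the base URL, model name, and placeholder API key are assumptions to adjust for your setup.

```python
# Illustrative sketch only: calling the API through the standard OpenAI client.
# Assumes the OpenAI compatibility layer is served under /v1 at port 8080 and
# that no real API key is required (a placeholder string is passed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="tinyllama",  # any model pulled into your Ollama instance
    messages=[{"role": "user", "content": "Summarize what this CLI does."}],
)
print(response.choices[0].message.content)
```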
Using Remote Ollama:
Set `OLLAMA_API_URL` in `.env` to point to your remote Ollama server:
OLLAMA_API_URL=http://your-ollama-server:11434

See DOCUMENTATION.md for detailed guides and the docs/ directory for feature-specific documentation.
cli/
├── config.yaml              # Configuration for Ollama and chat
├── docker-compose.yml       # Multi-service Docker setup
├── Makefile                 # Build automation and commands
├── main.py                  # Main CLI entry point
├── requirements.txt         # Python dependencies
├── .env.example             # Environment variables template
├── docs/                    # Feature documentation
│   ├── AT_PREFIXER_FEATURE.md
│   ├── SESSION_FEATURE.md
│   ├── SESSION_PERSISTENCE.md
│   ├── TOOL_RETRIEVAL_FEATURE.md
│   └── MAKEFILE_COMMANDS.md
├── src/                     # Core modules
│   ├── config/              # Configuration management
│   ├── ollama_client/       # Ollama client
│   ├── chat/                # Chat management
│   ├── mcp/                 # MCP client system
│   ├── session/             # Session persistence
│   ├── transformer/         # Embedding service (Docker)
│   ├── redis/               # Redis API service
│   │   └── flask-app/       # Flask API for embeddings
│   ├── postgresql/          # PostgreSQL API service
│   │   └── flask-app/       # Flask API for MCP tools
│   ├── utils/               # Utilities (tree, etc.)
│   └── file_completer.py    # @ prefix autocomplete
├── ollama_api_service/      # Ollama++ API (NEW)
│   ├── app.py               # FastAPI application
│   ├── models.py            # Pydantic models
│   ├── routes/              # API route handlers
│   │   ├── chat.py          # /api/chat endpoints
│   │   ├── generate.py      # /api/generate endpoints
│   │   ├── openai.py        # OpenAI compatibility layer
│   │   ├── tools.py         # MCP tool endpoints
│   │   └── files.py         # File upload endpoints
│   └── utils/               # Ollama adapter utilities
├── system_mcps/             # MCP tool servers
│   └── coder/               # Code execution & file tools
├── tests/                   # Test suite
│   ├── test_ollama_api_*.py # API integration tests
│   └── test_tool_retrieval.py
└── testing/                 # Test applications
    ├── python_app/          # Python test structure
    └── r_app/               # R test structure
- Python 3.7 or higher
- Docker and Docker Compose (for running Ollama in a container)
- Make (optional, for using Makefile commands)
# Run automated setup
make setup
# Or: ./setup.sh
# This will:
# - Create Python virtual environment
# - Install all dependencies
# - Create .env file
# - Optionally start Docker containers
- Clone the repository:
  git clone <repository-url>
  cd cli
- Create the environment file:
  cp .env.example .env
- Start the Ollama service:
  docker compose --profile ollama up -d
  This will:
  - Start the Ollama service in a container
  - Automatically pull the tinyllama model (~1GB, CPU-friendly)
  - Create a persistent volume for model storage
- Wait for the model download (first time only):
  docker compose logs -f ollama-setup
  Wait until you see "Ollama setup complete!"
- Run the CLI:
  ./start.sh
- Clone the repository:
  git clone <repository-url>
  cd cli
- Install Ollama locally: follow the instructions at https://ollama.ai/
- Pull a model:
  ollama pull tinyllama   # or llama2, mistral, etc.
- Configure the CLI: edit config.yaml to set your Ollama service URL and preferred model:

  ollama:
    url: "http://localhost:11434"    # Change for remote Ollama
    model: "tinyllama"               # Change to your preferred model
    timeout: 120
  chat:
    system_prompt: "You are a helpful AI assistant."
    max_context_length: 10
    temperature: 0.7
    stream: true

- Run the CLI:
  ./start.sh
The script will automatically:
- Create a Python virtual environment
- Install all required dependencies
- Start the AI CLI
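For reference, config.yaml is plain YAML, so loading it comes down to a yaml.safe_load call. The snippet below is only a sketch of that idea, with a hypothetical load_config helper and a couple of defaults taken from the configuration shown above; the project's real loader lives in src/config/ and may differ.

```python
# Sketch of loading config.yaml with PyYAML; load_config and the defaults shown
# here are illustrative, the actual loader lives in src/config/.
import yaml

def load_config(path: str = "config.yaml") -> dict:
    """Read the YAML config and fill in a couple of sensible defaults."""
    with open(path, "r", encoding="utf-8") as fh:
        config = yaml.safe_load(fh) or {}
    ollama = config.setdefault("ollama", {})
    ollama.setdefault("url", "http://localhost:11434")
    ollama.setdefault("timeout", 120)
    return config

if __name__ == "__main__":
    cfg = load_config()
    print(cfg["ollama"]["url"], cfg["ollama"].get("model"))
```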
Once the CLI starts, you can:
- Chat with AI: Simply type your message and press Enter
- @ Prefix for files: Type `@filename` + TAB for autocomplete; the file is automatically added as context
- @ Prefix for directories: Type `@dirname/` to add an entire directory with tree visualization
- Generate code: Type `@newfile.py <description>` to generate code directly to a file
- Clear history: Type `clear` to reset the conversation
- List models: Type `models` to see available Ollama models
- Switch model: Type `switch` to change the current model
- MCP tools: Type `mcps` to list available tools, `mcp-tools <name>` for tool details
- Model management commands:
  - `/model status` - Show all configured models
  - `/model list` - List all models
  - `/model <type> list` - List models of a specific type (general/coder/embedding)
  - `/model <type> add <url> <model_name>` - Add a general or coder model
  - `/model embedding add <url> [timeout]` - Add an external embedding service
  - `/model <type> use <model_id>` - Set the active model
  - `/model <type> remove <model_id>` - Remove a model
  - `/model check [model_id]` - Check model availability
- Code command:
  - `/code <prompt>` - Execute complex coding tasks with automatic tool orchestration
- Direct MCP tool execution (NEW):
  - `$ <prompt>` - Interactive MCP tool selection with coder-model parameter extraction
- Session commands:
  - `/session start` - Start a context-persistent session
  - `/session end` - End the current session
  - `/session info` - Display current session information
  - `/session list` - List all saved sessions (NEW)
  - `/session restore <id>` - Restore a previous session (NEW)
  - `/session clear` - Clear all saved sessions (NEW)
- Context commands:
  - `/context add @file` - Add a file/directory to context without an LLM call (NEW)
  - `/context show` - Display the current context (chat, session, metadata)
  - `/context clear` - Clear the context (keeps the session active)
- Repomap commands:
  - `/repomap create` - Create a repository map from the working directory
  - `/repomap load` - Load an existing .repomap file into context
- Make commands:
  - `/make <prompt>` - Execute make commands using natural language
  - `/make map generate` - Generate .makemap from the Makefile
  - `/make map update` - Update .makemap with new targets
  - `/make map load` - Load .makemap into context
  - `:` - Shortcut for `/make` (e.g., `:run tests`)
- Exit: Type `exit` or `quit` to close the CLI
The repomap feature helps you create and maintain a comprehensive map of your repository structure. This is useful for providing AI assistants with context about your codebase.
Creating a Repository Map:

❯ /repomap create
Creating repository map...
Collecting source code files...
✓ Found 50 source files
Generating directory tree...
✓ Directory tree generated
Generating repository map with LLM...
✓ Repository map created successfully!
Saved to: /path/to/.repomap

Loading a Repository Map into Context:

❯ /repomap load
Loading repository map: .repomap
✓ Repository map loaded into context!
  Size: 15,230 bytes
  Session: temporary (start a session for persistence)
Automatic Loading with /code Command:
When using the /code command, the .repomap file (if it exists) is automatically loaded into context to provide better understanding of the codebase structure.
The .repomap file contains:
- Directory Tree: ASCII visualization of the project structure
- Project overview and purpose
- Architecture and design patterns
- Directory structure explanation
- Key components and their responsibilities
- Entry points and dependencies
- Data flow and configuration details
- Testing structure and getting started guide
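To give a concrete picture of the directory-tree part, here is a small, self-contained sketch of ASCII tree generation in the spirit of what /repomap create embeds. It is not the project's own implementation (the real tree utility lives in src/utils/), and the skip list is an assumption.

```python
# Sketch: build an ASCII directory tree like the one embedded in .repomap.
# Illustrative only; the project's own tree utility lives in src/utils/.
from __future__ import annotations
from pathlib import Path

SKIP = {".git", "venv", "__pycache__", "node_modules"}  # assumed ignore list

def ascii_tree(root: Path, prefix: str = "") -> list[str]:
    entries = sorted(
        (p for p in root.iterdir() if p.name not in SKIP),
        key=lambda p: (p.is_file(), p.name),
    )
    lines = []
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        lines.append(f"{prefix}{'└── ' if last else '├── '}{entry.name}")
        if entry.is_dir():
            lines.extend(ascii_tree(entry, prefix + ("    " if last else "│   ")))
    return lines

if __name__ == "__main__":
    root = Path(".")
    print(root.resolve().name + "/")
    print("\n".join(ascii_tree(root)))
```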
The /make command lets you execute Makefile targets using natural language. It automatically parses your Makefile, generates a .makemap file with target descriptions, and matches your intent to the right make commands.
Quick Shortcut: Use : as a shortcut for /make:
❯ :run tests
# Equivalent to: /make run tests

Generating a Make Map:

❯ /make map generate
Generating .makemap from Makefile...
Parsing Makefile...
✓ Found 15 targets
Generating descriptions with LLM...
✓ .makemap created successfully!
Saved to: /path/to/.makemap

Executing Make Commands with Natural Language:

❯ /make run the integration tests
Matching command...
✅ Matched: make test-integration
Executing...
Running integration tests (requires containers)...
✓ Integration tests completed

Auto-detection: If a .makemap file exists, the CLI automatically detects make-related prompts:

❯ build the docker images
Detected make command
✅ Matched: make build-all-services
Executing...
The .makemap file contains:
- Targets: All available make targets with descriptions
- Dependencies: Target dependencies
- Variables: Makefile variables and their defaults
- Recipes: Command summaries for each target
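The actual parsing and description-writing are delegated to the project's make module and the LLM, but conceptually the flow is: extract target names from the Makefile, describe them, then match a natural-language request against those descriptions. The sketch below substitutes a naive keyword-overlap matcher for the LLM and uses invented helper names, so treat it purely as an illustration.

```python
# Illustrative sketch of the /make idea: parse Makefile targets, then pick the
# closest one for a natural-language prompt. Helper names are hypothetical; the
# real implementation uses the LLM to write descriptions and match intent.
from __future__ import annotations
import re

TARGET_RE = re.compile(r"^([A-Za-z0-9_.-]+)\s*:(?!=)", re.MULTILINE)

def parse_targets(makefile_text: str) -> list[str]:
    """Return target names, skipping special targets like .PHONY."""
    return [t for t in TARGET_RE.findall(makefile_text) if not t.startswith(".")]

def match_target(prompt: str, targets: list[str]) -> str | None:
    """Naive keyword-overlap matcher (the CLI uses the LLM instead)."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    scored = [(len(words & set(t.lower().replace("-", " ").split())), t) for t in targets]
    best = max(scored, default=(0, None))
    return best[1] if best[0] > 0 else None

if __name__ == "__main__":
    with open("Makefile", encoding="utf-8") as fh:
        targets = parse_targets(fh.read())
    print(match_target("run the integration tests", targets))
```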
The session feature allows you to maintain conversation context across multiple prompts by automatically injecting previous interactions as context. When a session is active, the last 5 interactions are automatically included as context, enabling coherent multi-turn conversations.
Example:
❯ /session start
Session started at 14:30:45

❯ What is the capital of France?
Paris

❯ What's the population of that city?
Paris has approximately 2.2 million people...
# AI understands "that city" refers to Paris

❯ /session end
✅ Session ended (started at 14:30:45, 2 interactions)

Session Persistence (NEW): Sessions are automatically saved to Redis and can be restored later:

❯ /session list
Saved Sessions:
  1. abc123... | 2025-11-25 14:30 | 5 interactions  | "Python help"
  2. def456... | 2025-11-24 10:15 | 12 interactions | "Docker setup"

❯ /session restore abc123
✅ Session restored with 5 interactions
See docs/SESSION_FEATURE.md and docs/SESSION_PERSISTENCE.md for detailed documentation.
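As a rough picture of how Redis-backed persistence can work, the sketch below appends each interaction as JSON under a session key and reloads the last five for context injection. The key names, port, and layout are assumptions for illustration; the real schema lives in src/session/ and the Redis API service.

```python
# Sketch of Redis-backed session persistence; the key layout shown here is an
# assumption for illustration, not the project's actual schema (see src/session/).
from __future__ import annotations
import json
import uuid
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def save_interaction(session_id: str, prompt: str, answer: str) -> None:
    r.rpush(f"session:{session_id}:interactions",
            json.dumps({"prompt": prompt, "answer": answer}))

def restore_context(session_id: str, last_n: int = 5) -> list[dict]:
    """Return the last N interactions, e.g. for injection as chat context."""
    raw = r.lrange(f"session:{session_id}:interactions", -last_n, -1)
    return [json.loads(item) for item in raw]

if __name__ == "__main__":
    sid = uuid.uuid4().hex[:8]
    save_interaction(sid, "What is the capital of France?", "Paris")
    save_interaction(sid, "What's the population of that city?", "About 2.2 million people.")
    print(restore_context(sid))
```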
The /context add command allows you to add files and directories to the conversation context without triggering an LLM response. This is useful when you want to incrementally build up context before asking questions, avoiding unnecessary token usage.
Difference from the @ prefix:
- `@file.py what does this do?` - Adds the file to context AND triggers an LLM response
- `/context add @file.py` - Only adds the file to context; no LLM call
Example:
❯ /context add @src/main.py
✓ Added 1 file(s) to context:
  • src/main.py

❯ /context add @src/utils/
✓ Added 1 directory(s) to context:
  • src/utils/ (15 files, 3 directories)

❯ /context show
Current Context:
  Chat Messages: 0
  Session: Active
    • ID: abc123...
    • Duration: 120s
    • Interactions: 2

❯ Now explain how main.py uses the utils module
# AI now has both main.py and utils/ in context
Use Cases:
- Batch Context Loading: Add multiple files/directories before asking questions
- Token Efficiency: Build context incrementally without triggering LLM on each addition
- Session Preparation: Prepare context at the start of a session for later questions
- Code Review Setup: Load all relevant files first, then ask specific questions
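Conceptually, /context add just reads the referenced files and stores their contents for later prompts instead of sending anything to the model. The snippet below sketches that idea with invented names and structure; it is not the CLI's internal code.

```python
# Sketch of "add to context without calling the LLM"; the data structure and
# function names are illustrative, not the CLI's internals.
from __future__ import annotations
from pathlib import Path

context_blocks: list[str] = []

def context_add(target: str) -> None:
    """Read a file, or every file under a directory, into the pending context."""
    path = Path(target)
    files = [path] if path.is_file() else sorted(p for p in path.rglob("*") if p.is_file())
    for f in files:
        context_blocks.append(f"# File: {f}\n{f.read_text(encoding='utf-8', errors='replace')}")
    print(f"Added {len(files)} file(s) to context: {target}")

def build_prompt(question: str) -> str:
    """Only when the user finally asks something are the blocks sent in one call."""
    return "\n\n".join(context_blocks + [question])
```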
The /code command provides a unified interface for complex code task orchestration. It automatically:
- Analyzes your prompt and breaks it into steps
- Matches the best MCP tools for each step
- Executes the tools in sequence
Example:
❯ /code create a python script that reads data from users.csv, filters active users, and generates a bar chart
Analyzing task...
Task breakdown:
  1. Read CSV file
  2. Filter data for active users
  3. Generate bar chart visualization
Matching tools...
✅ Matched 3 tools for execution
Executing...
✓ Step 1: read_csv_file
✓ Step 2: filter_data
✓ Step 3: create_visualization
✅ Task completed successfully!
Auto-session: If no session is active, /code will automatically start one for you.
See docs/CODE_COMMAND.md for detailed documentation and examples.
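A stripped-down view of that flow: ask the model for a numbered step list, look up one tool per step, and run them in order. Every name below (plan_steps, match_tool, run_tool) is a hypothetical placeholder used to convey the flow, not the project's actual API.

```python
# Conceptual sketch of /code orchestration: plan -> match tools -> execute.
# The callables are placeholders; in the project, tool matching is backed by
# semantic search over MCP tools and execution goes through the MCP client.
from __future__ import annotations
from typing import Callable

def plan_steps(prompt: str, ask_llm: Callable[[str], str]) -> list[str]:
    """Ask the model to break the task into short numbered steps."""
    reply = ask_llm(f"Break this task into numbered steps:\n{prompt}")
    return [line.split(".", 1)[1].strip() for line in reply.splitlines() if "." in line]

def orchestrate(prompt: str, ask_llm, match_tool, run_tool) -> None:
    for i, step in enumerate(plan_steps(prompt, ask_llm), start=1):
        tool = match_tool(step)        # e.g. semantic search over registered MCP tools
        result = run_tool(tool, step)  # execute the chosen tool via the MCP client
        print(f"Step {i}: {tool} -> {result}")
```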
The $ prefix provides interactive MCP tool selection with automatic parameter extraction using your coder model. Perfect when you know what type of task you want but want to browse available tools.
How it works:
- Type
$followed by your request - Select MCP server from dropdown (arrow keys + Enter)
- Select tool from dropdown
- System extracts parameters using coder model
- Tool executes with results displayed
Example:
❯ $ generate 100 fake records from @users.csv
Direct MCP Tool Execution Mode
Select MCP server and tool to execute...

Select MCP Server:
❯ coder
  data-engineer
✓ Selected MCP: data-engineer

Select Tool from data-engineer:
  compare_ast_similarity
  compare_code_similarity
  generate_ast
❯ generate_fake_data
  generate_fake_data_ctgan
✓ Selected Tool: generate_fake_data

Extracting parameters from prompt using coder model...
Parameters extracted: {
  "file_path": "users.csv",
  "num_samples": 100,
  "working_dir": "/path/to/dir"
}

Executing tool 'generate_fake_data' on MCP 'data-engineer'...
✓ Tool execution completed
Key Features:
- Interactive selection - Browse all MCPs and tools with arrow keys
- Coder model enforced - Uses coder model for accurate parameter extraction
- Smart parameter detection - Extracts file paths, counts, and options from natural language
- Filtered tool list - Automatically excludes meta/orchestration tools
See docs/DOLLAR_PREFIX_MCP_TOOL_EXECUTION.md for complete documentation.
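The parameter-extraction step can be pictured as one structured-output call to the coder model. The sketch below posts to Ollama's /api/generate endpoint and parses a JSON reply; the prompt wording, tool schema, and model name are assumptions for illustration, not the CLI's actual prompts.

```python
# Sketch: extract tool parameters from a natural-language prompt with the coder
# model via Ollama's /api/generate endpoint. The prompt text and tool schema
# shown here are illustrative, not the CLI's actual prompts.
import json
import requests

OLLAMA_URL = "http://localhost:11434"   # adjust to your setup
CODER_MODEL = "qwen2.5-coder:7b"        # example coder model

def extract_parameters(user_prompt: str, tool_schema: dict) -> dict:
    instruction = (
        "Extract arguments for the tool below from the user request. "
        "Reply with JSON only.\n"
        f"Tool schema: {json.dumps(tool_schema)}\n"
        f"User request: {user_prompt}"
    )
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": CODER_MODEL, "prompt": instruction, "stream": False, "format": "json"},
        timeout=120,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

if __name__ == "__main__":
    schema = {"file_path": "string", "num_samples": "integer"}
    print(extract_parameters("generate 100 fake records from users.csv", schema))
```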
Manage AI models at runtime without restarting the application:
❯ /model status
Model Configuration:
  General:   llama3.1:8b (http://192.168.31.23:11434) ✅
  Coder:     qwen2.5-coder:7b (http://192.168.31.23:11434) ✅
  Embedding: External Service (http://localhost:16050) ✅

❯ /model general add http://localhost:11434 mistral
✅ Added general model: mistral (ID: abc123)

❯ /model general use abc123
✅ Switched to model: mistral
Features:
- Add/remove models dynamically
- Switch between models instantly
- Support for external embedding services
- Automatic availability checking
- Redis-backed persistence
See docs/DYNAMIC_MODEL_MANAGEMENT.md for detailed documentation.
==================================================
AI CLI - Powered by Ollama
==================================================
Type 'exit' or 'quit' to exit
Type 'clear' to clear chat history
Type 'models' to list available models
==================================================
Using model: tinyllama
Connected to: http://localhost:11434
You: Hello! Can you help me with Python?
AI: Of course! I'd be happy to help you with Python...
You: exit
Goodbye!
The project includes a Makefile for easy management:
# Show all available commands
make help
# Setup
make setup # Complete setup (venv + dependencies + Docker)
make venv # Create virtual environment
make install # Install Python dependencies
# Build & Run
make build-all-services # Build all Docker images
make up-all # Start all services (Ollama + Redis + Transformer + PostgreSQL)
make up-redis # Start only Redis services
make run # Run the CLI
# Web UI
make ui # Start the AI CLI Web UI in background
make ui-logs # Start the AI CLI Web UI with logs (foreground)
make ui-stop # Stop the AI CLI Web UI
# Docker Management
make down # Stop Docker containers
make restart # Restart Docker containers
make logs # View container logs
make status # Show container status
# Redis Management
make build-redis # Build Redis API image
make redis-logs # Show Redis API logs
make redis-cli # Execute Redis CLI
make redis-clear # Clear all Redis data (with confirmation)
make redis-info # Show Redis statistics
make redis-api-health # Check Redis API health
make transformer-health # Check Transformer service health
# Database
make migrate-session # Apply session database migration
make update-schema # Update PostgreSQL schema
# Ollama
make pull-model MODEL=llama2 # Pull a specific model
make list-models # List available models
# Cleanup
make clean               # Remove venv and volumes

See docs/MAKEFILE_COMMANDS.md for detailed command documentation.
# Start Ollama
make up
# Stop Ollama
make down
# View logs
make logs
# Check status
make status

Or use docker compose directly:

docker compose --profile ollama up -d    # Start Ollama
docker compose --profile ollama down     # Stop Ollama
docker compose logs -f ollama            # View logs
docker compose ps                        # Check status
docker compose down -v                   # Stop and remove volumes

Ollama settings (`ollama:` section of config.yaml):
- url: Ollama service URL (default: http://localhost:11434)
- model: AI model to use (e.g., tinyllama, llama2, mistral, codellama)
- timeout: Request timeout in seconds (default: 120)

Chat settings (`chat:` section of config.yaml):
- system_prompt: Initial prompt to guide AI behavior
- max_context_length: Number of messages to keep in context (default: 10)
- temperature: Response randomness, 0.0-1.0 (default: 0.7)
- stream: Enable streaming responses (default: true)
If you prefer to set up manually instead of using start.sh:
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the CLI
python main.py

Using Docker Compose:
- Ensure the Ollama container is running: `docker compose ps`
- Check container logs: `docker compose logs ollama`
- Verify the service is healthy: `docker compose ps` (should show "healthy")
- Restart the service: `docker compose restart ollama`
Using Local Installation:
- Ensure Ollama is running: `ollama serve`
- Verify the URL in `config.yaml` matches your Ollama service
- For remote Ollama, ensure network connectivity
Using Docker Compose:
- Check if the setup container completed: `docker compose logs ollama-setup`
- Manually pull a model: `docker compose exec ollama ollama pull tinyllama`
- List available models: `docker compose exec ollama ollama list`
Using Local Installation:
- Pull the model: `ollama pull <model-name>`
- Update `config.yaml` with the correct model name
- Use the `models` command in the CLI to see available models
- Ensure Docker and Docker Compose are installed
- Check if ports are available (default: 11434)
- View all logs: `docker compose logs`
- Recreate containers: `docker compose down && docker compose --profile ollama up -d`
The project uses a modular architecture:
- Config Module (`src/config/`): Handles configuration loading and management
- Ollama Client Module (`src/ollama_client/`): Manages communication with Ollama
- Chat Module (`src/chat/`): Handles conversation context and message management
- MCP Module (`src/mcp/`): Model Context Protocol client for tool execution
- Session Module (`src/session/`): Session persistence and management
- Transformer Service (`src/transformer/`): Sentence embeddings for semantic search
- PostgreSQL API (`src/postgresql/flask-app/`): MCP tool storage and retrieval
- Redis API (`src/redis/flask-app/`): RAG vector storage and session persistence
- Ollama++ API (`ollama_api_service/`): OpenWebUI-compatible API with enhanced features
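As a mental model of how the core pieces fit together, the sketch below wires a minimal Ollama client to a chat manager with a rolling context window. Class and method names are illustrative stand-ins, not the actual code in src/ollama_client/ and src/chat/.

```python
# Rough wiring sketch of the modular architecture; the classes below are
# illustrative stand-ins, not the project's actual implementations.
from __future__ import annotations
import requests

class OllamaClient:
    """Thin wrapper over Ollama's /api/chat endpoint."""
    def __init__(self, url: str, model: str, timeout: int = 120):
        self.url, self.model, self.timeout = url.rstrip("/"), model, timeout

    def chat(self, messages: list[dict]) -> str:
        resp = requests.post(
            f"{self.url}/api/chat",
            json={"model": self.model, "messages": messages, "stream": False},
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]

class ChatManager:
    """Keeps a rolling message window (cf. max_context_length in config.yaml)."""
    def __init__(self, client: OllamaClient, system_prompt: str, max_context_length: int = 10):
        self.client = client
        self.history = [{"role": "system", "content": system_prompt}]
        self.max_len = max_context_length

    def ask(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        recent = [m for m in self.history[-self.max_len:] if m["role"] != "system"]
        answer = self.client.chat([self.history[0]] + recent)
        self.history.append({"role": "assistant", "content": answer})
        return answer
```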
# Run all tests
make test
# Run specific test file
pytest tests/test_ollama_api_integration.py -v
# Run with coverage
pytest --cov=src tests/

This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2025 Toavina A.