A CLI-based AI chatbot for analyzing UBER SEC 10-K filings using Retrieval-Augmented Generation (RAG) and Google's Gemini API.
- AI-Powered Analysis: Interactive chat with a FunctionAgent that answers questions about UBER's financial data from 2019-2022.
- RAG with LlamaIndex: Indexes SEC filings for accurate, context-aware responses.
- CLI Interface: Simple command-line tools for data loading and chatting.
- Modular Design: Clean, maintainable code structure for easy extension.
1. Clone the repo:

   ```bash
   git clone https://github.com/yourusername/gemi-chat.git
   cd gemi-chat
   ```

2. Install dependencies (using `uv` for speed):

   ```bash
   uv sync
   ```

3. Set up the environment:

   - Create a `.env` file:

     ```
     GOOGLE_API_KEY=your_gemini_api_key
     ```

   - Or set the environment variable (PowerShell):

     ```powershell
     $env:GOOGLE_API_KEY = "your_key"
     ```

4. Load and index data:

   ```bash
   python -m src.cli --load-data
   ```

5. Start an interactive chat:

   ```bash
   python -m src.cli --chat
   ```

6. View help:

   ```bash
   python -m src.cli --help
   ```
Example chat: Ask questions like "What were UBER's revenue trends in 2020?" and get AI-powered answers based on the filings.
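If you are curious how the `.env` lookup in step 3 can work without a dedicated library, it can be approximated in a few lines of standard-library Python. This is a hypothetical sketch only; the project's `config.py` may well use `python-dotenv` or similar instead:

```python
import os
from pathlib import Path

def load_dotenv(path: str = ".env") -> None:
    """Minimal .env loader: copy KEY=value lines into os.environ.

    Already-set environment variables take precedence over file values.
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_dotenv()
api_key = os.environ.get("GOOGLE_API_KEY")  # None if neither .env nor env var is set
```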
Keep architecture diagrams up-to-date with your codebase:
```bash
python generate_architecture.py
```

This script automatically analyzes the `src/` directory and regenerates:

- `architecture_diagram.md` - Detailed component diagram
- `codebase_analysis.md` - Module analysis report
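The regeneration script itself is not shown here, but the core of such an analysis can be sketched with the standard-library `ast` module. The names `module_imports` and `analyze` below are hypothetical, not the script's real API:

```python
import ast
from pathlib import Path

def module_imports(source: str) -> list[str]:
    """Return the top-level module names imported by the given Python source."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return sorted(names)

def analyze(src_dir: str = "src") -> dict[str, list[str]]:
    """Map each module under src_dir to the modules it imports."""
    return {
        path.stem: module_imports(path.read_text())
        for path in Path(src_dir).glob("*.py")
    }
```

From a mapping like this, emitting a Mermaid or Markdown dependency diagram is a straightforward formatting step.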
```
Gemi_Chat/
├── src/                   # Main package
│   ├── __init__.py        # Package init
│   ├── config.py          # Settings & env vars
│   ├── data_loader.py     # Loads UBER HTML data
│   ├── index_manager.py   # Manages vector indices
│   ├── ageny.py           # AI agent & tools
│   ├── cli.py             # Command-line interface
│   ├── custom_console.py  # Console utilities
│   └── google_llm_init.py # Gemini LLM setup
├── pyproject.toml         # Project config & deps
├── system_prompt.txt      # Agent system prompt
├── .env                   # Environment variables
├── data/UBER/             # UBER SEC filings
└── storage/               # Persisted indices
```
- `config.py`: Centralized configuration (years, paths, API keys).
- `data_loader.py`: Data ingestion with UnstructuredReader.
- `index_manager.py`: Vector index creation/persistence.
- `ageny.py`: Agent setup with query engines and chat loop.
- `cli.py`: CLI with argparse for commands.
- `custom_console.py`: Spinners, colors, timers.
- `google_llm_init.py`: Google Gemini LLM initialization.
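As a rough illustration of the `cli.py` layer, an argparse setup matching the documented commands might look like the following. This is a sketch; the real module's flags, help text, and wiring may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented commands: --load-data and --chat.
    parser = argparse.ArgumentParser(
        prog="src.cli",
        description="Chat with UBER SEC 10-K filings via RAG.",
    )
    parser.add_argument("--load-data", action="store_true",
                        help="Load the filings and build the vector index")
    parser.add_argument("--chat", action="store_true",
                        help="Start an interactive chat session")
    return parser

# e.g. `python -m src.cli --chat` would parse to chat=True, load_data=False
args = build_parser().parse_args(["--chat"])
```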
The Gemi_Chat application follows a modular RAG (Retrieval-Augmented Generation) architecture:
CLI Interface → Data Processing → Vector Indexing → AI Agent → Chat Interface
- Data Pipeline: HTML SEC filings → Document parsing → Vector embeddings → Persistent storage
- Query Pipeline: User question → Index retrieval → Context augmentation → LLM generation → Response
- Agent System: FunctionAgent with specialized tools for multi-year financial analysis
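The query pipeline above can be illustrated with a toy retriever. Real retrieval uses Gemini embeddings through LlamaIndex; this sketch substitutes bag-of-words counts and cosine similarity purely to show the retrieve-then-augment shape:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: word-count vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Toy document chunks standing in for indexed filing passages.
chunks = [
    "Uber revenue declined in 2020 due to reduced mobility demand.",
    "Uber Eats delivery bookings grew during 2020.",
    "Risk factors include regulatory changes affecting drivers.",
]
context = retrieve("What were Uber's revenue trends in 2020?", chunks)
# Context augmentation: the retrieved chunks become the LLM prompt's grounding.
prompt = "Answer using only this context:\n" + "\n".join(context)
```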
- Detailed Component Diagram: Complete system architecture with all modules and dependencies
- Process Flow Diagram: High-level user journey and data flow
- LlamaIndex: Vector indexing, query engines, and agent framework
- Google Gemini: LLM for generation and embeddings for semantic search
- Unstructured.io: Document parsing for HTML SEC filings
- RAG Pattern: Retrieval-augmented generation for accurate financial analysis
- Fork the repo.
- Create a feature branch.
- Commit changes.
- Push and open a PR.
MIT License - see LICENSE file for details.
This is for educational purposes only. Not financial advice.