Transform static photos into short animated videos using Stable Video Diffusion, running 100% locally on your hardware with a modern microservices architecture.
Perfect for bringing vintage photos to life with smooth, natural motion!
This project uses a microservices architecture with separate containers:
```
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│  React Frontend  │ ───▶ │ FastAPI Backend  │ ───▶ │  Model Service   │
│   Port: 3000     │      │   Port: 5000     │      │   Port: 5001     │
│                  │      │                  │      │  (GPU-powered)   │
│ • Upload UI      │      │ • File handling  │      │ • SVD Model      │
│ • Parameters     │      │ • Job queue      │      │ • Video gen      │
│ • Progress       │      │ • API routes     │      │ • Docker         │
└──────────────────┘      └──────────────────┘      └──────────────────┘
```
Components:
- Frontend: React + TypeScript for modern UI
- Backend: FastAPI for async API with auto-documentation
- Model Service: Flask + PyTorch running Stable Video Diffusion in Docker
- Orchestration: Docker Compose for easy deployment
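In practice, the backend's main job is to relay work to the model service over the Docker network. A minimal sketch of that hop, with hypothetical route names (the real implementation lives in `backend/app.py`):

```python
# Sketch of the backend-to-model-service hop. The /generate route on the
# model service is a hypothetical name used for illustration only.
import httpx
from fastapi import FastAPI, UploadFile

app = FastAPI()
MODEL_SERVICE_URL = "http://model-service:5001"  # Docker Compose service name

@app.post("/upload")
async def upload(photo: UploadFile):
    # Forward the uploaded image to the GPU container and relay its response
    async with httpx.AsyncClient(timeout=None) as client:
        resp = await client.post(
            f"{MODEL_SERVICE_URL}/generate",  # hypothetical route
            files={"photo": (photo.filename, await photo.read())},
        )
    return resp.json()
```

Docker Compose resolves `model-service` to the GPU container's address, so no host networking is needed.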
Features:
- Modern Tech Stack: React + FastAPI + Docker
- Microservices: Each component runs independently
- GPU Accelerated: Uses your RTX 5070 Ti through Docker
- Latest AI Model: CogVideoX-5B (August 2024) - best open-source quality
- Async Processing: Non-blocking video generation
- Auto Documentation: FastAPI provides a `/docs` endpoint
- Easy Deployment: a single `docker-compose up` command
- Customizable Parameters (see the example request after this list):
- Prompt: Describe desired motion (NEW!)
- Duration (1-6 seconds)
- Frame rate (6-10 FPS, optimal: 8)
- Quality (30-100 inference steps)
- Guidance scale (prompt adherence)
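For example, a generation request built from these parameters might look like this; the `/upload` path and field names are assumptions drawn from the API section further down:

```python
# Hypothetical request showing the tunable parameters; adjust the endpoint
# path and field names to match the actual backend API.
import requests

with open("photo.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/upload",  # assumed upload route
        files={"photo": f},
        data={
            "prompt": "gentle breeze, subtle natural motion",
            "duration": 3,           # seconds
            "fps": 8,                # optimal frame rate
            "quality": 30,           # inference steps
            "motion_strength": "medium",
        },
    )
print(resp.json())  # expected to include a job_id
```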
Prerequisites:
- Python 3.11 or 3.12
  - Download: https://www.python.org/downloads/
  - ⚠️ Check "Add Python to PATH" during installation
- Node.js 20+ (LTS)
  - Download: https://nodejs.org/
  - Includes the npm package manager
- Docker Desktop
  - Download: https://www.docker.com/products/docker-desktop/
  - Required for GPU access and containerization
  - Enable the WSL 2 backend on Windows
System requirements:
- GPU: NVIDIA RTX 5070 Ti (or any CUDA GPU with 8GB+ VRAM)
- RAM: 16GB+ recommended
- Storage: ~20GB for models and cache
- Internet: Required for initial model download (~15GB)
```powershell
# Run the setup script
.\setup.ps1
```

This will:
- Check prerequisites
- Create Python virtual environment
- Install all dependencies
- Prepare for deployment
See SETUP.md for detailed manual installation steps.
Easiest way - runs everything in containers:
```bash
# Build and start all services
docker-compose up --build

# Or run in background
docker-compose up -d --build

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

Access the app:
- Frontend: http://localhost:3000
- Backend API: http://localhost:5000
- API Documentation: http://localhost:5000/docs (auto-generated!)
- Model Service: http://localhost:5001/health
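A quick readiness check for all three services; the backend health path here is an assumption, while the model-service `/health` URL is the one listed above:

```python
# Readiness check for the three services; backend /health is an assumed path.
import requests

services = [
    ("frontend", "http://localhost:3000"),
    ("backend", "http://localhost:5000/health"),       # assumed path
    ("model-service", "http://localhost:5001/health"),
]
for name, url in services:
    try:
        r = requests.get(url, timeout=5)
        print(f"{name}: HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: not reachable ({exc})")
```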
Run each service separately (useful for development):
Terminal 1 - Model Service:

```powershell
.\venv\Scripts\Activate
cd model-service
python model_service.py
```

Terminal 2 - Backend:

```powershell
.\venv\Scripts\Activate
cd backend
python app.py
```

Terminal 3 - Frontend:

```powershell
cd frontend
npm start
```

Project structure:

```
AnimatedPhoto/
├── backend/                  # FastAPI backend
│   ├── app.py                # Main API application
│   ├── requirements.txt      # Python dependencies
│   ├── Dockerfile            # Container definition
│   ├── uploads/              # Uploaded photos (auto-created)
│   └── outputs/              # Generated videos (auto-created)
│
├── model-service/            # AI model container
│   ├── model_service.py      # Model inference API
│   ├── requirements.txt      # ML dependencies
│   ├── Dockerfile            # GPU-enabled container
│   └── .dockerignore         # Exclude files from the build
│
├── frontend/                 # React frontend
│   ├── src/                  # Source code
│   ├── public/               # Static files
│   ├── package.json          # npm dependencies
│   └── Dockerfile            # Container definition
│
├── docker-compose.yml        # Orchestration config
├── setup.ps1                 # Automated setup script
├── SETUP.md                  # Detailed setup guide
└── README.md                 # This file
```
Usage:

1. Upload Photo
   - Click or drag & drop your image
   - Supports PNG, JPG, JPEG (max 16MB)
2. Adjust Parameters
   - Duration: length of the video (1-5 seconds)
   - Frame Rate: smoothness (7 FPS recommended)
   - Quality: inference steps (25 = good balance)
   - Motion: Low/Medium/High animation strength
3. Generate Video
   - Click "Generate Video"
   - Wait 1-3 minutes (varies by settings)
   - The first generation loads the model (~30 sec extra)
4. Download Result
   - Preview the video in the browser
   - Download the MP4 file
The backend provides these REST API endpoints:

- Health check: backend and model service status
- Upload: upload a photo and start generation
  - Form data: `photo`, `duration`, `fps`, `quality`, `motion_strength`
  - Returns: a `job_id` for tracking
- Status: job progress and state
  - Returns: `status`, `progress`, `download_url`
- Download: fetch the generated video
- Jobs: list all jobs (for debugging)
- `/docs`: auto-generated API documentation (a FastAPI feature!)
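Putting the endpoints together, a client uploads once, polls the status until the job finishes, then fetches the file. A sketch, assuming hypothetical `/status` and download paths and status values matching the fields above:

```python
# Poll a job until completion, then download the result. Paths, status
# values, and response fields are assumptions based on the list above.
import time
import requests

BASE = "http://localhost:5000"
job_id = "..."  # returned by the upload request (see the example earlier)

while True:
    job = requests.get(f"{BASE}/status/{job_id}").json()  # assumed path
    print(f"{job['status']}: {job.get('progress', 0)}%")
    if job["status"] in ("completed", "failed"):
        break
    time.sleep(5)

if job["status"] == "completed":
    video = requests.get(f"{BASE}{job['download_url']}")
    with open("animated.mp4", "wb") as out:
        out.write(video.content)
```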
Python not found:

```powershell
# Verify Python installation
python --version
# If not found, reinstall and add to PATH
```

GPU not detected:

```bash
# Check the NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
# If this errors, reinstall Docker Desktop and the NVIDIA drivers
```

Model download fails or stalls:
- Check your internet connection
- Ensure ~20GB of free disk space
- Downloads resume automatically if interrupted
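To pre-fetch the weights so the first generation doesn't block on the ~15GB download, `huggingface_hub` can populate the cache ahead of time; a sketch, assuming the CogVideoX-5B-I2V repo named in the tech-stack notes below:

```python
# Pre-download the model cache; snapshot_download resumes partial downloads.
from huggingface_hub import snapshot_download

snapshot_download("THUDM/CogVideoX-5b-I2V")
```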
Port already in use:

```powershell
# Check what's using the port
netstat -ano | findstr :3000
netstat -ano | findstr :5000
netstat -ano | findstr :5001
# Kill the process or change the ports in docker-compose.yml
```

Out of GPU memory:
- Close other GPU-intensive apps
- Lower the quality/FPS settings
- Ensure 8GB+ VRAM is available
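To check how much VRAM is actually free before starting a job, a small PyTorch sketch:

```python
# Report free vs. total VRAM using torch.cuda.mem_get_info.
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"VRAM free: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
    if free < 8e9:
        print("Under 8GB free: close other GPU apps or lower the settings")
else:
    print("No CUDA GPU detected")
```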
Backend (with auto-reload):

```powershell
# Activate venv
.\venv\Scripts\Activate

# Run with auto-reload
cd backend
uvicorn app:app --reload --host 0.0.0.0 --port 5000

# View API docs at http://localhost:5000/docs
```

Frontend:

```powershell
cd frontend
npm start        # Auto-reloads on changes
npm run build    # Production build
```

Model service:

```powershell
.\venv\Scripts\Activate
cd model-service
python model_service.py  # Runs on port 5001
```

Backend stack:
- Framework: FastAPI 0.109+ with async/await
- Server: Uvicorn ASGI server
- Features: Background tasks, CORS, file uploads, prompt control
- Storage: In-memory job queue (use Redis for production)
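The background-task/in-memory-queue combination looks roughly like this; a sketch of the pattern with illustrative route and function names, not the project's actual `app.py`:

```python
# Minimal sketch of FastAPI background tasks with a dict-based job store.
import uuid
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
jobs: dict[str, dict] = {}  # job_id -> record; swap for Redis in production

def run_generation(job_id: str) -> None:
    jobs[job_id]["status"] = "processing"
    # ... call the model service and save the video here ...
    jobs[job_id].update(status="completed", progress=100)

@app.post("/upload")
async def upload(background_tasks: BackgroundTasks) -> dict:
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "progress": 0}
    background_tasks.add_task(run_generation, job_id)  # runs after response
    return {"job_id": job_id}

@app.get("/status/{job_id}")
async def status(job_id: str) -> dict:
    return jobs[job_id]
```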
Model service stack:
- Model: CogVideoX-5B-I2V (THUDM/Tsinghua University) - latest 2024 model!
- Framework: PyTorch 2.3+ with CUDA 12.1
- Features: Prompt-controllable animation, 6-second videos, high quality
- Optimization: Model CPU offload, VAE slicing & tiling
- Container: NVIDIA CUDA runtime for GPU access
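In `diffusers`, wiring up those optimizations looks roughly like this; a sketch that assumes the public `THUDM/CogVideoX-5b-I2V` checkpoint and default-ish parameters, which may differ from `model_service.py`:

```python
# Sketch: CogVideoX-5B image-to-video with CPU offload and VAE
# slicing/tiling, per the optimizations listed above.
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # stream weights to the GPU layer by layer
pipe.vae.enable_slicing()        # decode the latent video in slices
pipe.vae.enable_tiling()         # decode large frames in tiles

image = load_image("photo.jpg")
frames = pipe(
    prompt="gentle camera pan, natural motion",
    image=image,
    num_inference_steps=50,
    num_frames=49,               # ~6 seconds at 8 FPS
    guidance_scale=6.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=8)
```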
Frontend stack:
- Framework: React 18+ with TypeScript
- Styling: CSS-in-JS with modern gradients
- Features: Drag-drop, progress tracking, video preview
- Build: Create React App (can migrate to Vite)
For faster generation:
- Use quality=20 (instead of 25-30)
- Lower the FPS (6-7 instead of 8-10)
- Use a shorter duration (2s instead of 3-4s)

For better quality:
- Use quality=35-50 (slower!)
- Use a higher FPS (9-10)
- Lower the motion strength for portraits

Typical timings:
- Model download: ~10-20 minutes
- Model loading: ~30 seconds
- First generation: ~2-3 minutes
- Subsequent generations: ~1-2 minutes
Backend (`.env`):

```
MODEL_SERVICE_URL=http://model-service:5001   # Docker
# MODEL_SERVICE_URL=http://localhost:5001    # Local
```

Frontend (`.env`):

```
REACT_APP_API_URL=http://localhost:5000
```

Credits:
- Stable Video Diffusion: Stability AI
- Diffusers Library: Hugging Face
- FastAPI: Tiangolo
- React: Meta
This project is for educational and personal use.
The Stable Video Diffusion model carries its own license from Stability AI.
Enjoy bringing your photos to life with modern microservices! 🎬✨