- Open GitHub in your browser and log in
- Click your profile picture (top-right corner)
- Click Settings
- Scroll down the left sidebar and click Developer settings (at the very bottom)
- Click Personal access tokens
- Click Tokens (classic)
- Click Generate new token → Generate new token (classic)
- Give it a name like "llama-cpp-standalone"
- Set expiration: Choose "No expiration" or custom
- Select scopes: check the box for `repo` (this gives full repository access)
- Scroll down and click Generate token
- IMPORTANT: Copy the token immediately! You won't see it again.
- It looks like: `ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
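Before using the token, a quick local sanity check can catch copy/paste mistakes — classic tokens begin with the `ghp_` prefix, as in the example above. A minimal sketch (the `TOKEN` value here is a placeholder, not a real credential):

```bash
#!/bin/sh
# Placeholder value - substitute the token you just copied
TOKEN="ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Classic personal access tokens start with "ghp_"
case "$TOKEN" in
  ghp_*) echo "format looks OK" ;;
  *)     echo "unexpected format - did the copy get truncated?" ;;
esac
```

To verify the token actually authenticates, you can additionally call `curl -H "Authorization: token $TOKEN" https://api.github.com/user`, which returns your user profile on success.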
- Go to https://github.com/new
- Repository name: `llama-cpp-python-standalone`
- Description: "Simple Python wrapper for llama.cpp server - use new models before Python bindings catch up"
- Keep it Public (so others can benefit)
- DON'T initialize with README (we already have one)
- Click Create repository
```bash
# Install GitHub CLI (if not installed)
# Ubuntu/Debian:
sudo apt install gh

# macOS:
brew install gh

# Login
gh auth login

# Create repo
gh repo create llama-cpp-python-standalone --public \
  --description "Simple Python wrapper for llama.cpp server - use new models before Python bindings catch up"
```

```bash
# Go to the project directory
cd /home/gregor/aidev/supercmd/tools/llama-cpp-python-standalone
```
```bash
# Initialize git if not already
git init

# Add all files
git add .

# Make initial commit
git commit -m "Initial release - Python wrapper for llama.cpp server

- Simple wrapper bypassing outdated llama-cpp-python
- OpenAI-compatible API
- Support for new architectures (Qwen3-VL, etc.)
- Auto-build script with GPU detection
- Vision model examples
- Context manager support"

# Add your GitHub repo as remote (replace YOUR_TOKEN)
git remote add origin https://YOUR_TOKEN@github.com/cronos3k/llama-cpp-python-standalone.git

# Push to GitHub
git branch -M main
git push -u origin main
```

Security note: a token embedded in the remote URL is stored in plain text in `.git/config` and stays there until you change the URL. Prefer a credential helper or SSH keys for anything long-lived.
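After the first push you can strip the token from the remote URL and let a credential helper supply it on demand. A sketch using a throwaway repo for illustration (the URL matches the repo above; `YOUR_TOKEN` is a placeholder — run the `git remote` commands in your real repo):

```bash
#!/bin/sh
set -e
# Throwaway repo just to demonstrate the remote commands safely
cd "$(mktemp -d)"
git init -q

# A token embedded in the URL is stored verbatim in .git/config
git remote add origin https://YOUR_TOKEN@github.com/cronos3k/llama-cpp-python-standalone.git
git remote get-url origin

# Replace it with a token-free URL; a credential helper can cache the token instead
git remote set-url origin https://github.com/cronos3k/llama-cpp-python-standalone.git
git remote get-url origin
```

With the token out of the URL, `git config credential.helper cache` (or your platform's keychain helper) prompts once and keeps the credential in memory rather than on disk.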
- Go to your repo: https://github.com/cronos3k/llama-cpp-python-standalone
- Click the ⚙️ gear icon next to "About" (top right)
- Add topics: `llama-cpp`, `llm`, `python`, `gguf`, `qwen`, `cuda`, `local-ai`, `openai-api`
- Click Save changes
- Go to your repo
- Click Releases (right sidebar)
- Click Create a new release
- Tag: `v1.0.0`
- Title: `Initial Release`
- Description:
## Features
- 🚀 Simple Python wrapper for llama.cpp server
- ✅ Bypass outdated llama-cpp-python bindings
- ✅ Support for new architectures (Qwen3-VL, Gemma3, etc.)
- ✅ OpenAI-compatible API
- ✅ Auto-build script with GPU detection
- ✅ Vision model examples
## Quick Start
See README.md for installation and usage instructions.
- Click Publish release
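The release tag can also be created from the command line before you open the release form. A sketch in a throwaway repo (in your real repo you would tag the commit you just pushed):

```bash
#!/bin/sh
set -e
# Throwaway repo for illustration
cd "$(mktemp -d)"
git init -q
git -c user.name=you -c user.email=you@example.com \
    commit -q --allow-empty -m "Initial release"

# Create the tag locally, then confirm it exists
git tag v1.0.0
git tag -l v1.0.0

# In the real repo, push it so GitHub offers it in the release form:
# git push origin v1.0.0
```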
- Your token doesn't have the `repo` scope
- Generate a new token with the correct permissions
- Check the URL: `https://github.com/cronos3k/llama-cpp-python-standalone`
- Make sure the repo was created successfully
- Tokens can expire - generate a new one
- Consider using SSH keys instead (more secure)
Instead of tokens, you can use SSH:
- Generate SSH key: `ssh-keygen -t ed25519 -C "your_email@example.com"`
- Add to GitHub: Settings → SSH and GPG keys → New SSH key
- Use SSH URL: `git@github.com:cronos3k/llama-cpp-python-standalone.git`
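The key-generation step can also be run non-interactively. A sketch that writes the key pair to a local path (the filename `github_ed25519` and the empty passphrase are choices for this example — a passphrase is recommended in practice):

```bash
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Generate an ed25519 key pair without prompting (-N "" means no passphrase)
ssh-keygen -q -t ed25519 -C "your_email@example.com" -f ./github_ed25519 -N ""

# The .pub file is what you paste into GitHub -> Settings -> SSH and GPG keys
cat ./github_ed25519.pub
```

Once the key is added on GitHub, `ssh -T git@github.com` should greet you by username, confirming the setup works.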