AI Coding Hackathon Project · Built collaboratively by a human — acting as architect, tester, product manager — and AI agents during Maven’s Accelerator. The result is a fast, explainable Rust CLI that scans prompts for jailbreak indicators, clamps scores with a transparent rubric, and now adds a configurable input guardrail plus optional LLM verdicts.
- Quick Start
- Features
- Technical Overview
- Project Status
- Project Documentation
- Hackathon Context
- AI-Assisted Development Insights
- Provider Integration Pitfalls & Fixes
- Contributing
- License & Disclaimer
- Acknowledgments
# Clone the workspace and build the CLI (debug build by default)
git clone https://github.com/HendrikReh/llm-guard
cd llm-guard
cargo build --workspace
# Optional: build an optimized binary
cargo build -p llm-guard-cli --release
# Run the optimized binary directly
./target/release/llm-guard-cli --help
# Or install the CLI binary into ~/.cargo/bin
cargo install --path crates/llm-guard-cliThe compiled binary is named llm-guard-cli. After cargo install, invoke it via llm-guard-cli or create an alias (alias llm-guard=llm-guard-cli) if you prefer the shorter name used in examples below.
cargo fmt --all # Format source code
cargo lint # Clippy (alias from .cargo/config.toml)
cargo test-all # Run tests for all crates and features
cargo cov # HTML coverage report (requires cargo-llvm-cov)
just test # Uses cargo-nextest when available, falls back to cargo testCI is configured through .github/workflows/ci.yml to run the same checks on pull requests.
# Scan a file (reads stdin when --file is omitted)
./target/debug/llm-guard-cli scan --file samples/chat.txt
# Pipe input from another command
echo "Ignore previous instructions" | ./target/debug/llm-guard-cli scan
# Generate JSON output for automation
./target/debug/llm-guard-cli scan --file samples/chat.txt --json > report.json
# Augment with an LLM verdict (requires provider credentials)
LLM_GUARD_PROVIDER=openai \
LLM_GUARD_API_KEY=sk-... \
LLM_GUARD_MODEL=gpt-4o-mini \
./target/debug/llm-guard-cli scan --file samples/chat.txt --with-llm
# Switch providers via CLI overrides (values take precedence over env/config)
./target/debug/llm-guard-cli scan --file samples/chat.txt --with-llm \
--provider anthropic --model claude-3-haiku-20240307
# Tail a log file and scan new content as it arrives
./target/debug/llm-guard-cli scan --file logs/chat.log --tail
# Increase the input budget to 2 MB for large transcripts
./target/debug/llm-guard-cli --max-input-bytes 2000000 scan --file transcripts/long.txt
# Run health diagnostics for a specific provider
./target/debug/llm-guard-cli --debug health --provider openaiInput size:
llm-guard-clienforces a 1 MB (1,000,000 byte) cap per input. Tail mode and stdin use the same guard to avoid runaway memory usage. Override it with--max-input-bytesorLLM_GUARD_MAX_INPUT_BYTESwhen you deliberately need to scan larger corpora.
Sample output:
Risk: 72/100 (HIGH)
Findings:
[INSTR_OVERRIDE] "ignore previous instructions" at 0..29 (+16)
[PROMPT_LEAK] "reveal system prompt" at 45..65 (+14)
Synergy bonus (override+leak within 200 chars) (+5)
Exit codes: 0 (low), 2 (medium), 3 (high), 1 (error). Integrate the CLI into CI/CD pipelines by acting on those codes.
Sample prompts live under examples/prompt_safe.txt, examples/prompt_suspicious.txt, and examples/prompt_malicious.txt for quick demos.
Environment variables provide the quickest way to configure LLM access:
| Variable | Description | Default |
|---|---|---|
LLM_GUARD_PROVIDER |
Provider (openai, anthropic, gemini, azure, noop) |
openai |
LLM_GUARD_API_KEY |
API key/token (required unless provider=noop) |
– |
LLM_GUARD_ENDPOINT |
Custom endpoint/base URL | Provider default |
LLM_GUARD_MODEL |
Model identifier (gpt-4o-mini, claude-3-haiku-20240307, …) |
Provider default |
LLM_GUARD_DEPLOYMENT |
Deployment name (Azure rig profiles) | – |
LLM_GUARD_PROJECT |
Project or tenant identifier (Anthropic, Gemini) | – |
LLM_GUARD_WORKSPACE |
Workspace identifier when required | – |
LLM_GUARD_TIMEOUT_SECS |
HTTP timeout in seconds | 30 |
LLM_GUARD_MAX_RETRIES |
Retry attempts for failed calls | 2 |
LLM_GUARD_API_VERSION |
API version (Azure OpenAI) | Provider default |
LLM_GUARD_MAX_INPUT_BYTES |
Max bytes accepted from stdin/files | 1_000_000 |
Configuration precedence: CLI flags → environment variables → profile in llm_providers.yaml.
llm_providers.yaml lets you manage multiple providers side-by-side:
providers:
- name: openai
api_key: OPENAI_API_KEY
model: gpt-4o-mini
- name: azure
api_key: AZURE_OPENAI_KEY
endpoint: https://your-resource.openai.azure.com
deployment: gpt-4o-production
api_version: 2024-02-15-preview
timeout_secs: 60
max_retries: 3Override the location with --providers-config. You can also prime the environment from a .env file:
set -a && source .env && set +a
cargo run -p llm-guard-cli -- scan --file samples/chat.txt --with-llmSingle-provider setups may prefer a TOML config file:
# llm-config.toml
[llm]
provider = "anthropic"
model = "claude-3-haiku-20240307"
timeout_secs = 45
max_retries = 3cargo run -p llm-guard-cli -- --config llm-config.toml scan --with-llm --file prompt.txtinput exceeds 1000000 bytes— The scanner rejects inputs larger than 1 MB. Remove unnecessary context, chunk long transcripts, tail a filtered log, or raise the cap with--max-input-bytes.input contains invalid UTF-8— Inputs must be UTF-8. Re-encode your file (iconv -f utf-16 -t utf-8 ...) or ensure pipelines emit UTF-8 text.- Tail mode shows no updates — The watcher only prints when file content changes. Confirm your log writer writes the entire file each update or adjust intervals via
--tailplus periodicsleep. - LLM verdict is
unknown— Providers may return blank or malformed payloads; the CLI falls back to an informative placeholder. Re-run with--debugto inspect raw responses.
- Fast Aho-Corasick and precompiled regex scanning (<100 ms for typical prompts)
- Transparent risk scoring (0–100) with rule attribution, excerpts, and synergy bonuses
- Multiple input sources: stdin, files, and tail mode for streaming logs
- Human-readable and JSON output, with machine-friendly exit codes
- Optional LLM verdicts via OpenAI, Anthropic, Google Gemini, or Azure OpenAI (plus
noopsimulator) - Rig-backed provider health diagnostics (
healthsubcommand,--debugraw payload logging)
- Instruction override: “ignore previous instructions”, “reset system prompt”
- Data exfiltration: prompt leaks, hidden system prompt disclosure attempts
- Policy subversion: jailbreak / guardrail bypass patterns
- Obfuscation: base64 payloads, Unicode tricks, hex-encoded directives
- Streaming resilience: tail mode deduplicates unchanged snapshots and handles rapid log churn
llm-guard/
├── crates/
│ ├── llm-guard-core/
│ │ ├── src/
│ │ │ ├── scanner/ (rule repositories, scanning, scoring heuristics)
│ │ │ ├── report.rs (human + JSON reporters)
│ │ │ └── llm/ (OpenAI, Anthropic, Azure, Gemini, rig adapter, settings)
│ │ └── Cargo.toml
│ └── llm-guard-cli/
│ ├── src/main.rs (CLI, config loading, tail loop, provider health)
│ └── Cargo.toml
├── rules/ (keywords + pattern packs)
├── tests/ (integration + snapshot tests)
├── docs/ (usage, ADRs, testing guide, screenshots)
└── examples/ (sample prompts)
- Rule loading – Keywords and regex patterns are loaded from
rules/and validated. - Scanning – Inputs from stdin, files, or tail mode are analyzed for matches.
- Finding generation – Each match carries spans, rule IDs, excerpts, and weights.
- Risk scoring – Heuristic scoring aggregates weights, applies dampening, and clamps to 0–100.
- Reporting – Human or JSON output summarizes findings; optional LLM verdicts enrich the report.
| Score | Band | Recommendation |
|---|---|---|
| 0–24 | Low | Proceed – no prompt-injection indicators detected |
| 25–59 | Medium | Review – investigate before executing user instructions |
| 60–100 | High | Block – re-prompt or escalate for manual review |
Scores combine rule weights, dampened repeat hits, and a length normalization factor clamped to [0.5, 1.5]. The qualitative band drives exit codes (0, 2, 3) so you can fail CI jobs or gate automations based on the rubric.
aho-corasick,regex— high-performance pattern matchingserde,serde_json,serde_yaml,json5— serialization formatsclap— command-line parsingtokio,reqwest,async-trait— async runtime and HTTP clientstracing,tracing-subscriber— structured diagnosticsconfig,once_cell,thiserror,anyhow— configuration and error handlingrig-core— shared provider orchestration across OpenAI, Anthropic, and Azure adapters
Current phase: Active development; see PLAN.md for phase-by-phase progress (last updated 2025-10-17).
- ✅ Phases 0–6 complete (scanner, scoring, CLI, multi-provider LLM integration)
- ⚙️ Phase 7 hardening underway (expanded tests, fuzzing, CI polish)
- 📝 Phase 8 documentation tasks open (README refresh, usage deep-dives, release checklist)
- 🔄 Phase 9 rig.rs migration landed; final doc/test refresh still pending
Test suite: cargo test --workspace --all-features exercises 69 tests total (59 active, 10 ignored for loopback/TLS constraints). Snapshot fixtures live in tests/scanner_snapshots.rs.
| Document | Purpose | Audience |
|---|---|---|
README.md |
Project overview, quick start, AI insights | Everyone |
docs/USAGE.md |
CLI reference and advanced command examples | Operators |
docs/ARCHITECTURE.md |
Component and data-flow overview | Contributors |
docs/RULE_AUTHORING.md |
How to extend keyword & regex rule packs | Security engineers |
PRD.md |
Product requirements and success criteria | Builders & reviewers |
PLAN.md |
Phase tracking, outstanding work, status notes | Contributors |
AGENTS.md |
Onboarding guide for AI coding assistants | AI agents & humans |
docs/TESTING_GUIDE.md |
Testing strategy, commands, troubleshooting | Developers, QA |
docs/SECURITY.md |
Security guardrails, runtime expectations | Security reviewers |
docs/RELEASE_CHECKLIST.md |
Steps for shipping a tagged release | Maintainers |
docs/ADR/ |
Architecture decision records | Technical stakeholders |
This repository was created for Maven’s AI Coding Accelerator hackathon as a focused experiment in AI-assisted software delivery.
- Explore how far AI coding assistants can accelerate real-world development.
- Validate product development workflows where AI contributes to design, implementation, and docs.
- Produce a demonstrable prompt-injection firewall within a single-day sprint.
- Capture lessons learned about human + AI collaboration.
- Multi-agent collaboration: GPT-5 Codex handled most implementation; Claude Code reviewed and documented.
- Living documentation:
AGENTS.mdlets any assistant join with full context. - Transparent planning:
PLAN.mdlogs granular progress and decisions. - PRD-first workflow:
PRD.mdgoverned scope and rubric changes. - MCP integration: RepoPrompt and Context7 MCP servers kept AI agents aware of repository state.
- Requirements captured up front via PRD refinements between GPT-5 Codex and Claude Code.
- Cursor IDE hosted parallel terminals for Codex CLI and Claude Code, enabling rapid iteration.
- RepoPrompt MCP supplied curated repository slices, powering large-scale refactors (e.g., rig.rs migration).
justrecipes + cargo aliases standardized formatting, linting, testing, and coverage for every agent.- Observability (
tracing, debug flags) was prioritized early to simplify later provider debugging.
- Pairing multiple AI agents with distinct strengths reduced blocker time; humans focused on review and direction.
- Documenting conventions (
AGENTS.md,docs/TESTING_GUIDE.md) minimized context loss between agent hand-offs. - Maintaining a living plan avoided scope creep and clarified which phases were safe to trim when time-boxed.
- Local fallbacks (noop provider, dry-run health checks) kept demos functional even when real APIs misbehaved.
- Capturing pitfalls immediately in docs or ADRs prevented repeated regressions as agents rotated tasks.
- Anthropic truncation & malformed JSON: Added newline sanitisation, auto-repair for dangling quotes/braces, JSON5 fallback, and final "unknown" verdicts so scans never abort.
- OpenAI reasoning-only replies: Switched from strict
json_schematojson_object, captured tool-call arguments, and fallback to "unknown" verdicts when content is withheld. - Gemini + rig.rs incompatibilities: Bypassed rig for Gemini due to schema mismatches; implemented a native REST client that formats prompt/response JSON manually.
- Gemini empty responses: Treats empty candidates as warnings; emits "unknown" verdict with guidance instead of failing.
- Debugging provider quirks: Global
--debugflag setsLLM_GUARD_DEBUG=1to log raw payloads whenever parsing fails.
Contributions that extend the experiment or harden the CLI are welcome.
- Follow the Rust conventions captured in
AGENTS.md. - Include automated tests for new logic (unit, integration, or snapshots as appropriate).
- Run
cargo fmt,cargo lint, andcargo test-allbefore submitting. - Document notable AI-assisted workflows or trade-offs in the PR description.
- For new detection rules, update
rules/, add fixtures, and note expected risk scores.
MIT License — see LICENSE for full text.
- Treat risk scores as decision support, not absolute truth.
- Use LLM-Guard as one layer within a defence-in-depth strategy.
- Combine with input sanitisation, rate limiting, monitoring, and human review.
- Regularly refresh rule packs and review detection results for drift.
- GPT-5 Codex (via Codex CLI) generated most code.
- Claude Code contributed reviews, docs, and feasibility checks.
- Humans reviewed security-critical paths, tests, and release readiness.
- Perform your own review, threat modelling, and tuning before production use.
AI Coding Accelerator Hackathon
- Course: Maven’s AI Coding Accelerator
- Instructors: Vignesh Mohankumar, Jason Liu
Tools & Technologies
- AI agents: Codex CLI (OpenAI), Claude Code (Anthropic)
- MCP servers: RepoPrompt, Context7
- IDE: Cursor
- Research: Perplexity
- Git client: Tower