A comprehensive toolkit for optimizing code generation prompts using 8 different prompt engineering techniques.
This project implements and benchmarks various prompt optimization methods for improving LLM-based code generation. All methods are designed to work with Ollama for local, private, and free LLM inference.
- Features
- Methods Implemented
- Installation
- Quick Start
- Usage Examples
- Benchmarking
- Project Structure
- API Reference
- Contributing
- License
- 8 Prompt Engineering Methods - From genetic algorithms to reinforcement learning
- Modular Architecture - Easy to extend with new methods
- Built-in Benchmarking - Compare all methods side by side
- Code Quality Evaluation - Syntax, functionality, readability metrics
- CLI Interface - Easy command-line usage
- Fitness Tracking - Monitor optimization progress
- Local LLM Support - Works with Ollama for privacy
| Method | Description | Type |
|---|---|---|
| AutoPrompt-GA | Genetic algorithm-based prompt evolution | Evolutionary |
| OPRO | Optimization by PROmpting (DeepMind) | Meta-prompting |
| Chain-of-Thought | Step-by-step reasoning chains | Reasoning |
| Tree-of-Thought | Tree-based exploration of reasoning paths | Search |
| Few-Shot | Example-based learning | In-context Learning |
| Prompt Tuning | Soft prompt optimization | Tuning |
| Prefix Tuning | Prefix-based prompt optimization | Tuning |
| Prompt-OIRL | Q-Learning based optimization | Reinforcement Learning |
AutoPrompt-GA (Genetic Algorithm)
Uses evolutionary principles to optimize prompts:
- Selection: Tournament selection of best prompts
- Crossover: Combines successful prompt elements
- Mutation: Random modifications for exploration
- Elitism: Preserves top performers
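As an illustration only (not the repository's implementation in `autoprompt_ga.py`), a genetic-algorithm loop over prompt strings might look roughly like this; `evaluate_prompt`, `crossover`, and `mutate` are hypothetical helpers standing in for the real operators:

```python
import random

def run_ga(seed_prompts, evaluate_prompt, crossover, mutate,
           generations=5, population_size=10, elite=2):
    """Toy GA loop over prompt strings: evaluate, keep elites, breed the rest."""
    population = list(seed_prompts)
    for _ in range(generations):
        scores = {p: evaluate_prompt(p) for p in population}
        ranked = sorted(population, key=scores.get, reverse=True)
        next_gen = ranked[:elite]  # elitism: carry the best prompts over unchanged
        while len(next_gen) < population_size:
            # Tournament selection: the better of two random candidates becomes a parent.
            parent_a = max(random.sample(ranked, 2), key=scores.get)
            parent_b = max(random.sample(ranked, 2), key=scores.get)
            child = crossover(parent_a, parent_b)  # combine successful prompt elements
            if random.random() < 0.3:
                child = mutate(child)              # random modification for exploration
            next_gen.append(child)
        population = next_gen
    final_scores = {p: evaluate_prompt(p) for p in population}
    return max(population, key=final_scores.get)
```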
OPRO (Optimization by PROmpting)
Meta-prompting approach where the LLM itself suggests prompt improvements:
- Analyzes current prompt performance
- Generates improvement suggestions
- Iteratively refines based on feedback
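A rough sketch of one such iteration, reusing the `client.generate()` call shown in the API Reference below; the meta-prompt wording and the `evaluate_prompt` helper are illustrative, not what `opro.py` actually uses:

```python
def opro_step(client, evaluate_prompt, history):
    """One OPRO-style iteration: show past prompts and scores, ask the LLM for a better one."""
    trajectory = "\n".join(f"prompt: {p!r}  score: {s:.3f}" for p, s in history)
    meta_prompt = (
        "Below are code-generation prompts with their evaluation scores.\n"
        f"{trajectory}\n"
        "Write a new prompt that should achieve a higher score. Return only the prompt."
    )
    candidate = client.generate(meta_prompt, max_tokens=150, temperature=0.7).strip()
    history.append((candidate, evaluate_prompt(candidate)))
    return max(history, key=lambda pair: pair[1])  # best (prompt, score) so far
```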
Chain-of-Thought (CoT)
Encourages step-by-step reasoning:
- Problem analysis → Algorithm design → Implementation → Verification
- Reduces errors through structured thinking
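For example, a CoT-style wrapper around a task might look like this (illustrative wording, not the exact template in `chain_of_thought.py`):

```python
COT_TEMPLATE = """You are solving a programming task. Reason step by step:
1. Analyze the problem, its inputs, and its outputs.
2. Design the algorithm.
3. Implement it in Python.
4. Verify the code against the examples before giving your final answer.

Task: {task}
"""

prompt = COT_TEMPLATE.format(task="Write a function to calculate factorial")
```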
Tree-of-Thought (ToT)
Explores multiple reasoning paths simultaneously:
- Branching factor for exploring alternatives
- Beam search for pruning unpromising paths
- Optimal path selection based on evaluation
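A toy sketch of the search (not the actual `tree_of_thought.py` code); `expand` and `score` are hypothetical helpers that generate and evaluate prompt variants:

```python
def tot_search(root_prompt, expand, score, branching_factor=3, max_depth=3, beam_width=2):
    """Toy beam search over prompt variants: expand each survivor, keep only the best few."""
    beam = [root_prompt]
    for _ in range(max_depth):
        candidates = []
        for node in beam:
            # Branching: generate several alternative refinements of this prompt.
            candidates.extend(expand(node, branching_factor))
        # Pruning: keep only the most promising paths (beam search).
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)  # optimal path selection based on evaluation
```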
Few-Shot Learning
Provides examples of good code:
- Demonstrates expected output format
- Shows coding style and patterns
- Context-based learning
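For instance, a few-shot prompt could be assembled like this (example tasks chosen arbitrarily, not taken from `few_shot.py`):

```python
FEW_SHOT_EXAMPLES = """# Task: reverse a string
def reverse_string(s):
    return s[::-1]

# Task: check whether a number is even
def is_even(n):
    return n % 2 == 0
"""

def build_few_shot_prompt(task):
    # Prepend worked examples so the model imitates their style and output format.
    return f"{FEW_SHOT_EXAMPLES}\n# Task: {task}\n"
```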
Prompt Tuning
Optimizes soft prompt tokens:
- Explores combinations of descriptive adjectives
- Finds optimal prompt formulations
- Task-specific tuning
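A minimal sketch of that idea, searching over adjective combinations with a hypothetical `evaluate_prompt` scorer (the real `prompt_tuning.py` may differ):

```python
from itertools import combinations

ADJECTIVES = ["concise", "efficient", "well-documented", "idiomatic", "robust"]

def tune_prompt(task, evaluate_prompt, k=2):
    """Try every pair of descriptive adjectives and keep the best-scoring formulation."""
    candidates = [
        f"Write {' and '.join(pair)} Python code. {task}"
        for pair in combinations(ADJECTIVES, k)
    ]
    return max(candidates, key=evaluate_prompt)
```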
Prefix Tuning
Optimizes prompt prefixes:
- Tests different expert personas
- Finds effective prompt structures
- Role-based optimization
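The same pattern applied to persona prefixes, again with a hypothetical `evaluate_prompt` scorer rather than the actual `prefix_tuning.py` logic:

```python
PREFIXES = [
    "You are a senior Python developer who writes production-quality code.",
    "You are a competitive programmer focused on correctness and efficiency.",
    "You are a code reviewer who values readability and documentation.",
]

def tune_prefix(task, evaluate_prompt):
    """Score each expert persona as a prefix and return the most effective prompt."""
    candidates = [f"{prefix}\n{task}" for prefix in PREFIXES]
    return max(candidates, key=evaluate_prompt)
```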
Prompt-OIRL (Reinforcement Learning)
Q-Learning based optimization:
- State space: Code quality levels
- Action space: Prompt modifications
- Reward: Score improvement
- Exploration vs exploitation balance
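A compact sketch of the underlying Q-learning update (illustrative only; the states, actions, and hyperparameters in `prompt_OIRL.py` may differ), where `apply_action` is a hypothetical helper that rewrites the prompt and returns the new quality level plus the score improvement:

```python
import random
from collections import defaultdict

ACTIONS = ["add_examples", "add_constraints", "ask_for_tests", "simplify_wording"]
q_table = defaultdict(float)  # maps (quality_level, action) -> expected improvement

def q_learning_step(state, apply_action, epsilon=0.2, alpha=0.5, gamma=0.9):
    """One Q-learning update: pick a prompt modification, observe the score improvement."""
    # Epsilon-greedy: explore a random action or exploit the best-known one.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q_table[(state, a)])
    next_state, reward = apply_action(state, action)  # reward = score improvement
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])
    return next_state
```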
Prerequisites:
- Python 3.8+
- Ollama installed and running
```bash
# Clone the repository
git clone https://github.com/kadiryonak/PromptEngineering_Methods.git
cd PromptEngineering_Methods

# Install dependencies
pip install -r requirements.txt

# Pull a code-focused model (recommended)
ollama pull codellama:7b

# Start Ollama server (if not running)
ollama serve
```

```bash
cd prompt_engineering_methods

# List all available methods
python main.py --list-methods

# Run all methods with 3 generations each
python main.py --method all --generations 3

# Run a specific method
python main.py --method autoprompt --generations 5

# Save results to JSON
python main.py --method all --output results.json
```
```python
from ollama_client import OllamaClient
from prompt_engineering import AutoPromptCodeGA, ChainOfThoughtCode

# Initialize client
client = OllamaClient(model="codellama:7b")

# Sample dataset
dataset = [
    {
        "task": "Write a function to calculate factorial",
        "test_cases": [
            {"function": "factorial", "input": [5], "expected": 120}
        ]
    }
]

# Run AutoPrompt-GA
optimizer = AutoPromptCodeGA(client, dataset)
best_prompt, score = optimizer.optimize(generations=5)

print(f"Best prompt: {best_prompt}")
print(f"Score: {score:.4f}")
```

```python
from prompt_engineering.tree_of_thought import TreeOfThoughtCode

# Initialize with custom parameters
tot = TreeOfThoughtCode(
    client,
    dataset,
    branching_factor=3,  # Explore 3 alternatives at each node
    max_depth=3          # Tree depth
)

best_prompt, score = tot.optimize(generations=4)
```

```python
from code_evaluator import CodeQualityEvaluator

evaluator = CodeQualityEvaluator()

code = """
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)
"""

# Check syntax
is_valid, message = evaluator.syntax_check(code)
print(f"Syntax valid: {is_valid}")

# Get quality metrics
metrics = evaluator.code_quality_metrics(code)
print(f"Readability: {metrics['readability']:.2f}")
print(f"Best practices: {metrics['best_practices']:.2f}")
```

Run comprehensive benchmarks comparing all methods:
```bash
# Run benchmark
python benchmark.py --generations 3

# Save results
python benchmark.py --output results.json --report report.md
```

| Rank | Method | Best Score | Avg Score | Conv. Speed | Time (s) |
|---|---|---|---|---|---|
| 1 | AutoPrompt-GA | 0.7850 | 0.6234 | 3 | 45.2 |
| 2 | Tree-of-Thought | 0.7623 | 0.6012 | 4 | 52.1 |
| 3 | OPRO | 0.7412 | 0.5891 | 5 | 38.7 |
| 4 | Chain-of-Thought | 0.7234 | 0.5678 | 2 | 28.3 |
| 5 | Few-Shot | 0.7012 | 0.5543 | 1 | 22.1 |
```
PromptEngineering_Methods/
├── prompt_engineering_methods/
│   ├── main.py                  # CLI entry point
│   ├── benchmark.py             # Benchmarking system
│   ├── ollama_client.py         # Ollama API client
│   ├── code_evaluator.py        # Code quality evaluation
│   ├── base_optimizer.py        # Base optimizer class
│   ├── metrics.py               # Evaluation metrics
│   └── prompt_engineering/
│       ├── __init__.py
│       ├── autoprompt_ga.py     # Genetic Algorithm
│       ├── opro.py              # OPRO method
│       ├── chain_of_thought.py  # Chain-of-Thought
│       ├── tree_of_thought.py   # Tree-of-Thought
│       ├── few_shot.py          # Few-Shot Learning
│       ├── prompt_tuning.py     # Prompt Tuning
│       ├── prefix_tuning.py     # Prefix Tuning
│       └── prompt_OIRL.py       # RL-based optimization
├── requirements.txt
├── .gitignore
└── README.md
```
```python
from ollama_client import OllamaClient

client = OllamaClient(
    base_url="http://localhost:11434",
    model="codellama:7b"
)

# Generate code
prompt = "Write a Python function to calculate factorial"
response = client.generate(prompt, max_tokens=300, temperature=0.1)

# Check availability
is_running = client.is_available()
```

All methods inherit from this base class:

```python
from typing import Tuple

from base_optimizer import BaseCodePromptOptimizer

class YourMethod(BaseCodePromptOptimizer):
    def __init__(self, ollama_client, dataset, name):
        super().__init__(ollama_client, dataset, name)

    def optimize(self, generations=5) -> Tuple[str, float]:
        # Your optimization logic
        return best_prompt, best_score
```

```python
from code_evaluator import CodeQualityEvaluator

evaluator = CodeQualityEvaluator()

# Extract code from LLM response (text = raw model output)
code = evaluator.extract_code_block(text)

# Syntax validation
is_valid, message = evaluator.syntax_check(code)

# Functional testing (test_cases structured like the dataset entries above)
score = evaluator.functional_test(code, test_cases)

# Quality metrics
metrics = evaluator.code_quality_metrics(code)
# Returns: readability, complexity, documentation, best_practices
```

Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-method`)
- Implement your changes
- Test thoroughly
- Submit a pull request
To add a new optimization method:
- Create a new file in `prompt_engineering/`
- Inherit from `BaseCodePromptOptimizer`
- Implement the `optimize()` method
- Add it to the `__init__.py` exports
- Register it in the `METHODS` dict in `main.py`
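A minimal, hypothetical sketch of such a method and its registration; the class body and the exact shape of the `METHODS` dict in `main.py` may differ:

```python
# prompt_engineering/my_method.py
from typing import Tuple

from base_optimizer import BaseCodePromptOptimizer

class MyMethod(BaseCodePromptOptimizer):
    def optimize(self, generations=5) -> Tuple[str, float]:
        best_prompt, best_score = "Write clean, tested Python code.", 0.0
        # ... your optimization loop: generate candidates, evaluate, keep the best ...
        return best_prompt, best_score

# main.py (illustrative registration)
# METHODS["mymethod"] = MyMethod
```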
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama for local LLM inference
- DeepMind OPRO Paper for optimization methodology
- Tree of Thoughts Paper for reasoning framework
Made with ❤️ by kadiryonak