Welcome to Assignment 2!
This is your team's codebase for exploring Particle Swarm Optimization (PSO) through Multi-Agent Reinforcement Learning. Instead of using fixed PSO parameters, you'll train agents (particles) to learn optimal optimization strategies through interaction and rewards. Your swarm should cooperate to find global optima while adapting to dynamic, shifting landscapes!
Don't worry if it seems complex at first; this README will walk you through everything step by step!
- What Is This Project?
- The Environment
- Quick Start
- Your Assignment
- Available Experiments
- Understanding the Code
- Configuration & Customization
- Experimental Results
- Metrics & Evaluation
- Visualization
- Troubleshooting & Tips
- Technical Details
- References
Imagine you're optimizing a complex function with many hills and valleys. Traditional PSO uses fixed formulas - but what if the particles could learn when to explore new areas vs. exploit known good regions? What if they could adapt to a landscape that changes over time?
This is learned PSO: each particle is an RL agent that outputs its own inertia, cognitive, and social coefficients based on what it observes.
- Particles: Agents moving through a continuous search space
- Velocity: Each particle has momentum, influenced by:
- Inertia: How much to keep going in the same direction
- Cognitive: Attraction to personal best position
- Social: Attraction to neighbors' positions
- Goal: Find the global minimum of objective functions (Sphere, Rastrigin, etc.)
- Learned Parameters: Instead of fixed coefficients, agents learn optimal values
- Multi-Agent Cooperation: Particles share information through neighborhoods
- Dynamic Landscapes: Optima can move over time, requiring adaptation
- CTDE Training: Centralized critic for stable training, decentralized execution
- TorchRL: Modern RL framework for multi-agent training
- PyTorch: Neural network backend
- Hydra: Configuration management
- Matplotlib: Visualization and animation
The environment simulates a particle swarm optimizing a function:
Episode Flow:
1. Particles spawn at random positions in search space
2. Each timestep:
   - Each particle observes its state (position, velocity, bests)
   - The policy outputs PSO coefficients (inertia, cognitive, social)
   - Velocities and positions are updated
   - Fitness is evaluated on the objective function
   - Personal and global bests are updated
3. Episode ends after max_steps
4. Reward based on fitness improvement
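To make the flow concrete, here's a minimal self-contained sketch of this loop in plain PyTorch. It is not the project's TorchRL environment: `objective` and `policy` are placeholders, and the neighborhood term is simplified to the swarm mean.

```python
import torch

def run_episode(policy, objective, num_agents=10, dim=2, max_steps=50):
    # 1. Spawn particles at random positions in the search space
    positions = torch.rand(num_agents, dim) * 10 - 5
    velocities = torch.zeros(num_agents, dim)
    scores = -objective(positions)                     # negated fitness: higher = better
    personal_best_pos, personal_best_scores = positions.clone(), scores.clone()

    for _ in range(max_steps):
        # 2. Policy outputs PSO coefficients from what the particles observe
        inertia, cognitive, social = policy(positions, velocities, personal_best_pos)
        avg_pos = positions.mean(dim=0, keepdim=True)  # simplified neighborhood average
        # Velocity and position update (same form as the formula below)
        velocities = (inertia * velocities
                      + cognitive * (personal_best_pos - positions)
                      + social * avg_pos)
        positions = positions + velocities
        # Evaluate fitness and update personal bests
        new_scores = -objective(positions)
        reward = new_scores - scores                   # 4. reward = fitness improvement (training signal)
        improved = new_scores > personal_best_scores
        personal_best_pos[improved] = positions[improved]
        personal_best_scores = torch.where(improved, new_scores, personal_best_scores)
        scores = new_scores
    return personal_best_scores.max()                  # global best found this episode
```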
Each particle receives observations about its state:
| Field | Shape | Description |
|---|---|---|
| `positions` | `[agents, dim]` | Current position in the search space |
| `velocities` | `[agents, dim]` | Current velocity vector |
| `scores` | `[agents]` | Current fitness value (negated, higher = better) |
| `personal_best_pos` | `[agents, dim]` | Best position found by this particle |
| `personal_best_scores` | `[agents]` | Fitness at the personal best |
| `avg_pos` | `[agents, dim]` | Mean position of neighbors within δ radius |
| `avg_vel` | `[agents, dim]` | Mean velocity of neighbors within δ radius |
Each particle outputs 3 coefficient vectors:
| Action | Shape | Description | Typical Range |
|---|---|---|---|
| `inertia` | `[agents, dim]` | Weight for the previous velocity | 0.3 - 1.1 |
| `cognitive` | `[agents, dim]` | Weight for personal-best attraction | 0.5 - 2.5 |
| `social` | `[agents, dim]` | Weight for social/neighbor attraction | 0.5 - 2.5 |
```
velocity = inertia * velocity
         + cognitive * (personal_best_pos - position)
         + social * avg_neighbor_pos
position = position + velocity
```

We use PPO (Proximal Policy Optimization) with:
- Centralized Critic: Sees all particles' states during training
- Decentralized Actors: Each particle acts on its own observations
- Shared Parameters: All particles share the same policy network
- GAE: Generalized Advantage Estimation for stable learning
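In terms of tensor shapes, CTDE with shared parameters looks roughly like the sketch below (plain `torch.nn` modules standing in for the project's `MultiAgentMLP`; the observation size of 12 is just the sum of the observation table above for `dim=2`, and the 6 raw outputs are the three per-dimension coefficients before any action distribution is attached):

```python
import torch
import torch.nn as nn

n_agents, dim = 10, 2
obs_dim = 12  # positions(2) + velocities(2) + scores(1) + personal_best_pos(2)
              # + personal_best_scores(1) + avg_pos(2) + avg_vel(2)

# Decentralized actor: ONE network shared by all particles, applied per agent
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 3 * dim))

# Centralized critic: sees every particle's observation at once (training only)
critic = nn.Sequential(nn.Linear(n_agents * obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))

obs = torch.randn(n_agents, obs_dim)      # one observation row per particle
raw_actions = actor(obs)                  # [n_agents, 6] -> inertia/cognitive/social per dim
joint_value = critic(obs.reshape(1, -1))  # [1, 1]: one value for the whole swarm state
```

At execution time only the actor is needed, so each particle can act from its own observation alone.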
- Docker (recommended) OR Python 3.12+
- GPU (optional but faster)
- ~2GB disk space for Docker image
```bash
# 1. Clone the repo
git clone https://github.com/elte-collective-intelligence/student-particle-swarm-optimization.git
cd student-particle-swarm-optimization

# 2. Build Docker image (one-time setup, ~3 minutes)
docker build -f docker/Dockerfile -t student_pso .

# 3. Run a quick training experiment (~1 minute)
docker run --rm --gpus=all \
    -v $(pwd):/app \
    student_pso \
    python src/main.py --config-path configs/experiments --config-name smoke_train
```

If you see training logs and "Training Complete!", you're good to go!
The `smoke_train` experiment:
- Created a 2D sphere function landscape
- Spawned 5 particles
- Trained for 10 iterations (~5k frames)
- Saved results to `src/outputs/smoke_train/`
```bash
# Install dependencies
pip install -r requirements.txt

# Run experiment locally
python src/main.py --config-path configs/experiments --config-name smoke_train

# Or use the script
./scripts/run_experiment.sh smoke_train
```

```bash
# Run tests (23 tests, should all pass)
pytest test/ -v

# Quick training check
python src/main.py n_iters=5 frames_per_batch=256
```

Each team will receive a specific task focusing on different aspects of the PSO system. Your task will involve:
- Implementing a specific feature or modification
- Running experiments to evaluate your changes
- Analyzing results with ablation studies
- Writing a report documenting your findings
- Reward Shaping: Design rewards for diversity, anti-collapse, and exploration (see the sketch after this list)
- Dynamic Landscapes: Adapt to moving optima
- Communication Topologies: Compare gBest vs lBest neighborhoods
- Role Emergence: Do particles specialize (scouts vs exploiters)?
- Curriculum Learning: Easy → hard functions, low → high dimensions
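For example, a diversity bonus on top of the default improvement reward (first topic above) could look like this. It's purely illustrative: `shaped_reward` and `diversity_coef` are hypothetical names, not part of the codebase.

```python
import torch

def shaped_reward(improvement, positions, diversity_coef=0.1):
    # improvement: [agents] per-step fitness improvement (the default reward signal)
    # positions:   [agents, dim] current particle positions
    # Mean pairwise distance rewards the swarm for not collapsing onto one point
    diversity = torch.pdist(positions).mean()
    return improvement + diversity_coef * diversity
```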
The codebase comes with pre-configured experiments. Start with `smoke_train`, then customize.
| Name | Agents | Dims | Function | Iterations | Purpose |
|---|---|---|---|---|---|
| `smoke_train` | 5 | 2D | sphere | 10 | Quick sanity check (~1 min) |
| `full_train` | 20 | 5D | sphere | 100 | Full training (~10 min) |
| `rastrigin_train` | 10 | 2D | rastrigin | 50 | Multimodal function |
| `dynamic_train` | 10 | 2D | dynamic_sphere | 50 | Moving optimum |
| `eval_vis` | 10 | 2D | sphere | - | Visualization only |
With Docker:
```bash
docker run --rm --gpus=all \
    -v $(pwd):/app \
    student_pso \
    python src/main.py --config-path configs/experiments --config-name smoke_train
```

With Scripts:
```bash
# Single experiment
./scripts/run_experiment.sh smoke_train

# All experiments
./scripts/train_all.sh
```

Locally:
```bash
python src/main.py --config-path configs/experiments --config-name smoke_train
```

Each experiment writes its outputs to:
```
src/outputs/<experiment_name>/
├── training_results.png   # Training curves
├── policy.pt              # Saved policy network
├── critic.pt              # Saved critic network
└── .hydra/                # Hydra config logs
```
```
student-particle-swarm-optimization/
├── src/
│ ├── main.py # Training entry point (START HERE)
│ ├── eval.py # Evaluation & visualization script
│ ├── visualization.py # 2D/3D swarm animations (427 lines)
│ ├── utils.py # Wrappers & action extraction
│ ├── README.md # Source code overview
│ ├── envs/
│ │ ├── README.md # Environment documentation
│ │ ├── env.py # PSO environment (241 lines)
│ │ └── dynamic_functions.py # Dynamic landscapes (188 lines)
│ └── configs/
│ ├── README.md # Configuration guide
│ ├── config.yaml # Default training config
│ ├── eval_config.yaml # Evaluation config
│ ├── env/ # Environment settings
│ ├── model/ # PPO hyperparameters
│ ├── experiments/ # Pre-defined experiments
│ └── visualization/ # Visualization options
├── test/
│ ├── README.md # Test documentation
│ └── test_env.py # Environment tests (23 tests)
├── scripts/
│ ├── README.md # Scripts documentation
│ ├── run_experiment.sh # Run single experiment
│ ├── train_all.sh # Run all training
│ └── eval_model.sh # Evaluate trained model
├── docker/
│ └── Dockerfile # Docker container
├── images/ # Training result plots
├── requirements.txt # Python dependencies
└── README.md                  # You are here!
```
Every directory has a comprehensive README! Each explains:
- What each file does
- How components interact
- Usage examples
- Tips for students
For most assignments, you'll primarily work with:
- `src/envs/env.py` (PSO Environment)
  - Particle dynamics and state updates
  - Neighborhood calculations
  - Reward computation
  - Read `src/envs/README.md` for details
- `src/main.py` (Training Loop)
  - PPO training with GAE
  - Action transformation for proper PSO ranges
  - Model saving and logging
- `src/utils.py` (Utilities)
  - `PSOActionExtractor`: transforms network outputs to PSO coefficients
  - `LandscapeWrapper`: wraps objective functions
- `src/visualization.py` (Visualization)
  - 2D/3D animated swarm visualizations
  - Trajectory and convergence plots
```
main.py
↓
1. Load config from Hydra
↓
2. Create environment (PSOEnv)
↓
3. Initialize policy & critic (MultiAgentMLP)
↓
4. For each iteration:
↓
env.rollout() → collect trajectories
↓
compute_gae() → compute advantages
↓
For each epoch:
↓
PPO update (clip loss + value loss)
↓
Log metrics (reward, loss)
↓
5. Save models to outputs/
```
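The `compute_gae()` step can be written compactly. Here's a minimal single-trajectory sketch; the project's implementation works on batched multi-agent data and may differ in detail:

```python
import torch

def compute_gae(rewards, values, dones, gamma=0.99, lmbda=0.95):
    # rewards, dones: [T]; values: [T + 1] (last entry is the bootstrap value)
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t].float()
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        gae = delta + gamma * lmbda * not_done * gae
        advantages[t] = gae
    returns = advantages + values[:-1]   # targets for the value loss
    return advantages, returns
```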
```yaml
# src/configs/config.yaml
model:
name: ppo
hidden_sizes: [64, 64]
activation: relu
learning_rate: 0.0003
centralized_critic: true
share_params: true
env:
name: swarm
landscape_dim: 2
num_agents: 10
batch_size: 8
delta: 1.0 # Neighborhood radius
landscape_function: sphere
# Training settings
frames_per_batch: 4096
minibatch_size: 256
n_iters: 100
num_epochs: 4
clip_epsilon: 0.2
entropy_coef: 0.01
gamma: 0.99
lmbda: 0.95
# Output
output_dir: src/outputs/default
save_model: true
save_plot: true
```

```bash
# 1. Create a new config file
cat > src/configs/experiments/my_experiment.yaml << EOF
defaults:
- ../config
- _self_
env:
num_agents: 20
landscape_function: rastrigin
landscape_dim: 5
n_iters: 100
output_dir: src/outputs/my_experiment
EOF
# 2. Run it!
python src/main.py --config-path configs/experiments --config-name my_experiment
```

| Parameter | Effect | Recommendation |
|---|---|---|
| `env.num_agents` | More particles = better exploration | 10-50 |
| `env.delta` | Neighborhood radius | 0.5-2.0 |
| `model.hidden_sizes` | Network capacity | [64, 64] or [128, 128] |
| `clip_epsilon` | PPO conservatism | 0.1-0.3 |
| `entropy_coef` | Exploration bonus | 0.01-0.1 |
Evaluation was conducted on various benchmark functions. Below are animated visualizations showing particle swarm behavior and evaluation metrics.
| Landscape | Agents | Dims | Best Score | Final Score | Vs Random |
|---|---|---|---|---|---|
| Sphere | 10 | 2D | -0.085 | -0.085 ± 0.17 | ✅ Converges to optimum |
| Rastrigin | 10 | 2D | -0.716 | -7.64 ± 5.63 | ✅ Navigates local minima |
| Dynamic Sphere | 10 | 2D | -0.004 | -1.82 ± 1.21 | ✅ Tracks moving target |
Note: Scores are negated (higher = better); "Best Score" is the best value found during evaluation.
The simplest test function - particles should converge quickly to the origin.
Figures: 2D swarm animation, 3D surface view, particle trajectories, and convergence plot.
Observations:
- Particles quickly identify the global optimum at origin
- Velocities decrease as swarm converges
- Final best score approaches 0 (perfect)
A challenging function with many local minima - tests exploration vs exploitation.
Figures: 2D swarm animation and convergence plot.
Observations:
- Particles explore multiple basins before settling
- Some particles get trapped in local minima
- Learned coefficients help escape local optima better than fixed PSO
The optimum moves in a circular path - tests adaptive tracking.
Figure: 2D swarm animation.
Observations:
- Swarm tracks the moving optimum (circular trajectory)
- Agents maintain exploration to avoid losing the target
- Demonstrates adaptation to non-stationary environments
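For intuition, a moving-optimum landscape can be as simple as a sphere whose center follows a circle. The sketch below is illustrative only; the real dynamic functions live in `src/envs/dynamic_functions.py`, and the `radius` and `period` parameters here are assumptions:

```python
import math
import torch

def dynamic_sphere(x, t, radius=2.0, period=100):
    # x: [agents, 2] positions, t: current timestep
    # The optimum moves on a circle of the given radius, one loop every `period` steps
    angle = 2 * math.pi * t / period
    center = torch.tensor([radius * math.cos(angle), radius * math.sin(angle)])
    return ((x - center) ** 2).sum(dim=-1)
```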
- Action Transformation is Critical: Raw network outputs (~0) don't work as PSO coefficients. We transform them to proper ranges (see the sketch after this list):
  - Inertia: `0.7 + 0.2 * tanh(x)` → [0.5, 0.9]
  - Cognitive/Social: `1.5 + 0.5 * tanh(x)` → [1.0, 2.0]
- Neighborhood Information: The `delta` parameter controls local vs. global information sharing.
- Dynamic Adaptation: On moving landscapes, agents learn to maintain exploration rather than converging prematurely.
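A sketch of the transformation from the first point above (the real logic lives in `PSOActionExtractor` in `src/utils.py` and may differ in detail):

```python
import torch

def transform_actions(raw_inertia, raw_cognitive, raw_social):
    # Raw network outputs are unbounded and near 0 at init; tanh squashes them
    # into sensible PSO coefficient ranges.
    inertia = 0.7 + 0.2 * torch.tanh(raw_inertia)      # -> [0.5, 0.9]
    cognitive = 1.5 + 0.5 * torch.tanh(raw_cognitive)  # -> [1.0, 2.0]
    social = 1.5 + 0.5 * torch.tanh(raw_social)        # -> [1.0, 2.0]
    return inertia, cognitive, social
```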
What it measures: How much the swarm improves each step
```
reward = current_score - previous_score
# Higher is better (we negate functions, so minimization → maximization)
```

What it measures: Quality of the best solution found
```
best_fitness = max(global_best_score over episode)
```

What it measures: How spread out the particles are
```
diversity = mean_pairwise_distance(positions)
# Low diversity = premature convergence risk
```
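The mean pairwise distance above is a one-liner in PyTorch; a minimal sketch, assuming `positions` is an `[agents, dim]` tensor:

```python
import torch

def swarm_diversity(positions):
    # Mean Euclidean distance over all particle pairs
    return torch.pdist(positions).mean()
```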
```bash
# Evaluate a trained model
python src/eval.py model_path=src/outputs/sphere_full/policy.pt

# With visualization
python src/eval.py model_path=src/outputs/sphere_full/policy.pt \
    visualization.visualize_swarm=true \
    visualization.save_gif=true
```

The visualization module creates animated views of swarm behavior.
- 2D Swarm Animation: shows particles on landscape contours with:
  - Particle positions (colored dots)
  - Velocity vectors (arrows)
  - Personal bests (small markers)
  - Global best (star)
- 3D Surface View: for 2D search spaces, shows particles on the 3D surface with a rotating camera.
- Particle Trajectories: static plot showing the path each particle took during optimization.
- Convergence Plot: best and mean fitness over time.
```bash
python src/eval.py model_path=src/outputs/sphere_full/policy.pt \
    visualization.visualize_swarm=true \
    visualization.save_gif=true \
    visualization.save_dir=src/outputs/vis/
```

- Cause: Network outputs are near 0, but PSO needs specific coefficient ranges
- Solution: Ensure `transform_actions=True` in `PSOActionExtractor`
- Reduce `frames_per_batch` or `env.batch_size`
- Use smaller `model.hidden_sizes`
- Reduce learning rate
- Check velocity clamping in environment
- Use gradient clipping (`max_grad_norm`)
- Increase `entropy_coef` for more exploration
- Use more particles (`env.num_agents`)
- Try a larger `env.delta` for more information sharing
- Start Simple: Test on `sphere` before `rastrigin`
- Watch Training Curves: Reward should trend upward
- Check Diversity: If particles collapse to one point, add a diversity reward
- Tune Carefully: Small changes to `clip_epsilon` and `entropy_coef` matter
| Function | Formula | Optimum | Difficulty |
|---|---|---|---|
| `sphere` | f(x) = Σ x_i² | f(0) = 0 | Easy |
| `rastrigin` | f(x) = 10n + Σ[x_i² - 10cos(2πx_i)] | f(0) = 0 | Hard |
| `eggholder` | Complex (see code) | Known | Hard |
| `dynamic_sphere` | Sphere with a moving center | Tracks | Medium |
| `dynamic_rastrigin` | Rastrigin with oscillating amplitude | Adapts | Very Hard |
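For reference, the two simplest entries in the table map directly to code. These sketches follow the formulas above and are not the project's implementations (those live under `src/envs/`):

```python
import math
import torch

def sphere(x):
    # f(x) = Σ x_i², global minimum f(0) = 0
    return (x ** 2).sum(dim=-1)

def rastrigin(x):
    # f(x) = 10n + Σ[x_i² - 10·cos(2πx_i)], global minimum f(0) = 0
    n = x.shape[-1]
    return 10 * n + (x ** 2 - 10 * torch.cos(2 * math.pi * x)).sum(dim=-1)
```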
- Clipped Objective: `min(r*A, clip(r, 1-ε, 1+ε)*A)`
- Value Loss: MSE between predicted and GAE-computed returns
- Entropy Bonus: Encourages exploration
- Gradient Clipping: `max_grad_norm=0.5`
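Written out, the clipped objective from the first bullet looks like this (a generic sketch, not the project's TorchRL loss module):

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_epsilon=0.2):
    ratio = torch.exp(log_prob_new - log_prob_old)   # r = π_new(a|s) / π_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_epsilon, 1 + clip_epsilon) * advantage
    # Pessimistic minimum, negated because we minimize with gradient descent
    return -torch.min(unclipped, clipped).mean()
```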
A simplified sketch of the δ-neighborhood averaging (the actual implementation lives in `src/envs/env.py`):

```python
import torch

def get_neighborhood_avg(positions, velocities, delta):
    # positions, velocities: [agents, dim]
    # Pairwise distances between all particles: [agents, agents]
    dist = torch.cdist(positions, positions)
    # Mask for neighbors within delta (includes the particle itself)
    neighbor_mask = (dist <= delta).float()
    # Average position/velocity over each particle's neighbors
    counts = neighbor_mask.sum(dim=1, keepdim=True).clamp(min=1)
    avg_pos = neighbor_mask @ positions / counts
    avg_vel = neighbor_mask @ velocities / counts
    return avg_pos, avg_vel
```

- Kennedy & Eberhart (1995) - Original PSO
- Shi & Eberhart (1998) - Inertia weight
- Blackwell (2007) - Dynamic PSO
```bash
# Run all tests (23 tests)
pytest test/ -v

# Run with coverage
pytest test/ --cov=src --cov-report=html

# Quick check
pytest test/test_env.py -v -x
```

```bash
# Build image
docker build -f docker/Dockerfile -t student_pso .
# Run training
docker run --rm --gpus=all -v $(pwd):/app student_pso \
python src/main.py --config-path configs/experiments --config-name smoke_train
# Interactive shell
docker run --rm -it --gpus=all -v $(pwd):/app student_pso bash
```

This project is licensed under CC BY-NC-ND 4.0. See the LICENSE file for details.
TL;DR: You can use this for educational purposes but not for commercial use, and you can't redistribute modified versions without permission.
Good luck with your assignment!
Remember: Start with `smoke_train` to understand the system, then move to your specific task. Read the code, run experiments, and don't hesitate to ask questions!