This repository is a teaching template that demonstrates how to build, run, and compare empirical ML experiments using Hydra. It shows students how to keep configs, code, and results organized so ideas can be tested quickly and reproduced later.
Structuring experiments up front reduces “glue code”, makes every run reproducible, and keeps baselines and new ideas directly comparable. The same layout scales from a laptop to large parallel executions (joblib locally or Slurm on HPC) without refactoring. A clean layout also helps collaborators (and future you) understand what was run, with which settings, and where the results live.
Before writing any code, it helps to answer a few planning questions:

- What exact question am I answering, and which metrics define success?
- Which datasets, models, and hyperparameters will vary?
- What baselines will I compare against?
- How will I ensure repeatability (seeds, logged configs, versioned data/code)?
- Where will results be stored and how will they be aggregated?
- What compute/launcher do I need (local, joblib, Slurm, GPU)?
Repository layout:

- `config/` — Hydra configs split by domain (model, data, training, launcher, experiments). Add new models by creating a new YAML under `config/model/`, e.g., `net_bn.yaml` pointing to a class in `modules/models/`.
- `modules/` — Python source. Put new model code in `modules/models/`, datasets in `modules/datasets/`, training loops in `modules/training/`, shared utilities in `modules/utils/`.
- `runs/` — CLI entrypoints (tasks). Keep each logical task here (e.g., `train.py`, `report.py`).
- `data/` — downloaded datasets or artifacts.
- `run_all_tasks.*` — convenience scripts to chain tasks.
- `env_setup/` — environment files (Dockerfile, requirements).
- Outputs — Hydra writes under `outputs/...`; trained models and `result.json` per run go under `models/...` (see `config/path/relative.yaml`).
If you add a new model: implement it in `modules/models/my_model.py`, expose it via `_target_` in a new `config/model/my_model.yaml`, then reference it on the CLI with `model=my_model`.
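For orientation, a new model module might look roughly like the sketch below; the class, file name, input size, and number of classes are illustrative assumptions, not code from the repository. Its YAML counterpart would follow the same `_target_` pattern as the `net2.yaml` example shown further down.

```python
# modules/models/my_model.py (illustrative sketch only)
import torch
import torch.nn as nn


class MyModel(nn.Module):
    """Constructor arguments map 1:1 to the keys under `object:` in config/model/my_model.yaml."""

    def __init__(self, num_layers: int = 2, latent_dim: int = 128):
        super().__init__()
        in_dim = 28 * 28  # assumes flattened MNIST-sized inputs
        layers: list[nn.Module] = [nn.Flatten()]
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, latent_dim), nn.ReLU()]
            in_dim = latent_dim
        layers.append(nn.Linear(in_dim, 10))  # 10 output classes assumed
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```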
Why structure experiments first?
- Removes ambiguity: goals, metrics, and success criteria are written down before coding.
- Reproducibility by default: every run records the exact config snapshot Hydra used.
- Faster iteration: you can sweep parameters or models with one CLI call instead of editing code.
- Comparability: baselines and new ideas share the same data/metrics pipeline.
What to consider
- Baselines: start with `modules/models/simple_net.py` (`config/model/net2.yaml`) and compare to the BatchNorm variant `modules/models/simple_net_bn.py` (`config/model/net_bn.yaml`).
- Repeatability: set `seed` in `config/train_model.yaml`; Hydra stores the resolved config per run (a seeding sketch follows this list).
- Ease of use & readability: prefer small, composable YAML files; override via CLI instead of editing Python.
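As context for the repeatability point, seeding usually has to touch several RNGs at once. A minimal helper along these lines is what the `seed` value would feed; the function is hypothetical and the template's own utility may differ:

```python
# Hypothetical seed helper; the template's own utility may differ.
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch RNGs so repeated runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
```

In Hydra terms, `cfg.seed` is what would be passed to such a helper at the start of `runs/train.py`.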
- A config is a declarative YAML describing how to build an object. Hydra uses the `_target_` key to map config → Python class.
- Example models:

  `config/model/net2.yaml`:

  ```yaml
  object:
    _target_: modules.models.simple_net.Net
    num_layers: 2
    latent_dim: 128
  ```

  `config/model/net_bn.yaml`:

  ```yaml
  object:
    _target_: modules.models.simple_net_bn.NetBN
    num_layers: 2
    latent_dim: 128
    dropout: 0.3
  ```
- Instantiation happens inside `runs/train.py`:

  ```python
  train_loader = hydra.utils.instantiate(cfg.data.dataloaders.train)
  test_loader = hydra.utils.instantiate(cfg.data.dataloaders.test)
  model = hydra.utils.instantiate(cfg.model.object).to(device)
  ```
- Equivalent manual Python (without Hydra) for the BatchNorm variant:

  ```python
  from modules.models.simple_net_bn import NetBN

  model = NetBN(num_layers=2, latent_dim=128, dropout=0.3)
  model = model.to(device)
  ```
- A run = one merged config composed from the model + data + training + path (+ optional launcher) groups.
- `runs/train.py` sequence: seed setup → instantiate dataloaders → instantiate model → train/evaluate → save checkpoint + `result.json` (sketched below).
- Outputs go to `${path.base_path}/outputs/...` and `${path.base_path_models}/...` (see `config/path/relative.yaml`).
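To make the last step concrete, end-of-run bookkeeping amounts to something like the sketch below; the function and file names are assumptions and the template's own code may differ:

```python
# Hypothetical end-of-run bookkeeping; names and layout are assumptions.
import json
from pathlib import Path

import torch


def save_run(model: torch.nn.Module, metrics: dict, out_dir: str) -> None:
    """Write the checkpoint and a result.json that the reporter can aggregate later."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    torch.save(model.state_dict(), out / "model.pt")
    (out / "result.json").write_text(json.dumps(metrics, indent=2))
```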
- Create environment (choose one):

  ```bash
  # Conda
  conda create --prefix ./.venv python=3.12.3
  conda activate ./.venv
  pip install -r env_setup/requirements.txt

  # venv
  python -m venv .venv
  source .venv/bin/activate   # Linux/Mac
  .venv\Scripts\activate      # Windows
  pip install -r env_setup/requirements.txt
  ```
- Default run:

  ```bash
  python runs/train.py
  ```

- Override on the fly:

  ```bash
  python runs/train.py model=net_bn training.epochs=3 seed=123
  ```

- Change layers without touching code:

  ```bash
  python runs/train.py model.object.num_layers=5
  ```

- Multirun example:

  ```bash
  python runs/train.py --multirun model=net2,net_bn seed=0,1,2 training.epochs=2
  ```
- Preset sweep (`config/experiment/sweep_models.yaml`):

  ```bash
  python runs/train.py +experiment=sweep_models
  ```

  (Edit the YAML to add `net_bn` if you want it included.)

- Example sweep YAML (what it does): runs 3 models × 5 seeds = 15 jobs, each with 3 epochs, storing one config/output folder per job.

  ```yaml
  # config/experiment/sweep_models.yaml
  defaults:
    - override /training: basic

  hydra:
    mode: MULTIRUN
    sweeper:
      params:
        model: net2,net5,net7   # try three model depths
        seed: range(0,5)        # run seeds 0–4 for robustness
        training.epochs: 3      # fix epochs for all runs
  ```
- Hydra swaps launch backends via the `launcher` config group. Select with `+launcher=...`.
- Local parallel jobs: `+launcher=joblib` splits multirun work across CPU cores.
- Slurm CPU: `+launcher=slurm`
- Slurm GPU example (`config/launcher/slurmgpu.yaml`):

  ```yaml
  defaults:
    - override /hydra/launcher: submitit_slurm

  hydra:
    callbacks:
      log_job_return:
        _target_: hydra.experimental.callbacks.LogJobReturnCallback
    launcher:
      setup:
        - "module load Python/3.12.3 2>&1"
        - "module load CUDA/12.6.3 2>&1"
        - ". .venv/bin/activate"
        - "nvidia-smi"
        - "python -m torch.utils.collect_env"
      submitit_folder: ${hydra.sweep.dir}/.submitit/%j
      cpus_per_task: 20        # CPU cores per job
      gpus_per_node: 1         # request one GPU
      gres: "gpu:1"            # Slurm gres string
      tasks_per_node: 1
      array_parallelism: 50    # how many array jobs run in parallel
      timeout_min: 30          # walltime per job
  ```
- What it does: running `python runs/train.py --multirun +launcher=slurmgpu` submits a Slurm array via SubmitIt; each job loads modules, activates the venv, collects env info, and trains on one GPU. Logs/config snapshots stay under `outputs/...` and `.submitit/...`.
- Sweep over models:

  ```bash
  python runs/train.py --multirun model=net2,net_bn seed=0,1,2 training.epochs=2
  ```

  Hydra/SubmitIt fans out jobs; each run writes its merged config, checkpoint, and `result.json` into its own output folder.

- Generate the report:

  ```bash
  python runs/report.py base_dir=./models
  ```

  The reporter uses helper aggregator utilities (`modules/utils/aggregator.py`) to load every `result.json`, normalize it into a DataFrame, compute mean/±std, and emit `results_table.csv` plus `results_plot.png`.

- One-click reproducibility: `./run_all_tasks.sh` (Linux/Mac) or `run_all_tasks.bat` (Windows) chains the same sweep-and-report steps, so anyone can rerun the complete pipeline end-to-end, locally or at cluster scale, without editing code.
- After runs finish, collect metrics into tables/plots:

  ```bash
  python runs/report.py base_dir=./models
  ```

- Output: `results_table.csv` and `results_plot.png` in `reports/...`, showing means and ±std across models/seeds so you can pick winners (e.g., `simple_net` vs `simple_net_bn`).
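For intuition, the aggregation boils down to something like the sketch below. It assumes each `result.json` holds a flat dict with a `model_name` field and a numeric `test_accuracy` metric; the actual `modules/utils/aggregator.py` may differ in field names and details:

```python
# Rough sketch of the aggregation step; field names are assumptions.
import json
from pathlib import Path

import pandas as pd


def collect_results(base_dir: str) -> pd.DataFrame:
    """Load every result.json under base_dir into one DataFrame row per run."""
    rows = []
    for path in Path(base_dir).rglob("result.json"):
        row = json.loads(path.read_text())
        row["run_dir"] = str(path.parent)
        rows.append(row)
    return pd.DataFrame(rows)


df = collect_results("./models")
summary = df.groupby("model_name")["test_accuracy"].agg(["mean", "std"])
summary.to_csv("results_table.csv")
```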
- Default naming in `config/train_model.yaml`: `${data.name}_${model.name}_${training.name}_${seed}` ensures one folder per parameter group (see the example after this list).
- To further disambiguate, append short suffixes via CLI: `suffix=_try1` or override `name=mnist_net_bn_lr1e-3_s123`.
- Keep names deterministic (include model, key hyperparams, and seed) so aggregations cleanly group runs; avoid spaces or ambiguous labels.
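To see how that interpolation resolves, here is a quick OmegaConf illustration; the group names ("mnist", "net_bn", "basic") are made up for the example and are not taken from the repo:

```python
from omegaconf import OmegaConf

# Assumed example values; the real names come from the selected config groups.
cfg = OmegaConf.create({
    "data": {"name": "mnist"},
    "model": {"name": "net_bn"},
    "training": {"name": "basic"},
    "seed": 123,
    "name": "${data.name}_${model.name}_${training.name}_${seed}",
})
print(cfg.name)  # -> mnist_net_bn_basic_123
```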
- Build the image (uses `env_setup/Dockerfile`):

  ```bash
  docker build -t example-pipeline ./env_setup
  ```

- Run with the project mounted:

  ```bash
  docker run --rm -it -v $(pwd):/workspace -w /workspace example-pipeline bash
  ```

- Why: identical environment on laptop, server, or CI; GPU pass-through works with `--gpus all` when the host has the NVIDIA runtime.
- Configure remote once:

  ```bash
  rclone config
  ```

- Sync datasets up/down (progress + parallel transfers):

  ```bash
  rclone sync ./data/datasets remote:bucket/path -P --transfers=8
  ```

- Mirror results to cloud for backup/sharing:

  ```bash
  rclone sync ./models remote:bucket/experiments/models -P
  rclone sync ./outputs remote:bucket/experiments/outputs -P
  ```
- Tip: keep large artifacts out of git; use rclone to stage them where your Slurm jobs can read them.
- Hydra documentation: https://hydra.cc/
- SubmitIt (Hydra launcher backend): https://github.com/facebookincubator/submitit
- Reproducibility in ML: https://www.nature.com/articles/s42256-019-0035-4
- Ten Simple Rules for Reproducible Research: https://doi.org/10.1371/journal.pcbi.1003285
- Experiment tracking tools overview: https://neptune.ai/blog/ml-experiment-tracking-tools
- Docker docs: https://docs.docker.com/
- rclone docs: https://rclone.org/