# ProcPlan – GPU Resource Planner

<img src="screenshot.png" alt="ProcPlan calendar view" width="720" />

ProcPlan is a dependency-free GPU resource planner built entirely on the Python
standard library and SQLite. It ships with:

- **Visual daily planner** – drag across the GPU × day grid to reserve time with
  priority badges and conflict checks.
- **Command-line snapshots** – `procplan-cli` summarizes availability by GPU and day.
- **Early finish notifier** – `procplan-notify` (or a single Python import)
  releases slots the moment experiments complete.
- **Container + scripts** – run locally or via Docker/Podman with one command.

---

## Quick Start

```bash
pip install git+https://github.com/ulvgard/procplan.git
# or install from a local clone: pip install .
# Both expose procplan-cli, procplan-server, procplan-notify
python -m procplan.server \
  --config config.json \
  --database data/procplan.db \
  --host 0.0.0.0 \
  --port 8080
```

Open the web UI at `http://localhost:8080/`, select **All nodes**, and drag over
free cells to create bookings.

Prefer containers?

```bash
docker compose up --build
```

Volumes map `config.sample.json` into the container and persist state under
`./data`.

---

## CLI in 30 Seconds

```bash
procplan-cli --url http://localhost:8080 --all    # weekly view for every node
procplan-cli --url http://localhost:8080          # defaults to your host name
procplan-cli --url http://localhost:8080 --node node-a --date 2024-06-01
```

Example output:

```
Node Training Node A (node-a)
Window: 2024-06-01T00:00:00+00:00 – 2024-06-08T00:00:00+00:00
GPU                | 2024-06-01 | 2024-06-02 | 2024-06-03 | 2024-06-04 | 2024-06-05 | 2024-06-06 | 2024-06-07
------------------------------------------------------------------------------------------------------------
node-a-gpu0 (A100) | AB (High)  | -          | -          | CD (Medium)| -          | -          | -
node-a-gpu1 (A100) | -          | -          | EF (Low)   | -          | -          | -          | -
node-a-gpu2 (A100) | -          | -          | -          | -          | AB (High)  | -          | -
node-a-gpu3 (A100) | -          | -          | -          | -          | -          | -          | -
```

Columns are days, rows are GPUs, and each cell shows the booking's initials and
priority.

---

## Notify and Release Early

### Python helper

```python
from procplan.notifier import signal_completion

result = signal_completion("http://scheduler.internal:8080", booking_id=42)
if not result.ok:
    print(f"Unable to notify scheduler ({result.status}): {result.message}")
```
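
Call the helper when a run ends so the remaining reserved time is freed
immediately. A minimal sketch of wiring it into a training script; `train()`
and the `PROCPLAN_BOOKING_ID` environment variable are placeholders for
however your jobs learn their booking ID:

```python
import os

from procplan.notifier import signal_completion

SCHEDULER_URL = "http://scheduler.internal:8080"
BOOKING_ID = int(os.environ["PROCPLAN_BOOKING_ID"])  # hypothetical convention


def train():
    ...  # your experiment


try:
    train()
finally:
    # Release the reserved GPUs even if training raised an exception.
    result = signal_completion(SCHEDULER_URL, booking_id=BOOKING_ID)
    if not result.ok:
        print(f"Unable to notify scheduler ({result.status}): {result.message}")
```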

### CLI helper

```bash
procplan-notify --url http://scheduler.internal:8080 --booking-id 42
```

---

## HTTP API Highlights

| Method | Path                 | Description                                          |
|--------|----------------------|------------------------------------------------------|
| GET    | `/api/nodes`         | List configured nodes and their GPUs.                |
| GET    | `/api/availability`  | Per-hour availability (ISO 8601 timestamps).         |
| POST   | `/api/book`          | Create a booking (specify `gpu_ids` or `gpu_count`). |
| POST   | `/api/mark_done`     | Mark an active booking complete.                     |
| POST   | `/api/reload_config` | Reload node topology from disk.                      |

All timestamps are UTC and snap to whole hours.
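
Because every endpoint speaks plain JSON, scripting against the API needs
nothing beyond the standard library. A small sketch that fetches the two read
endpoints; the exact shape of each response body is an assumption here, so
inspect the output before building on it:

```python
import json
import urllib.request

BASE = "http://localhost:8080"


def get_json(path):
    # Fetch an API endpoint and decode the JSON body.
    with urllib.request.urlopen(f"{BASE}{path}") as resp:
        return json.load(resp)


# List configured nodes and their GPUs, then the per-hour availability.
print(json.dumps(get_json("/api/nodes"), indent=2))
print(json.dumps(get_json("/api/availability"), indent=2))
```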

---

## Getting Started (Deep Dive)

1. **Describe your cluster** in JSON. `config.sample.json` is a template:

   ```json
   {
     "nodes": [
       {
         "id": "node-a",
         "name": "Training Node A",
         "gpus": [
           {"id": "node-a-gpu0", "kind": "A100"},
           {"id": "node-a-gpu1", "kind": "A100"}
         ]
       }
     ]
   }
   ```

2. **Serve ProcPlan** with the command from *Quick Start* or via `docker compose`.
   The server creates the SQLite database automatically. Call `POST /api/reload_config`
   to pick up topology changes without downtime.

3. **Book GPUs** through the web UI or CLI. Bookings require initials, a duration,
   and either specific GPU IDs or a count; drag-selecting in the UI fills these in
   automatically. A scripted booking sketch follows this list.
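
Booking over HTTP is a short script as well. The payload field names below
(`node_id`, `initials`, `start`, `hours`) are assumptions inferred from the
requirements above; only `gpu_ids`/`gpu_count` come from the API table, so
check the handler for the real schema:

```python
import json
import urllib.request

BASE = "http://localhost:8080"

# Hypothetical payload: field names other than gpu_count are guesses.
payload = {
    "node_id": "node-a",
    "initials": "AB",
    "start": "2024-06-01T09:00:00+00:00",  # UTC, snapped to a whole hour
    "hours": 4,
    "gpu_count": 2,
}

req = urllib.request.Request(
    f"{BASE}/api/book",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```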

---

## Development Notes

- Requires **Python 3.11+** (built and tested on Python 3.13).
- Uses **only the Python standard library**; no third-party dependencies.
- SQLite operates in WAL mode for concurrency; the database file lives wherever
  you point `--database`.
- Static assets ship inside `procplan/web`; override with `--web-root` if needed.

ProcPlan v1.0.0 is ready for teams who want reliable GPU scheduling without a
stack of services. Happy planning!