Forklift is a UCI-compatible chess engine implemented in .NET. This repository contains the core engine, benchmarking harnesses, testing utilities, and automation to help evaluate strength improvements over time.
The repository ships a GitHub Actions workflow named Elo Evaluation (.github/workflows/elo-eval.yml) that automates nightly and on-push engine self-play using cutechess-cli and Ordo.
- Manual trigger: Visit the repository's Actions tab, choose Elo Evaluation, and click Run workflow to start an immediate match between the checked-out commit and the latest commit on
main(or the previous commit if you are already onmain). - Automatic triggers: The workflow also runs on every push to
mainand nightly at 03:00 UTC.
- Two self-contained Linux binaries (
artifacts/current/engineandartifacts/previous/engine) built from the current commit and the baseline commit taken frommain. - A cutechess match of at least 100 games (or early SPRT stop) using a curated openings set and UCI protocol.
- Ordo rating estimates anchored to the baseline build for easier trend tracking.
- Uploaded artifacts containing:
matches/latest-vs-previous.pgn(all played games)matches/logs/(cutechess logs and raw output)matches/ratings.txtandmatches/ratings.csv(Ordo summaries)matches/tool-versions.txt(exact cutechess-cli and Ordo versions)
Artifacts are accessible from the workflow run summary in GitHub Actions. Download them to inspect individual games or the rating tables.
- SPRT: Pass a different
-Sprtargument toscripts/elo-eval.ps1(or tweak the workflow invocation) to tighten or loosen the acceptance bounds. For example, increaseelo1to demand a larger Elo gain before early acceptance. - Time control: Supply a custom
-TimeControlargument such as'3+1'for slower games. The workflow defaults to1+0.1blitz time but any cutechess-compatible control works. - Openings: Replace
matches/openings.epdwith a richer book or point the script at another file via the-OpeningsFileparameter. The PowerShell helper will download a lightweight suite if the file is missing.
- SPRT status in the job summary indicates whether the test met its stop conditions (accept, reject, or continue).
- Ordo's Elo difference is only as reliable as the number of games and the draw rate—expect wide error bars for small sample sizes. The workflow warns when fewer than 10 games are played.
- Because the workflow anchors the baseline build at 2500 Elo, the reported ratings are relative changes across runs rather than absolute playing strength.
The same PowerShell script that powers CI also runs locally on Windows, macOS, and Linux. A few common invocations:
# Compare HEAD against its parent using the defaults
pwsh ./scripts/elo-eval.ps1 -VsParent -Games 300 -TimeControl '1+0.1'
# Compare against a specific reference (branch, tag, or commit)
pwsh ./scripts/elo-eval.ps1 -VsRef v0.3.1 -Games 400 -TimeControl '2+0.2'
# On Linux, reuse the CI installer for cutechess-cli/Ordo via apt
$env:USE_APT_CUTECHESS = '1'
pwsh ./scripts/elo-eval.ps1 -VsParentOn Linux CI the script automatically calls ./scripts/install-chess-tools.sh; on Windows you can add -InstallTools to attempt downloading a portable cutechess build if one is not already on PATH. Artifacts land under matches/, mirroring the GitHub Action layout.
For local experimentation, run ./scripts/install-chess-tools.sh (Linux/macOS) or pwsh ./scripts/elo-eval.ps1 -InstallTools (Windows) to provision cutechess-cli and Ordo before executing a match.