This repository is a fork of the pqmx repository, adding further examples from public-key cryptography (both classical and post-quantum). Along with pqax and SLOTHY, it accompanies the following papers:
- Fast and Clean: Auditable high-performance assembly via constraint solving by Amin Abdulrahman, Hanno Becker, Matthias J. Kannwischer and Fabien Klein (TCHES 2024)
- Enabling Microarchitectural Agility: Taking ML-KEM & ML-DSA from Cortex-M4 to M7 with SLOTHY by Amin Abdulrahman, Matthias J. Kannwischer and Thing-Han Lim (AsiaCCS 2025)
It is intended as a complement to the well-known pqm4, which collects implementations of post-quantum cryptography targeting Cortex-M4. This repository extends support to multiple Arm Cortex-M processors, with particular focus on CPUs implementing the M-Profile Vector Extension (MVE) to the Armv8.1-M architecture (also known as Arm® Helium™ Technology), such as the Arm® Cortex™-M55 and Arm® Cortex™-M85 processors.
This repository also contains the source code for the SLOTHY assembly superoptimizer, discussed in the paper Fast and Clean: Auditable high-performance assembly via constraint solving. See slothy/README.md for more information.
The M-Profile Vector Extension (MVE), or Arm Helium Technology, is a Single Instruction Multiple Data (SIMD) extension for the Armv8.1-M architecture, complementing the Arm® Neon™ Advanced SIMD and Arm Scalable Vector Extension (SVE) for the Cortex-R and Cortex-A processor series.
We refer to the following resources for further information:
- Arm Helium: Enhancing the Capabilities of the Smallest Devices
- Making Helium: Why not just add NEON?: A blog series explaining the design of MVE and comparing it to Arm NEON.
- Armv8-M Architecture Reference Manual
- MVE Reference Book
The main components of the repository are the following:
- asm: Core primitives in optimized assembly, mostly auto-generated.
- tests: C-based tests for core primitives using a minimal hardware abstraction layer (HAL).
- envs: Test environments implementing the HAL.
- slothy: The SLOTHY assembly superoptimizer. See the README for more information.
The following sections explain each component in greater detail.
At the heart of the repository are optimized assembly routines for core components of the post-quantum primitives under consideration, such as the NTT. All optimized assembly is contained in the asm directory, which is structured as follows:
- asm/manual contains assembly that has been written by hand.
- asm/gen/ contains a small Python 3 code generation framework, offering various helper classes for register management, loading/storing (contiguous, non-contiguous, scattered) buffers, and common assembly snippets.
- asm/scripts contains code generation scripts for various algorithms around polynomial multiplication or the PQC schemes they're relevant for, as well as other tests and examples. Those scripts build on the generic framework provided by asm/gen.
- asm/auto/ contains the assembly auto-generated by the examples in asm/scripts. Its structure mirrors that of asm/scripts.
See asm/ for more information.
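To give a feel for what register management in a code generator involves, here is a minimal, self-contained sketch. It is not the API of asm/gen: the class RegisterPool and the function gen_vector_add are hypothetical names invented for this illustration, and the emitted loop is deliberately naive.

```python
class RegisterPool:
    """Hands out register names and takes them back.

    Hypothetical helper for illustration only; not part of asm/gen.
    """

    def __init__(self, names):
        self._free = list(names)
        self._in_use = set()

    def alloc(self):
        """Return the next free register name, or fail if none are left."""
        if not self._free:
            raise RuntimeError("out of registers")
        reg = self._free.pop(0)
        self._in_use.add(reg)
        return reg

    def free(self, reg):
        """Return a previously allocated register to the end of the free list."""
        if reg not in self._in_use:
            raise ValueError(f"{reg} is not allocated")
        self._in_use.remove(reg)
        self._free.append(reg)


def gen_vector_add(pool, src_a, src_b, dst, num_vectors):
    """Emit MVE assembly that adds num_vectors 128-bit vectors of 32-bit lanes.

    src_a, src_b and dst are names of general-purpose registers holding
    pointers; each access post-increments its pointer by 16 bytes.
    """
    va, vb = pool.alloc(), pool.alloc()
    lines = []
    for _ in range(num_vectors):
        lines.append(f"vldrw.u32 {va}, [{src_a}], #16")
        lines.append(f"vldrw.u32 {vb}, [{src_b}], #16")
        lines.append(f"vadd.i32 {va}, {va}, {vb}")
        lines.append(f"vstrw.u32 {va}, [{dst}], #16")
    pool.free(va)
    pool.free(vb)
    return lines


# Usage: eight MVE vector registers q0..q7, pointers assumed in r0, r1, r2.
pool = RegisterPool([f"q{i}" for i in range(8)])
asm = gen_vector_add(pool, "r0", "r1", "r2", num_vectors=2)
print("\n".join(asm))
```

Keeping allocation in a helper like this means a generator script never hard-codes vector register names, which makes it easier to combine snippets without register clashes.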
Each code generation example is accompanied by an example C program contained in tests/. For example, the Toom4 multiplication code generators from asm/scripts/toom4 are tested in tests/toom/.
The test files are platform-independent and only rely on a small hardware abstraction layer, tests/inc/hal.h, which declares stubs for debugging, measuring, and random sources. The tests can be ported to any environment that defines this hardware abstraction layer, either through a separate translation unit in the test environment, or via hal_env.h if some or all of the HAL functionality is to be implemented through macros. Note that hal_env.h must currently always be present in the test environment, even if the entire HAL is implemented in a separate translation unit.
For convenience, there is also a HelloWorld test with a minimal MVE assembly snippet, which can be used to test the tool setup or a new test environment.
As mentioned above, the tests from tests/ can be run in any environment defining the hardware abstraction layer
interface tests/inc/hal.h. This flexibility is useful in order to test the assembly in different models or
simulators of processor implementations.
The supported test environments are located in envs.
The repository supports multiple test environments for different Arm Cortex-M processors:
Primary platforms:
- Arm® Corstone™ SSE-300 with Cortex®-M55 and Ethos™-U55 (AN547) - emulatable with qemu (>=6.0)
- Arm® Corstone™ SSE-310 with Cortex®-M85 and Ethos™-U55 (AN555)
- Renesas EK-RA8M1 (Cortex-M85) development board - physical hardware platform
Additional supported platforms:
- Arm® MPS2 with Cortex®-M4 (AN386)
- Arm® MPS2 with Cortex®-M7 (AN500)
- STM32 Nucleo-F767ZI (Cortex-M7) development board
- STM32F4-Discovery (Cortex-M4) development board
Previously, the freely available FVPs for the Arm® Corstone™-300 MPS2 and Arm® Corstone™-300 MPS3 were also supported. However, these are currently no longer maintained (see #7).
Writing a new test environment requires providing build, run and debug scripts, plus an implementation of the test HAL tests/inc/hal.h. Once you have added a new test environment, you can check that it works by running the HelloWorld test in tests/helloworld.
To run the tests in qemu, the target run-m55-an547_{test_name} can be used. It will build the executable from the sources and run it using qemu-system-arm -M mps3-an547 -nographic -semihosting -kernel.
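If you want to drive qemu from a script rather than through make, the invocation above can be wrapped as in the following sketch. The qemu flags are the ones quoted above; the helper names qemu_command and run_in_qemu, the timeout, and the executable path in the usage line are assumptions made for illustration, not part of this repository.

```python
import shlex
import subprocess


def qemu_command(elf_path):
    """Build the argument list for running one test executable on the AN547 model."""
    return [
        "qemu-system-arm",
        "-M", "mps3-an547",
        "-nographic",
        "-semihosting",
        "-kernel", str(elf_path),
    ]


def run_in_qemu(elf_path, timeout_s=60):
    """Run the executable in qemu and return (exit code, captured stdout).

    Requires qemu-system-arm on PATH; raises subprocess.TimeoutExpired
    if the run does not finish within timeout_s seconds.
    """
    result = subprocess.run(
        qemu_command(elf_path),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode, result.stdout


# Usage: print the command line for a hypothetical executable path.
print(shlex.join(qemu_command("helloworld.elf")))
```

Passing the arguments as a list avoids shell quoting issues when the executable path contains spaces.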
For the EK-RA8M1 hardware board, use flash-ek-ra8m1-{test_name} to flash the test and then connect via telnet to view output (see envs/ek-ra8m1/README.md for detailed setup instructions).
The software is provided under an MIT license. Contributions to this project are accepted under the same license.
All the development and build dependencies are specified in flake.nix. We recommend installing them using nix.
- Setup with nix
  - Running nix develop will execute a bash shell with the development environment specified in flake.nix.
  - Alternatively, you can enable direnv by using direnv allow, allowing it to handle the environment setup for you.
  - As flakes are still an experimental feature of nix, --experimental-features 'nix-command flakes' is needed when running the nix command. Alternatively, add the following to your ~/.config/nix/nix.conf or /etc/nix/nix.conf:
    experimental-features = nix-command flakes
- If you are not using nix, please ensure you have installed the same versions as specified in flake.nix.
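To check whether flakes are already enabled in your configuration, a small script along the following lines can help. It is a sketch under simplifying assumptions: it only looks at the experimental-features setting in the two files named above, treats # as starting a comment, and ignores other mechanisms such as include directives or extra-experimental-features. The function names are hypothetical.

```python
from pathlib import Path

REQUIRED = {"nix-command", "flakes"}


def enabled_features(conf_text):
    """Return the set of features listed under experimental-features.

    If the setting appears more than once, the last occurrence wins.
    """
    features = set()
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "experimental-features":
            features = set(value.split())
    return features


def flakes_enabled(conf_text):
    """True if both nix-command and flakes are enabled in the given text."""
    return REQUIRED <= enabled_features(conf_text)


def check_default_locations():
    """Check the user-level and system-level nix.conf, if they exist."""
    paths = [Path.home() / ".config/nix/nix.conf", Path("/etc/nix/nix.conf")]
    return any(p.is_file() and flakes_enabled(p.read_text()) for p in paths)


# Usage: report whether either configuration file enables flakes.
print("flakes enabled:", check_default_locations())
```

If this reports that flakes are not enabled, either pass the --experimental-features flag on the command line or add the configuration line shown above.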
The code in this repository can then be generated, compiled and run via make:
make {build,run,flash}-{platform}-{test_name} builds/runs/flashes the chosen test in the chosen test environment.
Available platforms: m55-an547, m85-an555, ek-ra8m1, m4-an386, m7-an500, nucleo-f767zi, stm32f4discovery
Example tests: helloworld, ntt-kyber, ntt-dilithium, sqmag, karatsuba, montgomery, keccak
We recommend trying
make run-m55-an547_helloworld
after setting up the required tooling, to check that the tools are in the right place and working as expected.
On macOS with zsh, add the following to your .zshrc to support autocompletion with make:
zstyle ':completion::complete:make:*:targets' call-command true
autoload -U compinit && compinit