This project is a Topo template and follows the Topo Template Format Specification.
Complete LLM chat application optimized for Arm CPU inference.
Features: SVE, NEON
This project demonstrates running large language models on CPU using llama.cpp compiled with Arm baseline optimizations and accelerated using NEON SIMD and SVE (when supported and enabled).
The stack includes:
- llama.cpp server with Arm NEON optimizations (SVE optional)
- Quantized Qwen3.5-0.8B model bundled in the image
- Simple web-based chat interface
- No GPU required - pure CPU inference
- Arm Hardware: An Arm system (physical or virtual). Note that SVE support in llama.cpp requires an Armv8.2-A (or newer) CPU with the SVE extension.
- Docker: For container orchestration with Topo
- LLM Model: A GGUF format model (e.g., Llama 3.1, Mistral, etc.)
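Before enabling SVE, it is worth confirming that the target CPU actually advertises it. A minimal sketch: on Arm Linux the kernel lists CPU features (including `sve`) on the `Features` line of `/proc/cpuinfo`; on hosts without SVE the flag is simply absent.

```shell
# On Arm Linux, /proc/cpuinfo's Features line includes "sve" when the CPU
# supports it; elsewhere the flag (or the file) is simply missing.
if grep -qw sve /proc/cpuinfo 2>/dev/null; then
  sve_state=yes
else
  sve_state=no
fi
echo "SVE supported: $sve_state"
```

If this reports `no`, leave `ENABLE_SVE` at its default of `OFF`; the NEON path is used regardless.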
Note:
`HF_MODEL` must point to a Hugging Face repo that contains at least one supported `.gguf` file. If the repo contains multiple `.gguf` files and `HF_MODEL_FILE` is unset, the build auto-selects a CPU-friendly quantization (preferring Q4_K_M). Sharded GGUFs and multimodal projector files (mmproj) are rejected with a clear error because this template only supports single-file text-model GGUFs today. Not all model repos include GGUF quantizations; look for repos with `-GGUF` in the name. The selected model is baked into the image at `/models/model.gguf`.
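The auto-selection rules above can be illustrated with a small sketch. The filenames here are hypothetical, and the real build script may differ; this only shows the filtering order: drop mmproj and sharded files, then prefer Q4_K_M, falling back to the first remaining candidate.

```shell
# Hypothetical repo listing; the real build inspects the Hugging Face repo.
files='model-Q8_0.gguf
model-Q4_K_M.gguf
mmproj-model-f16.gguf'

# Reject multimodal projectors (mmproj) and sharded files, then prefer Q4_K_M.
candidates=$(printf '%s\n' "$files" | grep -v '^mmproj' | grep -vE -- '-of-[0-9]+\.gguf$')
selected=$(printf '%s\n' "$candidates" | grep 'Q4_K_M' | head -n1)
[ -n "$selected" ] || selected=$(printf '%s\n' "$candidates" | head -n1)
echo "Selected: $selected"
```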
| Parameter | Description | Default |
|---|---|---|
| `HF_MODEL` | Hugging Face model repo ID containing `.gguf` files | `bartowski/Qwen_Qwen3.5-0.8B-GGUF` |
| `HF_MODEL_FILE` | Optional explicit GGUF filename | `""` |
| `ENABLE_SVE` | Enable SVE optimizations | `OFF` |
The easiest way to deploy is with `topo`. Download and install `topo` first, then clone this template:
    topo clone git@github.com:Arm-Examples/topo-v9-cpu-chat.git
    cd topo-v9-cpu-chat
    topo deploy --target <ip-address-of-target>

Use a different model:
    topo deploy --target <ip-address-of-target> \
      --arg HF_MODEL=unsloth/SmolLM2-135M-Instruct-GGUF

Force an exact GGUF file:
    topo deploy --target <ip-address-of-target> \
      --arg HF_MODEL=bartowski/Qwen_Qwen3.5-0.8B-GGUF \
      --arg HF_MODEL_FILE=Qwen_Qwen3.5-0.8B-Q5_K_M.gguf

Open your browser to `http://<ip-address-of-target>:3000` to start chatting!