A minimal React + TypeScript starter app demonstrating on-device AI in the browser using the @runanywhere/web SDK. All inference runs locally via WebAssembly — no server, no API key, 100% private.
| Tab | What it does |
|---|---|
| Chat | Stream text from an on-device LLM (LFM2 350M) |
| Vision | Point your camera and describe what the VLM sees (LFM2-VL 450M) |
| Voice | Speak naturally — VAD detects speech, STT transcribes, LLM responds, TTS speaks back |
npm install
npm run dev

Open http://localhost:5173. Models are downloaded on first use and cached in the browser's Origin Private File System (OPFS).
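If you ever want to inspect or clear those cached model files, the standard OPFS API works from the app or the browser console. This is plain browser API, not part of the SDK, and the directory layout the SDK uses inside OPFS is not documented here; a minimal sketch:

```ts
// List (and optionally delete) everything at the OPFS root.
// Standard browser APIs only; the cast is only there in case your TypeScript
// lib doesn't yet type the async iterator on FileSystemDirectoryHandle.
async function listOrClearOpfs(clear = false): Promise<void> {
  const root = await navigator.storage.getDirectory();
  const entries = (root as unknown as {
    entries(): AsyncIterable<[string, FileSystemHandle]>;
  }).entries();
  for await (const [name, handle] of entries) {
    console.log(`${handle.kind}: ${name}`);
    if (clear) await root.removeEntry(name, { recursive: true });
  }
}
```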
@runanywhere/web (npm package)
├── WASM engine (llama.cpp, whisper.cpp, sherpa-onnx)
├── Model management (download, OPFS cache, load/unload)
└── TypeScript API (TextGeneration, STT, TTS, VAD, VLM, VoicePipeline)
The app imports everything from the @runanywhere/web and @runanywhere/web-llamacpp packages:
import { RunAnywhere, SDKEnvironment } from '@runanywhere/web';
import { TextGeneration, VLMWorkerBridge } from '@runanywhere/web-llamacpp';
await RunAnywhere.initialize({ environment: SDKEnvironment.Development });
// Stream LLM text
const { stream } = await TextGeneration.generateStream('Hello!', { maxTokens: 200 });
for await (const token of stream) { console.log(token); }
// VLM: describe an image (rgbPixels is a raw RGB pixel buffer; see the sketch after the project layout)
const result = await VLMWorkerBridge.shared.process(rgbPixels, width, height, 'Describe this.');

src/
├── main.tsx # React root
├── App.tsx # Tab navigation (Chat | Vision | Voice)
├── runanywhere.ts # SDK init + model catalog + VLM worker
├── workers/
│ └── vlm-worker.ts # VLM Web Worker entry (2 lines)
├── hooks/
│ └── useModelLoader.ts # Shared model download/load hook
├── components/
│ ├── ChatTab.tsx # LLM streaming chat
│ ├── VisionTab.tsx # Camera + VLM inference
│ ├── VoiceTab.tsx # Full voice pipeline
│ └── ModelBanner.tsx # Download progress UI
└── styles/
└── index.css # Dark theme CSS
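The VLM call shown earlier takes a raw pixel buffer. VisionTab.tsx does the real capture work; the sketch below is one way to turn a camera `<video>` frame into tightly packed RGB bytes using standard canvas APIs. It assumes `process()` wants RGB (3 bytes per pixel), which is my reading of the `rgbPixels` parameter name; if your SDK version expects RGBA, skip the repacking loop.

```ts
// Sketch: capture the current video frame and repack RGBA canvas data into RGB.
// Standard browser APIs only; not the actual VisionTab implementation.
function frameToRgb(video: HTMLVideoElement): { rgb: Uint8Array; width: number; height: number } {
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext('2d')!;
  ctx.drawImage(video, 0, 0);
  const { data, width, height } = ctx.getImageData(0, 0, canvas.width, canvas.height); // RGBA
  const rgb = new Uint8Array(width * height * 3);
  for (let i = 0, j = 0; i < data.length; i += 4, j += 3) {
    rgb[j] = data[i];         // R
    rgb[j + 1] = data[i + 1]; // G
    rgb[j + 2] = data[i + 2]; // B (alpha is dropped)
  }
  return { rgb, width, height };
}
```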
Edit the MODELS array in src/runanywhere.ts:
{
id: 'my-custom-model',
name: 'My Model',
repo: 'username/repo-name', // HuggingFace repo
files: ['model.Q4_K_M.gguf'], // Files to download
framework: LLMFramework.LlamaCpp,
modality: ModelCategory.Language, // or Multimodal, SpeechRecognition, etc.
memoryRequirement: 500_000_000, // Bytes
}

Any GGUF model compatible with llama.cpp works for LLM/VLM. STT/TTS/VAD use sherpa-onnx models.
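As an illustration, a second, vision-capable entry might look like the sketch below. The repo and file names are placeholders, and the assumption that the vision projector (mmproj) file simply goes in `files` alongside the main GGUF is mine, not from the SDK docs; llama.cpp VLMs do need both files.

```ts
// Hypothetical MODELS entry for a llama.cpp vision-language model.
// Placeholder repo/file names; both the model GGUF and its mmproj file are
// assumed to belong in `files` so they get downloaded together.
{
  id: 'my-custom-vlm',
  name: 'My VLM',
  repo: 'username/vlm-repo-name',
  files: ['model.Q4_K_M.gguf', 'mmproj-model-f16.gguf'],
  framework: LLMFramework.LlamaCpp,
  modality: ModelCategory.Multimodal,
  memoryRequirement: 900_000_000,
},
```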
npm run build
npx vercel --prod

The included vercel.json sets the required cross-origin isolation headers.
Add a _headers file (Netlify, Cloudflare Pages, and similar static hosts support this format):

/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: credentialless
On any other host or your own server, serve the dist/ folder with these HTTP headers on all responses:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: credentialless
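For example, with Express (an illustration only; any static file server that can set response headers works):

```ts
// serve.ts - minimal static server for dist/ with cross-origin isolation headers.
import express from 'express';

const app = express();
app.use(
  express.static('dist', {
    setHeaders: (res) => {
      res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
      res.setHeader('Cross-Origin-Embedder-Policy', 'credentialless');
    },
  })
);
app.listen(8080, () => console.log('Serving dist/ at http://localhost:8080'));
```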
Browser requirements (a quick runtime check sketch follows this list):

- Chrome 96+ or Edge 96+ (recommended: 120+)
- WebAssembly (required)
- SharedArrayBuffer (requires Cross-Origin Isolation headers)
- OPFS (for persistent model cache)
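A check along these lines, using only standard browser globals, can confirm the environment before you try to load a model:

```ts
// Rough capability check; returns the list of missing features (empty = OK).
function checkEnvironment(): string[] {
  const missing: string[] = [];
  if (typeof WebAssembly === 'undefined') missing.push('WebAssembly');
  if (typeof SharedArrayBuffer === 'undefined') missing.push('SharedArrayBuffer');
  if (!self.crossOriginIsolated) missing.push('cross-origin isolation (check COOP/COEP headers)');
  if (!('storage' in navigator) || !('getDirectory' in navigator.storage)) {
    missing.push('OPFS (models will not persist between sessions)');
  }
  return missing;
}
```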
License: MIT