React Native chat app that runs local LLMs on-device and can switch to Routstr cloud models for remote inference. The UI is optimized for a chat-first workflow with a side drawer for model selection.
- Local models (on-device): GGUF models loaded via `llama.rn` with streaming output and a Stop button (see the sketch after this list).
- Routstr cloud models: Call `https://api.routstr.com/v1/chat/completions` with standard Chat Completions and SSE streaming.
- Single chat UI: Toggle the side menu to pick models; the header shows the selected model.
- Download manager: Built-in model downloader for local models.
- Persistent params: Context/completion params loaded from storage.
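For the local path, the flow is roughly the one documented by `llama.rn`: initialize a context from a GGUF file, then stream tokens through the completion callback. A minimal sketch, assuming an already-downloaded model file; the helper name and parameter values are illustrative, not the app's actual code:

```ts
import { initLlama } from 'llama.rn';

// Minimal local-inference sketch; helper name and params are illustrative.
async function runLocal(
  modelPath: string,            // path to a downloaded GGUF file
  prompt: string,
  onToken: (t: string) => void, // streaming UI callback
): Promise<string> {
  const context = await initLlama({ model: modelPath, n_ctx: 2048 });
  const { text } = await context.completion(
    { prompt, n_predict: 256 },
    (data) => onToken(data.token), // fires once per generated token
  );
  await context.release(); // free the native context
  return text;
}
```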
- Chat screen and drawer: `src/screens/SimpleChatScreen.tsx`
- Model cards (download/init): `src/components/ModelDownloadCard.tsx`
- LLM providers (abstraction):
  - `src/services/llm/LLMProvider.ts` – shared interface
  - `src/services/llm/LocalLLMProvider.ts` – local inference via `llama.rn`
  - `src/services/llm/RoutstrProvider.ts` – Routstr remote via SSE
- Model constants: `src/utils/constants.ts`
We use an `LLMProvider` interface with `initialize`, `sendChat`, `stop`, and `release` methods. The chat screen holds a single provider instance (`llm`) and delegates message generation to it.
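A sketch of what that interface could look like; only the four method names come from the description above, the signatures and the `ChatMessage` type are assumptions:

```ts
// Illustrative shape of LLMProvider; signatures are assumptions, only the
// method names (initialize, sendChat, stop, release) are from the repo.
export interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

export interface LLMProvider {
  initialize(): Promise<void>; // load model / validate config
  sendChat(
    messages: ChatMessage[],
    onToken: (delta: string) => void, // streaming callback
  ): Promise<string>; // resolves with the full reply
  stop(): void; // abort an in-flight generation
  release(): Promise<void>; // free context before switching providers
}
```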
- Local provider streams via `llama.rn` callbacks.
- Routstr provider streams via XHR Server-Sent Events, parsing `data:` lines and accumulating deltas without duplication (see the sketch after this list).
- The Stop button is shown only for the local provider and calls `stop()`.
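A hedged sketch of the XHR-based SSE parsing. The endpoint and the `data:` / `[DONE]` framing follow the standard Chat Completions stream; the helper name and callback are illustrative, not the app's actual code:

```ts
// Stream a Chat Completions response over XHR SSE (React Native compatible).
function streamRoutstrChat(
  apiKey: string,
  model: string,
  messages: { role: string; content: string }[],
  onDelta: (text: string) => void,
): XMLHttpRequest {
  const xhr = new XMLHttpRequest();
  let seen = 0;    // portion of responseText already consumed
  let buffer = ''; // holds a partial line between progress events
  xhr.open('POST', 'https://api.routstr.com/v1/chat/completions');
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.setRequestHeader('Authorization', `Bearer ${apiKey}`);
  xhr.onprogress = () => {
    buffer += xhr.responseText.slice(seen);
    seen = xhr.responseText.length;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep an incomplete trailing line for later
    for (const line of lines) {
      if (!line.startsWith('data:')) continue;
      const payload = line.slice(5).trim();
      if (payload === '[DONE]') return;
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) onDelta(delta); // emit only the new tokens, no duplication
    }
  };
  xhr.send(JSON.stringify({ model, messages, stream: true }));
  return xhr; // caller can xhr.abort() to cancel
}
```

Tracking `seen` against the growing `responseText` is what keeps deltas from being appended twice when multiple progress events fire.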
```sh
npm install
npm run pods
```

iOS:

```sh
npm run ios
# To target a device
npm run ios -- --device "<device name>"
# Release
npm run ios -- --mode Release
```

Android:

```sh
npm run android
# Release
npm run android -- --mode release
```

Edit `src/screens/SimpleChatScreen.tsx` and set:
- `ROUTSTR_API_KEY`: your API key (Bearer token)
- `ROUTSTR_CHAT_MODEL`: e.g. `qwen/qwen3-max`
- `ROUTSTR_MODEL_NAME`: UI label shown in the drawer header
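For example (the key is a placeholder and the label is whatever you want the drawer to display):

```ts
// Placeholder values; substitute your own key and preferred Routstr model.
const ROUTSTR_API_KEY = '<your-bearer-token>';
const ROUTSTR_CHAT_MODEL = 'qwen/qwen3-max';
const ROUTSTR_MODEL_NAME = 'Qwen3 Max (Routstr)'; // drawer label, your choice
```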
- Launch the app; open the drawer (hamburger) in the chat header.
- Pick a default local model. It will download if missing, then initialize and stream.
- Or choose the Routstr model; no download necessary. Messages stream from the Routstr API.
- Switch models anytime; we prevent duplicate welcome messages and keep the chat input disabled until ready.
- Local streaming is handled by `llama.rn`; remote streaming uses XHR SSE parsing for React Native compatibility.
- When you switch providers, we release the previous provider/context before initializing the next (see the sketch below).
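The release-before-init pattern from that last note, as a hypothetical helper built on the `LLMProvider` sketch above (not the app's actual code):

```ts
// Hypothetical helper: release the old provider before bringing up the new one.
async function switchProvider(
  current: LLMProvider | null,
  next: LLMProvider,
): Promise<LLMProvider> {
  await current?.release(); // free the previous context and its memory
  await next.initialize();  // then initialize the replacement
  return next;
}
```

Releasing first matters most for the local provider, where the old `llama.rn` context would otherwise keep its model weights in memory while the next one loads.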