A lightweight REST API server powered by FastAPI & PyTorch.
It runs seamlessly on GPU (CUDA) if available, and safely falls back to CPU.
🚀 Includes a demo model, performance probes, a web control panel, and an extensible Plugin System for real AI models.
- ✅ Auto-selects GPU or CPU at runtime (`DEVICE` in `.env`).
- ✅ Clean FastAPI endpoints with Swagger UI (`/docs`) and ReDoc (`/redoc`).
- ✅ Interactive Control Panel (`/ui`) & Model Size Calculator (`/tools/model-size`).
- ✅ Example TinyNet model + benchmarks (`/matmul`, `/infer`).
- ✅ Extensible Plugin System with ready-to-use AI models.
- ✅ Works on Windows/Linux/macOS (CPU) & CUDA (NVIDIA GPUs).
```text
app/
  main.py            # FastAPI app & endpoints
  runtime.py         # device selection, CUDA info, warmup
  toy_model.py       # TinyNet demo model
  plugins/           # Modular plugins (bart, clip, resnet18, ner, etc.)
  templates/         # HTML templates for UI
scripts/
  install_torch.py   # Auto-installs correct PyTorch (CPU/GPU)
  test_api.py        # Quick client to test endpoints
requirements.txt
README.md
```
Windows (PowerShell):

```powershell
py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1
```

Linux/macOS:

```bash
python3.12 -m venv .venv
source .venv/bin/activate
```

Install dependencies, fetch the matching PyTorch build, and start the server:

```bash
pip install -r requirements.txt
python -m scripts.install_torch
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Open in browser:
- Swagger UI → http://localhost:8000/docs
- Control Panel → http://localhost:8000/ui
- Plugin Console → http://localhost:8000/plugins/ui
- Infer Client → http://localhost:8000/infer-client
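
Once the server is up, a quick smoke test also works from Python. This is a minimal sketch; it assumes the server from the quickstart above is running on localhost:8000 and that the `requests` package is installed (it is not necessarily part of `requirements.txt`):

```python
# quick_check.py - minimal sketch: ping /health and /cuda on a local NeuroServe instance.
# Assumes the server is running on localhost:8000 and requests is installed.
import requests

BASE_URL = "http://localhost:8000"

health = requests.get(f"{BASE_URL}/health", timeout=5)
print("health:", health.status_code, health.json())

cuda = requests.get(f"{BASE_URL}/cuda", timeout=5)
print("cuda/device info:", cuda.json())
```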
Configuration lives in `.env`:

```env
# Prefer the first CUDA GPU if available
DEVICE=cuda:0

# Force CPU mode (override)
# DEVICE=cpu

# Warmup matrix size (for benchmarks)
WARMUP_MATMUL_SIZE=1024
```
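
The `DEVICE` value drives the GPU/CPU auto-selection listed in the features. The actual logic lives in `app/runtime.py`; the sketch below only illustrates the fallback idea, with a hypothetical helper name:

```python
# Illustrative sketch only (the real device selection lives in app/runtime.py).
import os

import torch

def pick_device() -> torch.device:  # hypothetical helper, not the project's API
    """Honor DEVICE from the environment, falling back to CPU if CUDA is unusable."""
    requested = os.getenv("DEVICE", "cuda:0")
    if requested.startswith("cuda") and not torch.cuda.is_available():
        # A GPU was requested but none is usable: fall back safely to CPU.
        return torch.device("cpu")
    return torch.device(requested)

print(pick_device())  # e.g. cuda:0 on a GPU box, cpu otherwise
```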
Available endpoints:

| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /cuda | CUDA / device info |
| GET | /env | Environment summary |
| GET | /env/full | Full environment + GPU list |
| GET | /env/system | OS/CPU/RAM system info |
| POST | /matmul | Matrix multiply benchmark |
| POST | /infer | TinyNet inference |
| POST | /inference | Generic plugin inference |
| GET | /plugins | List available plugins |
| GET | /ui | Interactive control panel |
| GET | /tools/model-size | Model size calculator (UI) |
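
The POST benchmarks can also be driven programmatically. The sketch below times a /matmul round trip with the same `{"n": 2048}` payload used in the curl example further down; it assumes a local server and an installed `requests` package:

```python
# Sketch: call the /matmul benchmark endpoint and report the round-trip time.
# Assumes the server is on localhost:8000 and requests is installed.
import time

import requests

start = time.perf_counter()
resp = requests.post("http://localhost:8000/matmul", json={"n": 2048}, timeout=60)
elapsed = time.perf_counter() - start

print("status:", resp.status_code)
print("server response:", resp.json())
print(f"round-trip: {elapsed:.2f}s")
```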
NeuroServe uses a modular plugin system under app/plugins/.
Available plugins:
- bart → Text summarization (facebook/bart-large-cnn)
- clip → Text ↔ Image embeddings & similarity
- distilbert → Sentiment classification
- resnet18 → Image classification (ImageNet)
- ner → Named Entity Recognition
- tinyllama → Lightweight text generation
- translator / translator_m2m → Translation
- pdf_reader → Extract text from PDFs
- dummy → Ping test
- dichfoto_proxy → Forwarding proxy
🔧 Build your own Plugin (5 min)
```python
from app.plugins.base import AIPlugin

class Plugin(AIPlugin):
    tasks = ["hello"]

    def load(self):
        print("[plugin] hello ready")

    def infer(self, payload):
        name = payload.get("name", "world")
        return {"task": "hello", "message": f"Hello, {name}!"}
```

Add it in `app/plugins/hello/` with a `manifest.json`.
Matrix Multiply
```bash
curl -X POST http://localhost:8000/matmul -H "Content-Type: application/json" -d '{"n": 2048}'
```

BART Summarization

```bash
curl -X POST http://localhost:8000/inference -H "Content-Type: application/json" -d '{"provider":"bart","task":"summarize","text":"Deep learning is a subfield..."}'
```

NER Entity Extraction

```bash
curl -X POST http://localhost:8000/inference -H "Content-Type: application/json" -d '{"provider":"ner","task":"extract-entities","text":"Barack Obama was born in Hawaii."}'
```

- Torch import error → Ensure Python 3.12 and rerun `python -m scripts.install_torch`.
- No GPU detected → The server falls back to CPU; force CPU with `DEVICE=cpu`.
- CUDA mismatch → Reinstall torch with a matching CUDA runtime.
- Out of Memory (OOM) → Reduce `max_length` or use CPU mode.
- Windows Execution Policy → `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned`
- File upload endpoints (images/text).
- Docker images (CPU & GPU).
- Extended plugin demo UI.
- CI/CD with pre-commit hooks & GitHub Actions.
- Real AI plugins (BART, CLIP, ResNet, DistilBERT, NER, etc.).
MIT © 2025 TamerOnLine