GitHub - radlab-dev-group/llm-router-plugins: A companion repository for llm-router containing a collection of pipeline-ready plugins. Features a masking interface for anonymizing sensitive data and a guardrail system for validating input/output safety against defined policy rules.

Overview

The LLM‑Router project ships with a modular plugin system that lets you plug‑in anonymizers (also called maskers) and guardrails into request‑processing pipelines.
Each plugin implements a tiny, well‑defined interface (apply) and can be composed in an ordered list to form a * pipeline*. Pipelines are instantiated by the MaskerPipeline and GuardrailPipeline classes and are driven automatically by the endpoint logic in endpoint_i.py.

1. Anonymizers (Maskers)

1.1 What they do

Goal – Remove or replace personally‑identifiable information (PII) from a payload before it reaches the LLM or an external service.
Typical strategy – Run a pipeline of maskers that locate spans corresponding to IDs, emails, IPs, etc., and replace each span with a placeholder such as {{MASKED_ITEM}}.

1.2 Built‑in anonymizer plugins

Plugin	Description	Technical notes
FastMaskerPlugin (`fast_masker_plugin.py`)	Thin wrapper around the `FastMasker` utility class. Receives a JSON‑compatible payload and returns the same payload with all detected PII masked.	Implements `PluginInterface`. The heavy lifting is delegated to `FastMasker.mask_payload(payload)`. No extra I/O; the `FastMasker` instance is created once in `__init__`.

1.3 How a masker is used

The endpoint (e.g. EndpointI._do_masking_if_needed) checks the global flag FORCE_MASKING.
If enabled, it creates a MaskerPipeline with the list of masker plugin identifiers (e.g. ["fast_masker"]).
The pipeline calls each plugin’s apply method sequentially, feeding the output of one as the input of the next.
The final payload – now stripped of PII – proceeds to the rest of the request flow (guardrails, model dispatch, etc.).

2. Guardrails

2.1 What they do

Goal – Verify that a request (or its response) complies with policy rules (e.g. no hateful, illegal, or unsafe content).
Typical strategy – Split the payload into manageable text chunks, run a pipeline of guardrails, aggregate per‑chunk scores, and decide whether the overall request is safe.

2.2 Built‑in guardrail plugins

Plugin	Description	Technical notes
NASKGuardPlugin (`nask_guard_plugin.py`)	HTTP‑based guardrail that forwards the payload to the external NASK guardrail service (`/nask_guard` endpoint) and returns a boolean safe flag together with the raw response.	Inherits from `HttpPluginInterface`. The `apply` method calls `_request(payload)` (provided by the base class) and extracts `results["safe"]`. Errors are caught and logged; on failure the plugin returns `(False, {})`.
SojkaGuardPlugin (`sojka_guard_plugin.py`)	HTTP‑based guardrail that forwards the payload to the Sójka guardrail service (`/sojka_guard` endpoint) and returns a safety flag.	Mirrors the design of `NASKGuardPlugin`. The `endpoint_url` is built from the `LLM_ROUTER_GUARDRAIL_SOJKA_GUARD_HOST` environment variable. On success it returns `(True, response)`, otherwise `(False, {})`.
(Implicit) GuardrailProcessor (`processor.py`)	Core logic used by the internal NASK guardrail Flask route (`nask_guardrail`). Tokenises the payload, creates overlapping chunks, runs a Hugging‑Face `text‑classification` pipeline, and produces a detailed safety report.	Handles model loading (`AutoTokenizer`, `pipeline("text‑classification")`), chunking (`_chunk_text`), and scoring thresholds (`MIN_SCORE_FOR_SAFE`, `MIN_SCORE_FOR_NOT_SAFE`). Returns a dict: `{"safe": <bool>, "detailed": [...]}`.

2.3 How a guardrail is used

The endpoint calls _is_request_guardrail_safe(payload) (or the analogous response guardrail).
If FORCE_GUARDRAIL_REQUEST is true, a GuardrailPipeline is built from the configured plugin IDs (e.g. ["nask_guard", "sojka_guard"]).
The pipeline iterates over each guardrail plugin; each apply returns (is_safe, message).
The first plugin that reports is_safe=False short‑circuits the pipeline and the request is rejected with a 400/500 error payload.

3. Pipelines

Both masker and guardrail pipelines share the same design pattern:

Class	Purpose
MaskerPipeline (`pipeline.py` – masker version)	Executes a list of masker plugins in order, transforming the payload step‑by‑step.
GuardrailPipeline (`pipeline.py` – guardrail version)	Executes guardrail plugins sequentially, stopping on the first failure.

3.1 Registration

Plugins are registered lazily via MaskerRegistry.register(name, logger) or GuardrailRegistry.register(name, logger).
The registry maps a string identifier (e.g. "fast_masker") to a concrete plugin class, allowing pipelines to resolve the classes at runtime.

3.2 Configuration

All plugin identifiers are stored in environment variables or constants such as:

MASKING_STRATEGY_PIPELINE = ["fast_masker"]
GUARDRAIL_STRATEGY_PIPELINE_REQUEST = ["nask_guard", "sojka_guard"]

These lists are consumed by the endpoint initialization (EndpointI._prepare_masker_pipeline, EndpointI._prepare_guardrails_pipeline).

4. Adding a New Plugin

Create a subclass of either PluginInterface (for maskers) or HttpPluginInterface / a custom guardrail base.
Define a name class attribute – this is the identifier used in pipeline configuration.
Implement apply(self, payload: Dict) -> Dict (masker) **or apply(self, payload: Dict) -> Tuple[bool, Dict] ** (guardrail).
Register the plugin – either automatically via the registry’s register call in the pipeline constructor, or manually by calling MaskerRegistry.register(name=MyPlugin.name, logger=logger).

Example stub for a new masker:

# my_custom_masker.py
from llm_router_plugins.maskers.plugin_interface import PluginInterface
import logging
from typing import Dict, Optional


class MyCustomMasker(PluginInterface):
    name = "my_custom_masker"

    def __init__(self, logger: Optional[logging.Logger] = None):
        super().__init__(logger=logger)
        # Load any heavy resources here (e.g., a spaCy model)

    def apply(self, payload: Dict) -> Dict:
        # Perform your masking logic and return the modified payload
        return payload

After placing the file in llm_router_plugins/maskers/plugins/, enable it by adding "my_custom_masker" to MASKING_STRATEGY_PIPELINE.

5. Retrieval‑Augmented Generation (RAG) Support

The project now includes a LangChain‑based RAG plugin that enables semantic search over user‑provided documents. The implementation lives in llm_router_plugins/utils/rag/langchain_plugin.py and is driven by the helper CLI scripts located in scripts/.

5.1 What the plugin does

Feature	Description
Indexing	Reads a directory of text‑like files (`.txt`, `.md`, `.html`, `.js`, …), splits them into token‑based windows, embeds each chunk with a configurable transformer model, and stores the vectors in a FAISS (or compatible) vector store.
Searching	Given a user query, retrieves the most similar chunks and injects them into the payload (e.g., appends to the last user message) so that downstream LLM calls can use the retrieved context.
Configuration	All parameters (collection name, embedder model, device, chunk size, overlap, persistence directory) are driven by environment variables prefixed with `LLM_ROUTER_`. See the table below for the full list.
CLI helpers	Two ready‑to‑use scripts: `scripts/llm-router-rag-langchain-index.sh` (indexes a repository) and `scripts/llm-router-rag-langchain-search.sh` (runs a search or starts an interactive REPL).

5.2 Environment variables

Variable	Default	Meaning
`LLM_ROUTER_LANGCHAIN_RAG_COLLECTION`	must be set	Name of the FAISS collection (e.g. `sample_collection`).
`LLM_ROUTER_LANGCHAIN_RAG_EMBEDDER`	`/mnt/data2/llms/models/community/google/embeddinggemma-300m`	Path or Hugging‑Face identifier of the embedding model.
`LLM_ROUTER_LANGCHAIN_RAG_DEVICE`	`cuda:2`	Torch device (`cpu`, `cuda:0`, …).
`LLM_ROUTER_LANGCHAIN_RAG_CHUNK_SIZE`	`1024`	Number of tokens per chunk.
`LLM_ROUTER_LANGCHAIN_RAG_CHUNK_OVERLAP`	`100`	Number of overlapping tokens between consecutive chunks.
`LLM_ROUTER_LANGCHAIN_RAG_PERSIST_DIR`	`./workdir/plugins/utils/rag/langchain/${LLM_ROUTER_LANGCHAIN_RAG_COLLECTION}`	Directory where the FAISS index and docstore are persisted.

Example export block (add to your shell profile or a `.env` file)

export LLM_ROUTER_LANGCHAIN_RAG_COLLECTION="${LLM_ROUTER_LANGCHAIN_RAG_COLLECTION:-sample_collection}"
export LLM_ROUTER_LANGCHAIN_RAG_EMBEDDER="${LLM_ROUTER_LANGCHAIN_RAG_EMBEDDER:-/mnt/data2/llms/models/community/google/embeddinggemma-300m}"
export LLM_ROUTER_LANGCHAIN_RAG_DEVICE="${LLM_ROUTER_LANGCHAIN_RAG_DEVICE:-cuda:2}"
export LLM_ROUTER_LANGCHAIN_RAG_CHUNK_SIZE="${LLM_ROUTER_LANGCHAIN_RAG_CHUNK_SIZE:-1024}"
export LLM_ROUTER_LANGCHAIN_RAG_CHUNK_OVERLAP="${LLM_ROUTER_LANGCHAIN_RAG_CHUNK_OVERLAP:-100}"
export LLM_ROUTER_LANGCHAIN_RAG_PERSIST_DIR="${LLM_ROUTER_LANGCHAIN_RAG_PERSIST_DIR:-./workdir/plugins/utils/rag/langchain/${LLM_ROUTER_LANGCHAIN_RAG_COLLECTION}}"

5.3 Using the CLI scripts

Index a repository (example for the documentation site):

scripts/llm-router-rag-langchain-index.sh
# Internally runs:
# llm-router-rag-langchain index --path "../.github/pages/llmrouter.cloud/" --ext .html .js .md

Search (interactive REPL):

scripts/llm-router-rag-langchain-search.sh
# Internally runs:
# llm-router-rag-langchain search
# (you will be prompted for a query, type “exit” to quit)

One‑shot search:

llm-router-rag-langchain search --query "What is Retrieval‑Augmented Generation?" --top_n 5

The CLI returns the raw matching chunks together with similarity scores. The LangchainRAGPlugin automatically formats the retrieved text and appends it to the user’s last message, prefixed with:

If the context below will help answer the above question, use it.
Context separated with double enter

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
llm_router_plugins		llm_router_plugins
scripts		scripts
.gitignore		.gitignore
.version		.version
LICENSE		LICENSE
README.md		README.md
requirements-extra.txt		requirements-extra.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Overview

1. Anonymizers (Maskers)

1.1 What they do

1.2 Built‑in anonymizer plugins

1.3 How a masker is used

2. Guardrails

2.1 What they do

2.2 Built‑in guardrail plugins

2.3 How a guardrail is used

3. Pipelines

3.1 Registration

3.2 Configuration

4. Adding a New Plugin

5. Retrieval‑Augmented Generation (RAG) Support

5.1 What the plugin does

5.2 Environment variables

Example export block (add to your shell profile or a `.env` file)

5.3 Using the CLI scripts

About

Uh oh!

Releases 1

Packages

Contributors 2

Uh oh!

Languages

License

radlab-dev-group/llm-router-plugins

Folders and files

Latest commit

History

Repository files navigation

Overview

1. Anonymizers (Maskers)

1.1 What they do

1.2 Built‑in anonymizer plugins

1.3 How a masker is used

2. Guardrails

2.1 What they do

2.2 Built‑in guardrail plugins

2.3 How a guardrail is used

3. Pipelines

3.1 Registration

3.2 Configuration

4. Adding a New Plugin

5. Retrieval‑Augmented Generation (RAG) Support

5.1 What the plugin does

5.2 Environment variables

Example export block (add to your shell profile or a .env file)

5.3 Using the CLI scripts

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Uh oh!

Languages

Example export block (add to your shell profile or a `.env` file)

Packages