Ultra-Fast ABI-Stable Host–Plugin Interface for Rust
Write blazing-fast plugins with ABI stability — extreme multi-thread performance
Benchmarked on Apple M1 Pro (10-core), release builds:
- `call(...)`: ~140M req/sec (10 threads, fire-and-forget)
- `call_response_fast(...)`: ~124M req/sec (10 threads, unary with response, thread-local optimized)
- `call_response(...)`: ~27M req/sec (10 threads, unary with response)
- All data structures use the C ABI (`#[repr(C)]`)
- Version-safe plugin loading across Rust versions
- Compatible with C, C++, Zig, Go, Rust, ...
- 140M+ req/sec multi-thread throughput
- Thread-local SID generation (zero atomic operations)
- Zero-copy data transfer with `NrVec<u8>`
- Negligible ABI overhead (0.33 ns to ~23 ns per type operation)
- Fire-and-forget: ~71.7 ns per call (fastest)
- Unary with response: ~143.2 ns per call
- Fast path: ~95.5 ns per call (thread-local optimized)
- Streaming: bi-directional communication
- Thread-safe for 24/7 HTTP servers
- Safe SID wrapping (no collisions)
- Entry-based routing for multiple handlers
- Panic-safe FFI boundaries
nylon-ring/
├── crates/
│ ├── nylon-ring/ # Core ABI library
│ │ ├── src/ # NrStr, NrBytes, NrKV, NrVec
│ │ └── benches/ # ABI benchmarks
│ │
│ └── nylon-ring-host/ # Host adapter
│ ├── src/ # NylonRingHost interface
│ └── benches/ # Host overhead benchmarks
│
└── examples/
├── ex-nyring-plugin/ # Example plugin
└── ex-nyring-host/ # Example host + stress test
cargo build --release
cargo run --release --bin ex-nyring-host
cargo bench # All benchmarks
cargo bench --package nylon-ring # ABI types only
cargo bench --package nylon-ring-host # Host overhead only
use nylon_ring_host::NylonRingHost;
let mut host = NylonRingHost::new();
// Load plugins
host.load("plugin_a", "libs/plugin_a.so")?;
host.load("plugin_b", "libs/plugin_b.so")?;
// Get a handle to a specific plugin
let plugin_a = host.plugin("plugin_a").expect("Plugin A not found");
// Reload all plugins (useful for hot-swapping)
host.reload()?;
// Unload a plugin
host.unload("plugin_b")?;
use nylon_ring_host::NylonRingHost;
let mut host = NylonRingHost::new();
host.load("default", "target/release/libmy_plugin.so")?;
let plugin = host.plugin("default").expect("Plugin not found");
// Fire-and-forget - no response waiting (~71.7ns, 13.95M calls/sec)
let status = plugin.call("handler_name", b"payload").await?;
// Wait for response from plugin (~143.2ns, 6.98M calls/sec)
let (status, response) = plugin.call_response("handler_name", b"payload").await?;
println!("Response: {}", String::from_utf8_lossy(&response));
// Thread-local optimized path (~95.5ns, 10.47M calls/sec)
let (status, response) = plugin.call_response_fast("handler_name", b"payload").await?;
use nylon_ring::NrStatus;
// Start streaming
let (sid, mut rx) = plugin.call_stream("stream_handler", b"payload").await?;
// Receive frames
while let Some(frame) = rx.recv().await {
println!("Data: {}", String::from_utf8_lossy(&frame.data));
if matches!(frame.status, NrStatus::StreamEnd | NrStatus::Err) {
break;
}
}
use nylon_ring::{define_plugin, NrBytes, NrHostVTable, NrStatus, NrVec};
use std::ffi::c_void;
// Global state to store host context and vtable
static mut HOST_CTX: *mut c_void = std::ptr::null_mut();
static mut HOST_VTABLE: *const NrHostVTable = std::ptr::null();
// Initialize plugin
unsafe fn init(host_ctx: *mut c_void, host_vtable: *const NrHostVTable) -> NrStatus {
HOST_CTX = host_ctx;
HOST_VTABLE = host_vtable;
NrStatus::Ok
}
// Handler example
unsafe fn handle_echo(sid: u64, payload: NrBytes) -> NrStatus {
// Echo back using zero-copy NrVec
let nr_vec = NrVec::from_slice(payload.as_slice());
let send_result = (*HOST_VTABLE).send_result;
send_result(HOST_CTX, sid, NrStatus::Ok, nr_vec);
NrStatus::Ok
}
// Plugin shutdown
fn shutdown() {
// Cleanup
}
// Define plugin with entry points
define_plugin! {
init: init,
shutdown: shutdown,
entries: {
"echo" => handle_echo,
},
}
The `define_plugin!` macro:
- ✅ Creates panic-safe FFI wrappers
- ✅ Exports the `nylon_ring_get_plugin_v1()` entry point
- ✅ Routes requests by entry name
- ✅ Handles panics across FFI boundaries
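The routing and panic-safety bullets above can be illustrated with a minimal sketch. It is not the code `define_plugin!` generates. `Status`, `Handler`, `ENTRIES` and `plugin_handle` are hypothetical names, and the entry name and payload are passed as a raw pointer plus length rather than as `NrStr`/`NrBytes`. The pattern is what matters: the dispatcher looks the handler up by name and runs it inside `std::panic::catch_unwind`, so a panic becomes an error status instead of unwinding into the host. Since Rust 1.81, a panic that escapes an `extern "C"` function aborts the process.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical C-ABI status codes (a stand-in for NrStatus).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok = 0,
    Err = 1,
    UnknownEntry = 2,
}

/// Signature of a plugin handler in this sketch.
type Handler = fn(sid: u64, payload: &[u8]) -> Status;

/// Entry table, analogous to the `entries: { ... }` block of define_plugin!.
const ENTRIES: &[(&str, Handler)] = &[
    ("echo", handle_echo as Handler),
    ("boom", handle_boom as Handler),
];

fn handle_echo(_sid: u64, payload: &[u8]) -> Status {
    if payload.is_empty() { Status::Err } else { Status::Ok }
}

fn handle_boom(_sid: u64, _payload: &[u8]) -> Status {
    panic!("bug inside a handler");
}

/// C-ABI dispatcher: route by entry name, and convert any panic into
/// Status::Err so it never unwinds across the FFI boundary.
///
/// # Safety
/// Both pointer/length pairs must describe valid, initialized byte
/// buffers for the duration of the call.
pub unsafe extern "C" fn plugin_handle(
    entry_ptr: *const u8,
    entry_len: usize,
    sid: u64,
    payload_ptr: *const u8,
    payload_len: usize,
) -> Status {
    let outcome = catch_unwind(AssertUnwindSafe(|| {
        let entry = unsafe { std::slice::from_raw_parts(entry_ptr, entry_len) };
        let payload = unsafe { std::slice::from_raw_parts(payload_ptr, payload_len) };
        match ENTRIES.iter().find(|(name, _)| name.as_bytes() == entry) {
            Some((_, handler)) => handler(sid, payload),
            None => Status::UnknownEntry,
        }
    }));
    // Err(_) means the handler panicked; report it instead of aborting.
    outcome.unwrap_or(Status::Err)
}
```

A real plugin would also export a stable symbol for the host to look up. The README assigns that role to `nylon_ring_get_plugin_v1()`.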
Measured on Apple M1 Pro (10-core) with release builds
| Operation | Time | Notes |
|---|---|---|
| NrStr::new | 1.03 ns | Create string view |
| NrStr::as_str | 0.33 ns | Read string |
| NrBytes::from_slice | 0.54 ns | Create byte view |
| NrBytes::as_slice | 0.33 ns | Read bytes |
| NrKV::new | 1.99 ns | Key-value pair |
| NrVec::from_vec | 22.7 ns | Vec conversion |
| NrVec::into_vec | 9.38 ns | Back to Vec |
| NrVec::push (100 items) | 323 ns | Push 100 values |
Key Insight: ABI overhead is negligible (sub-ns to 23ns)
| Operation | Time | Throughput | Notes |
|---|---|---|---|
| Fire-and-forget | 71.7 ns | 13.95M calls/sec | Fastest ⚡ |
| Fast path | 95.5 ns | 10.47M calls/sec | Thread-local |
| Standard unary | 143.2 ns | 6.98M calls/sec | With response |
| + 128B payload | 158.7 ns | 6.30M calls/sec | Small data |
| + 1KB payload | 193.7 ns | 5.16M calls/sec | Medium data |
| + 4KB payload | 228.5 ns | 4.38M calls/sec | Large data |
| Configuration | Throughput | Latency |
|---|---|---|
| 10 threads (fire-and-forget) | 140M+ req/sec | 70 ns |
| 10 threads (fast path) | 124.8M req/sec | 77 ns |
| 10 threads (standard) | 27.3M req/sec | 362 ns |
Key Optimization: Thread-local SID generation eliminates atomic operations entirely
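The architecture notes describe SID generation as a thread-local counter with block allocation (1M IDs per block). Below is a minimal sketch of that scheme. It assumes a single shared `AtomicU64` hands out whole blocks. Each thread then performs one atomic `fetch_add` per million IDs and only plain `Cell` updates otherwise, so the per-call path has no atomic operations. `next_sid`, `NEXT_BLOCK` and `RANGE` are hypothetical names, and wraparound handling is omitted.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

/// IDs reserved per block (the README states 1M per block).
const BLOCK_SIZE: u64 = 1_000_000;

/// Index of the next unclaimed block, shared by all threads.
/// Touched once per BLOCK_SIZE IDs rather than once per call.
static NEXT_BLOCK: AtomicU64 = AtomicU64::new(0);

thread_local! {
    /// (next ID to hand out, exclusive end of this thread's block)
    static RANGE: Cell<(u64, u64)> = const { Cell::new((0, 0)) };
}

/// Returns a process-wide unique SID without a per-call atomic operation.
fn next_sid() -> u64 {
    RANGE.with(|range| {
        let (mut next, mut end) = range.get();
        if next == end {
            // Block exhausted (or first call on this thread):
            // claim a fresh block with a single atomic increment.
            let block = NEXT_BLOCK.fetch_add(1, Ordering::Relaxed);
            next = block * BLOCK_SIZE;
            end = next + BLOCK_SIZE;
        }
        range.set((next + 1, end));
        next
    })
}
```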
The Nylon Ring architecture is designed around a strictly defined ABI boundary that separates the Host runtime from Plugin logic, connected by a high-performance routing layer.
+-----------------------------------------------------------+
| Host Layer (nylon-ring-host) |
| |
| [Public API] NylonRingHost (Container) |
| | | |
| | +---- [LoadedPlugin A] <----+ |
| | | | |
| | +---- [LoadedPlugin B] | |
| | | |
| v (1. Get SID) | |
| [ID Generator] <-----> [Shared Host Context] |
| | +---------------------------+ |
| | | [Thread-Local Slot] | |
| | | (Zero Contention) | |
| | +---------------------------+ |
| | | [Sharded DashMap] | |
| | | (64 Shards) | |
| | +---------------------------+ |
| | ^ |
| v (2. FFI Call via PluginHandle) | |
| [PluginHandle] ----------------------+ |
+-------+---------------------------------+-----------------+
| |
v | (3. send_result)
+-------+---------------------------------+-----------------+
| | ABI Boundary | |
| v | |
| [VTable Interface] [Callback Router] |
| ^ |
| | |
+-----------------------------------------+-----------------+
| |
v |
+-------+---------------------------------+-----------------+
| | Plugin Layer | |
| v | |
| [Business Logic] ---------------------+ |
| |
+-----------------------------------------------------------+
Host Layer (nylon-ring-host): the runtime environment that manages plugin lifecycles and request routing.
- Multi-Plugin Support: `NylonRingHost` acts as a container for multiple `LoadedPlugin` instances. Each plugin is isolated but shares the underlying host context (state map, ID generator).
- Hybrid State Management:
  - Fast Path (Sync): uses Thread-Local Storage (TLS) to store result slots. This eliminates all lock contention and atomic operations for synchronous calls.
  - Standard Path (Async): uses a sharded DashMap (64 shards) to track pending requests. Sharding minimizes lock contention in multi-threaded environments.
- ID Generation: a simple thread-local counter with block allocation (1M IDs per block) to avoid global atomic contention.
- Routing: the callback handler uses a waterfall strategy:
  - First, check the TLS slot (is this a fast synchronous response on the same thread?).
  - Otherwise, check the sharded map (is this an async response from any thread?).
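The waterfall routing above can be sketched with only the standard library. This is an illustration under stated assumptions, not the nylon-ring-host implementation:
- A thread-local `RefCell` stands in for the TLS result slot.
- 64 `Mutex<HashMap>` shards, selected by `sid % 64`, stand in for the sharded DashMap (DashMap itself hashes the key).
- A `std::sync::mpsc::Sender` stands in for whatever the host uses to wake an async caller.

`Pending`, `on_result`, `call_sync` and `Route` are hypothetical names.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::sync::Mutex;

const SHARDS: usize = 64;

/// Which tier of the waterfall delivered a result.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    TlsSlot,
    ShardedMap,
    Dropped,
}

thread_local! {
    /// Fast path: (SID this thread is waiting on synchronously, result slot).
    static SLOT: RefCell<Option<(u64, Option<Vec<u8>>)>> = const { RefCell::new(None) };
}

/// Standard path: pending async callers, split across independently locked shards.
struct Pending {
    shards: Vec<Mutex<HashMap<u64, Sender<Vec<u8>>>>>,
}

impl Pending {
    fn new() -> Self {
        Pending { shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect() }
    }

    /// Sequential SIDs spread evenly across shards with a plain modulo.
    fn shard(&self, sid: u64) -> &Mutex<HashMap<u64, Sender<Vec<u8>>>> {
        &self.shards[(sid % SHARDS as u64) as usize]
    }

    fn register(&self, sid: u64, waiter: Sender<Vec<u8>>) {
        self.shard(sid).lock().unwrap().insert(sid, waiter);
    }
}

/// Host callback for send_result: try the TLS slot first, then the shards.
fn on_result(pending: &Pending, sid: u64, data: Vec<u8>) -> Route {
    // 1. Is a synchronous caller on this same thread waiting for `sid`?
    let leftover = SLOT.with(|slot| match slot.borrow_mut().as_mut() {
        Some((waiting_for, result)) if *waiting_for == sid => {
            *result = Some(data);
            None
        }
        _ => Some(data),
    });
    let Some(data) = leftover else { return Route::TlsSlot };
    // 2. Is an async caller (on any thread) registered for `sid`?
    if let Some(waiter) = pending.shard(sid).lock().unwrap().remove(&sid) {
        let _ = waiter.send(data);
        return Route::ShardedMap;
    }
    Route::Dropped
}

/// Fast synchronous call: arm the TLS slot, run the call, collect the result.
fn call_sync(sid: u64, call: impl FnOnce()) -> Option<Vec<u8>> {
    SLOT.with(|slot| *slot.borrow_mut() = Some((sid, None)));
    call(); // the plugin replies on this thread via on_result
    SLOT.with(|slot| slot.borrow_mut().take().and_then(|(_, result)| result))
}
```

Checking the TLS slot first means a synchronous reply never touches a lock. Only replies that miss the slot pay for one shard mutex.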
ABI Boundary (nylon-ring): defines the strictly stable interface between Host and Plugin.
- Stable Memory Layout: All exchanged types (`NrVec`, `NrStr`, `NrStatus`) are `#[repr(C)]`. This gives them the platform C layout, so any language that can describe C structs (Rust, C++, etc.) sees an identical memory representation.
- Zero-Copy Protocol: `NrVec<T>` allows ownership of heap-allocated memory (such as a `Vec<u8>`) to be transferred across the FFI boundary without copying.
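Zero-copy ownership transfer of the kind `NrVec<T>` provides can be sketched as a `#[repr(C)]` triple of pointer, length and capacity. The triple is taken from a `Vec<u8>` and later rebuilt with `Vec::from_raw_parts`. `OwnedBytes` is hypothetical, and its field layout is an assumption; the real `NrVec` may store different fields. This single-binary sketch also does not address one concern of a real dynamic-library boundary: the buffer must be freed by the allocator that created it.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical C-ABI owned byte vector (assumed layout, not the real NrVec).
#[repr(C)]
pub struct OwnedBytes {
    ptr: *mut u8,
    len: usize,
    cap: usize,
}

impl OwnedBytes {
    /// Takes over the Vec's heap buffer without copying the bytes.
    pub fn from_vec(v: Vec<u8>) -> Self {
        // ManuallyDrop stops the Vec from freeing the buffer we now own.
        let mut v = ManuallyDrop::new(v);
        OwnedBytes { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
    }

    /// Rebuilds a Vec around the same buffer, again without copying.
    pub fn into_vec(self) -> Vec<u8> {
        // Skip our Drop impl; ownership moves into the returned Vec.
        let this = ManuallyDrop::new(self);
        // SAFETY: ptr/len/cap came from a Vec<u8> in `from_vec`.
        unsafe { Vec::from_raw_parts(this.ptr, this.len, this.cap) }
    }
}

impl Drop for OwnedBytes {
    /// Frees the buffer if the receiver never converted it back into a Vec.
    fn drop(&mut self) {
        // SAFETY: same provenance as in `into_vec`; runs at most once.
        unsafe { drop(Vec::from_raw_parts(self.ptr, self.len, self.cap)) }
    }
}
```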
Plugin Layer: the implementer of business logic.
- Stateless & Async-Agnostic: Plugins receive an ID and a payload, process it (synchronously or asynchronously), and call `send_result` when finished. The host handles the complexity of mapping that result back to the original caller.
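The async-agnostic contract can be sketched with a C-ABI callback table and a worker thread: the handler returns immediately and calls `send_result` later. `HostVTable`, `HostCtx`, `HostRef` and `handle_async` are hypothetical stand-ins for `NrHostVTable` and the host context, and status `0` is assumed to mean success. Wrapping the raw pointers in a `Send` type is sound only under two conditions: the host must keep the context and vtable alive, and its `send_result` must be thread-safe, as the mutex makes it here.

```rust
use std::ffi::c_void;
use std::sync::Mutex;
use std::thread;

/// Hypothetical host callback table (a stand-in for NrHostVTable).
#[repr(C)]
pub struct HostVTable {
    pub send_result:
        unsafe extern "C" fn(ctx: *mut c_void, sid: u64, status: u32, ptr: *const u8, len: usize),
}

/// Host-owned state; the Mutex makes send_result callable from any thread.
struct HostCtx {
    results: Mutex<Vec<(u64, u32, Vec<u8>)>>,
}

/// Host implementation of send_result: copy the reply and record it by SID.
unsafe extern "C" fn host_send_result(ctx: *mut c_void, sid: u64, status: u32, ptr: *const u8, len: usize) {
    let host = unsafe { &*(ctx as *const HostCtx) };
    let data = unsafe { std::slice::from_raw_parts(ptr, len) }.to_vec();
    host.results.lock().unwrap().push((sid, status, data));
}

/// The plugin's copy of the host context and vtable pointers.
#[derive(Clone, Copy)]
struct HostRef {
    ctx: *mut c_void,
    vtable: *const HostVTable,
}

// SAFETY (assumed contract): the host keeps ctx and vtable alive while the
// plugin is loaded, and its send_result is thread-safe.
unsafe impl Send for HostRef {}

impl HostRef {
    fn send_result(self, sid: u64, status: u32, data: &[u8]) {
        unsafe { ((*self.vtable).send_result)(self.ctx, sid, status, data.as_ptr(), data.len()) }
    }
}

/// Plugin handler: returns at once and replies later from a worker thread.
fn handle_async(host: HostRef, sid: u64, payload: Vec<u8>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let reply: Vec<u8> = payload.iter().rev().copied().collect();
        host.send_result(sid, 0, &reply); // 0 = success in this sketch
    })
}
```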
Core ABI types:
- `NrStr`: string view (`&str` equivalent)
- `NrBytes`: byte slice view (`&[u8]` equivalent)
- `NrKV`: key-value pair
- `NrVec<T>`: owned vector with zero-copy transfer
- `NrStatus`: result status enum
- `NrHostVTable`: host callbacks
- `NrPluginVTable`: plugin entry points

Host types:
- `NylonRingHost`: main host interface
- `StreamFrame`: streaming data frame
- `StreamReceiver`: stream receiver channel
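Assuming `NrStr` is a pointer plus a length, as most C-ABI string views are, a minimal Rust sketch looks as follows. `StrView` is a hypothetical type, not the crate's definition. `PhantomData<&'a str>` ties the view to the borrowed string so safe Rust code cannot outlive it. The unchecked UTF-8 conversion is sound only because `new` is the sole constructor; a view received from foreign code would need `std::str::from_utf8` validation instead.

```rust
use std::marker::PhantomData;

/// Hypothetical borrowed UTF-8 view with a C-compatible layout (ptr + len).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrView<'a> {
    ptr: *const u8,
    len: usize,
    /// Zero-sized; only records that the view borrows a &'a str.
    _borrow: PhantomData<&'a str>,
}

impl<'a> StrView<'a> {
    pub fn new(s: &'a str) -> Self {
        StrView { ptr: s.as_ptr(), len: s.len(), _borrow: PhantomData }
    }

    pub fn as_str(&self) -> &'a str {
        // SAFETY: built from a valid &'a str in `new`, so the bytes are
        // UTF-8 and live for 'a.
        unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(self.ptr, self.len)) }
    }
}
```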
- High-throughput HTTP servers (REST, GraphQL)
- WebSocket backends
- RPC services
- Plugin systems requiring isolation
- Hot-reloadable business logic
- Cross-language plugins (use direct FFI)
- Very low latency requirements (<10ns)
- Single-threaded only workloads
- Tool: Criterion.rs with statistical analysis
- Iterations: 100 samples, outlier detection
- Warmup: Automatic warmup period
- Output: HTML reports in `target/criterion/`
- Method: Full round-trip (host → plugin → callback)
- Plugin: Example plugin with minimal work
- Runtime: Tokio async runtime
- Builds: Release builds only
Stress test (ex-nyring-host):
- Method: 10 threads, 100 requests per batch, 10-second run
- Pattern: Fire-and-forget (no response wait)
- Total: ~1.40B requests in 10 seconds
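The stress-test method (N threads, fixed-size batches, fixed duration, fire-and-forget) can be sketched as a generic harness. This is not the project's stress test. `stress` and its `call` closure are hypothetical placeholders for the example host and `plugin.call`. Each worker counts locally and publishes its total once at the end, so the shared counter is touched once per worker rather than once per call. The README's configuration would correspond to `stress(10, 100, Duration::from_secs(10), ...)`.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Runs `threads` workers that each issue fire-and-forget calls in batches of
/// `batch` until `run_for` has elapsed. Returns (total calls, calls per second).
fn stress<F>(threads: usize, batch: u64, run_for: Duration, call: F) -> (u64, f64)
where
    F: Fn(u64) + Send + Sync + 'static,
{
    let call = Arc::new(call);
    let stop = Arc::new(AtomicBool::new(false));
    let total = Arc::new(AtomicU64::new(0));
    let start = Instant::now();

    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let (call, stop, total) = (Arc::clone(&call), Arc::clone(&stop), Arc::clone(&total));
            thread::spawn(move || {
                // Count locally; the stop flag is checked once per batch.
                let mut local = 0u64;
                while !stop.load(Ordering::Relaxed) {
                    for i in 0..batch {
                        (*call)(i); // one fire-and-forget request
                    }
                    local += batch;
                }
                total.fetch_add(local, Ordering::Relaxed);
            })
        })
        .collect();

    thread::sleep(run_for);
    stop.store(true, Ordering::Relaxed);
    for w in workers {
        w.join().unwrap();
    }
    let n = total.load(Ordering::Relaxed);
    (n, n as f64 / start.elapsed().as_secs_f64())
}
```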
| Principle | Implementation |
|---|---|
| ABI Stability | All types are #[repr(C)] |
| Zero Atomic Ops | Thread-local SID generation |
| Zero Copy | NrVec<u8> ownership transfer |
| Panic Safety | FFI boundaries catch panics |
| Thread Safety | Safe for multi-threaded hosts |
| Fast Path | Specialized optimizations available |
MIT License
Inspired by high-performance plugin systems and FFI best practices.
Built with:
- Tokio — Async runtime
- DashMap — Concurrent hashmap
- FxHash — Fast hashing
- Criterion — Benchmarking
- libloading — Dynamic library loading