Conversation

@ddurieux (Contributor):

It's very big

Copilot AI left a comment

Pull request overview

This PR implements significant performance optimizations and architectural improvements to the GORC (Game Object Replication Channels) system, focusing on reducing lock contention, improving distance calculations, and fixing runtime lifecycle management issues.

Key Changes:

  • Migrated from tokio::sync::RwLock to DashMap for lock-free concurrent access to GORC objects and positions
  • Introduced distance_squared() method to eliminate expensive sqrt() operations in distance comparisons (see the sketch after this list)
  • Fixed plugin runtime lifecycle by storing runtime instances instead of just handles, preventing premature task termination
  • Replaced broadcast channels with per-connection mpsc channels for O(1) message delivery in the connection manager
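
A minimal sketch of the squared-distance idea (the Vec3 shape and the INTEREST_RADIUS constant here are illustrative, not the crate's actual API): comparing squared distances against a squared radius gives the same result as comparing true distances, with no sqrt() per check.

```rust
// Illustrative Vec3; the crate's real type lives in horizon_event_system/src/types.rs.
#[derive(Clone, Copy)]
struct Vec3 { x: f64, y: f64, z: f64 }

impl Vec3 {
    /// Squared Euclidean distance: cheap, and monotonic in the true distance.
    fn distance_squared(&self, other: &Vec3) -> f64 {
        let (dx, dy, dz) = (self.x - other.x, self.y - other.y, self.z - other.z);
        dx * dx + dy * dy + dz * dz
    }
}

// Hypothetical radius, for illustration only.
const INTEREST_RADIUS: f64 = 100.0;

fn in_range(a: Vec3, b: Vec3) -> bool {
    // Same outcome as comparing the true distance to INTEREST_RADIUS, without sqrt().
    a.distance_squared(&b) <= INTEREST_RADIUS * INTEREST_RADIUS
}
```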

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| crates/plugin_system/src/manager.rs | Stores luminal runtime and tokio handle in server context to prevent async task orphaning across DLL boundaries |
| crates/horizon_event_system/src/types.rs | Adds distance_squared() optimization method for faster distance comparisons without sqrt() |
| crates/horizon_event_system/src/system/handlers.rs | Improves runtime handling with fallback to create runtime when none exists |
| crates/horizon_event_system/src/system/emitters.rs | Adds GorcEvent wrapping, subscriber diagnostics, and new player subscription method |
| crates/horizon_event_system/src/gorc/instance.rs | Major refactor replacing RwLock with DashMap for lock-free access and uses distance_squared throughout |
| crates/horizon_event_system/src/context.rs | Adds tokio_handle() trait method for cross-DLL async task spawning |
| crates/horizon_event_system/benches/distance_benchmark.rs | New benchmark demonstrating distance_squared performance improvements |
| crates/horizon_event_system/Cargo.toml | Adds async-lock and criterion dependencies for optimization and benchmarking |
| crates/game_server/src/server/handlers.rs | Migrates to per-connection message channels for O(1) delivery |
| crates/game_server/src/server/core.rs | Adds update_player_id event handler for player authentication flow |
| crates/game_server/src/connection/manager.rs | Replaces broadcast channel with per-connection mpsc channels for efficient message routing |

Comment on lines +485 to +488

```rust
let mut stored_context = self.server_context.write().expect("Failed to lock server_context for write");
*stored_context = Some(context.clone());
info!("🔧 Server context stored - luminal runtime will persist for plugin tasks");
}
```

Copilot AI Dec 11, 2025

Using expect() on a RwLock can cause panics if the lock is poisoned. While lock poisoning is rare, in a plugin system where third-party code runs, it's safer to handle this error gracefully with a Result return or a logged warning rather than panicking, which could crash the entire server.

Suggested change:

```diff
-let mut stored_context = self.server_context.write().expect("Failed to lock server_context for write");
-*stored_context = Some(context.clone());
-info!("🔧 Server context stored - luminal runtime will persist for plugin tasks");
-}
+match self.server_context.write() {
+    Ok(mut stored_context) => {
+        *stored_context = Some(context.clone());
+        info!("🔧 Server context stored - luminal runtime will persist for plugin tasks");
+    }
+    Err(_poisoned) => {
+        error!("❌ Failed to lock server_context for write: poisoned lock");
+        return Err(PluginSystemError::LockPoisoned("server_context".to_string()));
+    }
+}
```

Comment on lines +521 to +524

```rust
match tokio::runtime::Builder::new_current_thread()
    .enable_all()
    .build()
{
```

Copilot AI Dec 11, 2025

Creating a new single-threaded runtime in the fallback path (when no runtime context exists) is inefficient. This can happen frequently if handlers are called from non-async contexts. Consider maintaining a global runtime pool or requiring callers to provide a runtime handle, as repeatedly creating runtimes has significant overhead.
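
A minimal sketch of the pooled-fallback idea, assuming a process-wide lazily built runtime is acceptable here (fallback_runtime is a hypothetical helper, not part of this PR):

```rust
use std::sync::OnceLock;
use tokio::runtime::Runtime;

// Build the fallback runtime once and reuse it, instead of paying the
// runtime-construction cost on every handler call from a non-async context.
static FALLBACK_RT: OnceLock<Runtime> = OnceLock::new();

fn fallback_runtime() -> &'static Runtime {
    FALLBACK_RT.get_or_init(|| {
        tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("failed to build fallback runtime")
    })
}
```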

Comment on lines +242 to +248

```rust
for (connection_id, sender) in senders.iter() {
    if let Err(e) = sender.send(message.clone()) {
        tracing::error!("Failed to broadcast message to connection {}: {:?}", connection_id, e);
    } else {
        sent_count += 1;
    }
}
```

Copilot AI Dec 11, 2025

The broadcast_to_all method holds a read lock on message_senders while sending to all channels. If any channel's buffer is full or slow to process, this will block other operations from acquiring the lock. Consider collecting the senders first, then releasing the lock before sending, or clone the senders to avoid holding the lock during I/O operations.
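
A minimal sketch of the clone-then-release suggestion, assuming the map is a tokio RwLock over unbounded senders (the type names are stand-ins for the crate's own):

```rust
use std::collections::HashMap;
use tokio::sync::{mpsc, RwLock};

type ConnectionId = u64; // stand-in for the crate's ConnectionId

async fn broadcast_to_all(
    senders: &RwLock<HashMap<ConnectionId, mpsc::UnboundedSender<Vec<u8>>>>,
    message: Vec<u8>,
) -> usize {
    // Snapshot the senders under a short-lived read lock; cloning an
    // UnboundedSender is just a reference-count bump.
    let snapshot: Vec<(ConnectionId, mpsc::UnboundedSender<Vec<u8>>)> = {
        let guard = senders.read().await;
        guard.iter().map(|(id, tx)| (*id, tx.clone())).collect()
    }; // lock released here, before any send happens

    let mut sent_count = 0;
    for (connection_id, tx) in snapshot {
        if tx.send(message.clone()).is_ok() {
            sent_count += 1;
        } else {
            tracing::error!("Failed to broadcast message to connection {}", connection_id);
        }
    }
    sent_count
}
```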

```rust
/// # Returns
///
/// An unbounded receiver for messages targeted to this connection.
pub async fn register_message_channel(&self, connection_id: ConnectionId) -> mpsc::UnboundedReceiver<Vec<u8>> {
```

Copilot AI Dec 11, 2025

Using an unbounded channel can lead to memory exhaustion if messages are produced faster than they can be consumed. Consider using a bounded channel with an appropriate capacity and handling the backpressure scenario (e.g., dropping old messages or disconnecting slow clients).
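
A minimal sketch of the bounded-channel suggestion; the capacity of 256 and the drop-on-full policy are illustrative choices, not the PR's behavior:

```rust
use tokio::sync::mpsc;

const CHANNEL_CAPACITY: usize = 256; // illustrative; tune per deployment

fn register_bounded_channel() -> (mpsc::Sender<Vec<u8>>, mpsc::Receiver<Vec<u8>>) {
    mpsc::channel(CHANNEL_CAPACITY)
}

fn send_with_backpressure(tx: &mpsc::Sender<Vec<u8>>, msg: Vec<u8>) {
    match tx.try_send(msg) {
        Ok(()) => {}
        Err(mpsc::error::TrySendError::Full(_)) => {
            // Buffer full: this client is too slow. Drop the message here;
            // disconnecting the slow client is the other common policy.
            tracing::warn!("send buffer full; dropping message for slow client");
        }
        Err(mpsc::error::TrySendError::Closed(_)) => {
            tracing::debug!("connection channel already closed");
        }
    }
}
```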

```rust
}

/// Get the tracked position of an object (lock-free, single source of truth for spatial queries)
pub fn get_object_position(&self, object_id: GorcObjectId) -> Option<Vec3> {
```

Copilot AI Dec 11, 2025

The method signature changed from async to sync but still internally accesses a DashMap which provides interior mutability. While this is technically safe, making this method synchronous when all other similar methods are async creates an inconsistency in the API. Consider keeping this async for API consistency, especially since callers may already be in async contexts.

Suggested change:

```diff
-pub fn get_object_position(&self, object_id: GorcObjectId) -> Option<Vec3> {
+pub async fn get_object_position(&self, object_id: GorcObjectId) -> Option<Vec3> {
```

Comment on lines +567 to +585

```rust
std::thread::spawn(move || {
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()
        .expect("Failed to build runtime for update_player_id");

    rt.block_on(async move {
        // Get the connection_id for this player
        if let Some(connection_id) = conn_mgr.get_connection_id_by_player(old_player_id).await {
            // Update the player_id stored in the connection
            conn_mgr.set_player_id(connection_id, new_player_id).await;
            info!(
                "🔄 Updated player_id in connection {} from {} to {}",
                connection_id, old_player_id, new_player_id
            );
        } else {
            warn!("⚠️ Failed to find connection for player {} when updating player_id", old_player_id);
        }
    });
```

Copilot AI Dec 11, 2025

Creating a new runtime for each update_player_id event is inefficient and can lead to resource exhaustion under load. Each runtime spawns OS threads, and creating them repeatedly is expensive. Consider using the existing tokio runtime handle from the ServerContext instead, or spawn the task on the existing runtime using tokio::spawn.

Suggested change:

```diff
-std::thread::spawn(move || {
-    let rt = tokio::runtime::Builder::new_current_thread()
-        .enable_all()
-        .build()
-        .expect("Failed to build runtime for update_player_id");
-
-    rt.block_on(async move {
-        // Get the connection_id for this player
-        if let Some(connection_id) = conn_mgr.get_connection_id_by_player(old_player_id).await {
-            // Update the player_id stored in the connection
-            conn_mgr.set_player_id(connection_id, new_player_id).await;
-            info!(
-                "🔄 Updated player_id in connection {} from {} to {}",
-                connection_id, old_player_id, new_player_id
-            );
-        } else {
-            warn!("⚠️ Failed to find connection for player {} when updating player_id", old_player_id);
-        }
-    });
+tokio::spawn(async move {
+    // Get the connection_id for this player
+    if let Some(connection_id) = conn_mgr.get_connection_id_by_player(old_player_id).await {
+        // Update the player_id stored in the connection
+        conn_mgr.set_player_id(connection_id, new_player_id).await;
+        info!(
+            "🔄 Updated player_id in connection {} from {} to {}",
+            connection_id, old_player_id, new_player_id
+        );
+    } else {
+        warn!("⚠️ Failed to find connection for player {} when updating player_id", old_player_id);
+    }
+});
```

```rust
debug!("🎮 GORC: Zone changes for player {} - {} entries, {} exits", player_id, zone_entries.len(), zone_exits.len());

// If this is a new player or they moved significantly, recalculate subscriptions
const MOVEMENT_THRESHOLD_SQ: f64 = 25.0; // 5.0 * 5.0
```

Copilot AI Dec 11, 2025

The magic number 25.0 representing the squared movement threshold should be derived from the original threshold at the module level (e.g., a MOVEMENT_THRESHOLD constant of 5.0, squared) rather than hand-computed inline. This makes the relationship to the 5.0 meter threshold explicit and maintainable.
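
For instance, a sketch of the suggestion with the squared value derived at the module level:

```rust
// The 5.0 m threshold stays visible; the squared form is derived, not hand-computed.
const MOVEMENT_THRESHOLD: f64 = 5.0; // meters
const MOVEMENT_THRESHOLD_SQ: f64 = MOVEMENT_THRESHOLD * MOVEMENT_THRESHOLD;
```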

@tristanpoland added the enhancement, performance, rust, and Code Cleanup labels on Dec 11, 2025
tristanpoland and others added 3 commits December 11, 2025 11:17
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@a-catgirl-dev (Member):

@ddurieux maybe you want to create an Arc<Vec<u8>> here so you don't need to actually copy the buffer. I don't know enough of the codebase to know whether the buffer will be big or not.

Maybe this would lessen the impact of #301 (comment) by not requiring a memory copy (if there were any).
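
A minimal sketch of that idea, assuming the per-connection channels carry Arc<Vec<u8>> instead of Vec<u8> (the broadcast function shape here is hypothetical, not the codebase's API):

```rust
use std::sync::Arc;
use tokio::sync::mpsc;

// Each send clones the Arc (a reference-count bump), not the bytes themselves.
fn broadcast(senders: &[mpsc::UnboundedSender<Arc<Vec<u8>>>], payload: Vec<u8>) {
    let shared = Arc::new(payload); // one allocation, shared by all receivers
    for tx in senders {
        let _ = tx.send(Arc::clone(&shared));
    }
}
```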

```rust
break;
}
while let Some(message) = message_receiver.recv().await {
    let message_text = String::from_utf8_lossy(&message);
```

Member:

from_utf8_lossy is not O(1): it has to scan the entire message to validate it as UTF-8, and it allocates a new String when invalid sequences are found.
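
For reference, a small illustration of the cost (standard library behavior, not code from this PR): from_utf8_lossy returns a borrowed Cow when the bytes are already valid UTF-8 and allocates otherwise, but either way it walks the whole buffer.

```rust
use std::borrow::Cow;

// O(n) in the message length: every byte is inspected once for validity.
fn to_text(message: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(message) // borrows if valid, allocates if not
}
```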

```rust
luminal_handle: luminal::Handle,
/// The luminal runtime - MUST be stored to keep tasks alive
/// Dropping the runtime will terminate all spawned tasks!
luminal_runtime: Arc<luminal::Runtime>,
```

Member:

The docstring here says that dropping luminal_runtime would stop all spawned tasks, but lines 191-194 and 469-472 say they will be orphaned.

@a-catgirl-dev (Member) left a comment

Some Copilot messages, some of mine, but the most damning is the one in crates/game_server/src/server/core.rs. I don't think that code would run very often, but it is scary as heck, and if I were the one making the calls, I would not be merging this into main.

Also, I clicked on "add comments" instead of "start review"... so you only have one comment here.

Comment on lines +564 to +585

```rust
if let (Ok(old_player_id), Ok(new_player_id)) = (old_player_id, new_player_id) {
    // Spawn a dedicated thread with its own runtime to handle the async work
    // This is necessary because on_core_async handlers don't have a guaranteed tokio runtime context
    std::thread::spawn(move || {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("Failed to build runtime for update_player_id");

        rt.block_on(async move {
            // Get the connection_id for this player
            if let Some(connection_id) = conn_mgr.get_connection_id_by_player(old_player_id).await {
                // Update the player_id stored in the connection
                conn_mgr.set_player_id(connection_id, new_player_id).await;
                info!(
                    "🔄 Updated player_id in connection {} from {} to {}",
                    connection_id, old_player_id, new_player_id
                );
            } else {
                warn!("⚠️ Failed to find connection for player {} when updating player_id", old_player_id);
            }
        });
```

Member:

Copilot is right. Please don't do this in prod.

Member:

```diff
diff --git a/crates/game_server/src/server/core.rs b/crates/game_server/src/server/core.rs
index 0bf4ac1..9cc791a 100644
--- a/crates/game_server/src/server/core.rs
+++ b/crates/game_server/src/server/core.rs
@@ -564,25 +564,18 @@ impl GameServer {
                 if let (Ok(old_player_id), Ok(new_player_id)) = (old_player_id, new_player_id) {
                     // Spawn a dedicated thread with its own runtime to handle the async work
                     // This is necessary because on_core_async handlers don't have a guaranteed tokio runtime context
-                    std::thread::spawn(move || {
-                        let rt = tokio::runtime::Builder::new_current_thread()
-                            .enable_all()
-                            .build()
-                            .expect("Failed to build runtime for update_player_id");
-
-                        rt.block_on(async move {
-                            // Get the connection_id for this player
-                            if let Some(connection_id) = conn_mgr.get_connection_id_by_player(old_player_id).await {
-                                // Update the player_id stored in the connection
-                                conn_mgr.set_player_id(connection_id, new_player_id).await;
-                                info!(
-                                    "🔄 Updated player_id in connection {} from {} to {}",
-                                    connection_id, old_player_id, new_player_id
-                                );
-                            } else {
-                                warn!("⚠ Failed to find connection for player {} when updating player_id", old_player_id);
-                            }
-                        });
+                    tokio::runtime::Handle::current().spawn(async move {^M
+                        // Get the connection_id for this player^M
+                        if let Some(connection_id) = conn_mgr.get_connection_id_by_player(old_player_id).await {^M
+                            // Update the player_id stored in the connection^M
+                            conn_mgr.set_player_id(connection_id, new_player_id).await;^M
+                            info!(^M
+                                "🔄 Updated player_id in connection {} from {} to {}",^M
+                                connection_id, old_player_id, new_player_id^M
+                            );^M
+                        } else {^M
+                            warn!("⚠ Failed to find connection for player {} when updating player_id", old_player_id);^M
+                        }^M
                     });
                 } else {
                     warn!("⚠ Failed to deserialize player IDs from update_player_id event: {:?}", event);
@@ -772,4 +765,4 @@ impl GameServer {
         self.plugin_manager.clone()
     }

-}
\ No newline at end of file
+}^M
```

Perhaps this diff might work? I'm not sure if this will panic, because I have no idea whether this closure runs on an OS thread or a green thread, and there are no tests to ensure the behaviour.
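
On the panic question: tokio::runtime::Handle::current() does panic when called outside a runtime context, while Handle::try_current() returns a Result. A hedged variant of the diff's spawn could look like this sketch (it reuses the surrounding variables; it is not a drop-in patch):

```rust
// Spawn on the ambient runtime if one exists; log and skip otherwise.
match tokio::runtime::Handle::try_current() {
    Ok(handle) => {
        handle.spawn(async move {
            if let Some(connection_id) = conn_mgr.get_connection_id_by_player(old_player_id).await {
                conn_mgr.set_player_id(connection_id, new_player_id).await;
            } else {
                warn!("⚠ Failed to find connection for player {} when updating player_id", old_player_id);
            }
        });
    }
    Err(_) => warn!("⚠ No tokio runtime in scope; cannot update player_id"),
}
```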
