Skip to content

Comments

added cloud fallback and documentation+tests#369

Open
kar-m wants to merge 3 commits intomainfrom
karen/llm-handoff
Open

added cloud fallback and documentation+tests#369
kar-m wants to merge 3 commits intomainfrom
karen/llm-handoff

Conversation

@kar-m
Copy link
Collaborator

@kar-m kar-m commented Feb 19, 2026

No description provided.

Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>
Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>
…est more robust

Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>
@kar-m kar-m marked this pull request as ready for review February 19, 2026 23:30
Copilot AI review requested due to automatic review settings February 19, 2026 23:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements cloud fallback functionality for the Cactus inference engine, allowing the system to seamlessly hand off complex queries to cloud models when local model confidence is low. The implementation includes comprehensive test coverage and updated documentation.

Changes:

  • Added new cloud integration module with async fallback support via futures
  • Refactored handoff semantics: success now always true on completion, cloud_handoff indicates cloud usage
  • Enhanced test suite with matrix-based handoff scenarios covering text-only, vision, and tool-calling cases

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
cactus/ffi/cactus_cloud.h New header defining cloud API structures (CloudResponse, CloudCompletionRequest, CloudCompletionResult) and function declarations
cactus/ffi/cactus_cloud.cpp New implementation with cloud endpoint calls, base64 encoding, WAV building, and JSON parsing utilities
cactus/ffi/cactus_complete.cpp Integrated async cloud handoff with confidence-based triggering and fallback logic
cactus/ffi/cactus_stream.cpp Updated transcription to use refactored cloud functions from new module
cactus/ffi/cactus_utils.h Added new options parsing (auto_handoff, cloud_timeout_ms, handoff_with_images), removed obsolete construct_cloud_handoff_json, changed success field semantics
tests/test_utils.h Added handoff test helper function declarations
tests/test_utils.cpp Implemented json_string with proper escape handling, added build_handoff_options and run_handoff_mode_case helpers
tests/test_engine.cpp Replaced simple cloud_handoff test with comprehensive test_cloud_handoff_matrix_suite covering multiple scenarios, updated tool test to check both response sources
README.md Updated documentation clarifying success field and cloud_handoff behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +102 to +106
while (end < json.size() && depth > 0) {
if (json[end] == '[') depth++;
else if (json[end] == ']') depth--;
end++;
}
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The json_array_field function doesn't account for brackets that appear inside JSON strings. If a JSON value in the response contains '[' or ']' characters (e.g., "text": "Use [this] method"), the depth counting will be incorrect and may extract a malformed JSON array. Consider tracking whether we're inside a string literal (similar to how split_top_level_json_array does at lines 121-133) to avoid counting brackets within strings.

Suggested change
while (end < json.size() && depth > 0) {
if (json[end] == '[') depth++;
else if (json[end] == ']') depth--;
end++;
}
bool in_str = false;
bool esc = false;
while (end < json.size() && depth > 0) {
char c = json[end];
if (in_str) {
if (esc) {
esc = false;
} else if (c == '\\') {
esc = true;
} else if (c == '"') {
in_str = false;
}
} else {
if (c == '"') {
in_str = true;
} else if (c == '[') {
depth++;
} else if (c == ']') {
depth--;
}
}
end++;
}
if (depth != 0) return "[]";

Copilot uses AI. Check for mistakes.
pos = json.find("\"cloud_timeout_ms\"");
if (pos != std::string::npos) {
pos = json.find(':', pos) + 1;
*cloud_timeout_ms = std::stoul(json.substr(pos));
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The std::stoul call could throw an exception if the JSON value is malformed (e.g., non-numeric characters, out of range). While the caller has exception handling, it would be better to add validation or use a safer parsing approach here. Consider checking if the substring starts with a digit before calling std::stoul, or wrapping this in a try-catch block to provide a more specific error message.

Suggested change
*cloud_timeout_ms = std::stoul(json.substr(pos));
// Skip any whitespace before the numeric value
size_t num_start = pos;
while (num_start < json.length() && std::isspace(static_cast<unsigned char>(json[num_start]))) {
++num_start;
}
// Extract consecutive digits as the numeric substring
size_t num_end = num_start;
while (num_end < json.length() && std::isdigit(static_cast<unsigned char>(json[num_end]))) {
++num_end;
}
if (num_start == num_end) {
throw std::invalid_argument("Invalid cloud_timeout_ms value in JSON: expected digits");
}
std::string timeout_str = json.substr(num_start, num_end - num_start);
try {
*cloud_timeout_ms = std::stoul(timeout_str);
} catch (const std::exception &e) {
throw std::invalid_argument(std::string("Invalid cloud_timeout_ms value in JSON: ") + e.what());
}

Copilot uses AI. Check for mistakes.
@@ -333,7 +339,7 @@ int cactus_complete(

if (entropy.rolling_confidence() < confidence_threshold) {
entropy.spike_handoff = true;
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spike_handoff field is set but never used after the cloud handoff refactoring. Previously this field was used to determine cloud_handoff in the response, but now the handoff_succeeded variable (line 413) is determined by cloud_used instead. Consider removing the spike_handoff field from EntropyState and this assignment to clean up dead code.

Suggested change
entropy.spike_handoff = true;

Copilot uses AI. Check for mistakes.
const std::string& fallback_text,
long timeout_seconds) {
#ifdef CACTUS_USE_CURL
std::string endpoint = "https://104.198.76.3/api/v1/transcribe";
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cloud API endpoints are hardcoded with IP address "https://104.198.76.3". Consider making these configurable via environment variables (similar to CACTUS_CLOUD_TEXT_MODEL and CACTUS_CLOUD_VLM_MODEL) to allow easier testing, development with different environments, and migration to different cloud infrastructure without code changes. This would improve operational flexibility and testability.

Copilot uses AI. Check for mistakes.
std::vector<std::string> primary_function_calls = function_calls;

if (cloud_future_started) {
auto status = cloud_future.wait_for(std::chrono::milliseconds(cloud_timeout_ms));
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wait_for timeout here uses the same duration as the cloud_timeout_ms passed to the cloud request. This means if the cloud request takes the full timeout duration to timeout internally via CURLOPT_TIMEOUT_MS, this wait_for might not give enough buffer time to handle the async task completion overhead. Consider using a slightly longer timeout here (e.g., cloud_timeout_ms + 500) to account for async task overhead and ensure graceful timeout handling.

Suggested change
auto status = cloud_future.wait_for(std::chrono::milliseconds(cloud_timeout_ms));
auto wait_timeout_ms = cloud_timeout_ms + 500;
auto status = cloud_future.wait_for(std::chrono::milliseconds(wait_timeout_ms));

Copilot uses AI. Check for mistakes.
oss << "\"stop_sequences\":[\"<|im_end|>\",\"<end_of_turn>\"],";
oss << "\"telemetry_enabled\":false,";
oss << "\"confidence_threshold\":1.1,";
oss << "\"cloud_timeout_ms\":2500,";
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test uses a very short timeout of 2500ms for cloud_timeout_ms, while the default value in the code is 15000ms (line 342 in cactus_utils.h). This short timeout in tests might lead to frequent timeout-related test failures in slower CI environments or when the cloud service is under load. Consider using a longer timeout value that better represents realistic usage, or documenting why the test intentionally uses a reduced timeout.

Suggested change
oss << "\"cloud_timeout_ms\":2500,";
// Match the default cloud timeout (15000 ms) used in cactus_utils.h to avoid flaky tests.
oss << "\"cloud_timeout_ms\":15000,";

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant