Conversation
Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>
Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>
…est more robust Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR implements cloud fallback functionality for the Cactus inference engine, allowing the system to seamlessly hand off complex queries to cloud models when local model confidence is low. The implementation includes comprehensive test coverage and updated documentation.
Changes:
- Added new cloud integration module with async fallback support via futures
- Refactored handoff semantics:
successnow always true on completion,cloud_handoffindicates cloud usage - Enhanced test suite with matrix-based handoff scenarios covering text-only, vision, and tool-calling cases
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| cactus/ffi/cactus_cloud.h | New header defining cloud API structures (CloudResponse, CloudCompletionRequest, CloudCompletionResult) and function declarations |
| cactus/ffi/cactus_cloud.cpp | New implementation with cloud endpoint calls, base64 encoding, WAV building, and JSON parsing utilities |
| cactus/ffi/cactus_complete.cpp | Integrated async cloud handoff with confidence-based triggering and fallback logic |
| cactus/ffi/cactus_stream.cpp | Updated transcription to use refactored cloud functions from new module |
| cactus/ffi/cactus_utils.h | Added new options parsing (auto_handoff, cloud_timeout_ms, handoff_with_images), removed obsolete construct_cloud_handoff_json, changed success field semantics |
| tests/test_utils.h | Added handoff test helper function declarations |
| tests/test_utils.cpp | Implemented json_string with proper escape handling, added build_handoff_options and run_handoff_mode_case helpers |
| tests/test_engine.cpp | Replaced simple cloud_handoff test with comprehensive test_cloud_handoff_matrix_suite covering multiple scenarios, updated tool test to check both response sources |
| README.md | Updated documentation clarifying success field and cloud_handoff behavior |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| while (end < json.size() && depth > 0) { | ||
| if (json[end] == '[') depth++; | ||
| else if (json[end] == ']') depth--; | ||
| end++; | ||
| } |
There was a problem hiding this comment.
The json_array_field function doesn't account for brackets that appear inside JSON strings. If a JSON value in the response contains '[' or ']' characters (e.g., "text": "Use [this] method"), the depth counting will be incorrect and may extract a malformed JSON array. Consider tracking whether we're inside a string literal (similar to how split_top_level_json_array does at lines 121-133) to avoid counting brackets within strings.
| while (end < json.size() && depth > 0) { | |
| if (json[end] == '[') depth++; | |
| else if (json[end] == ']') depth--; | |
| end++; | |
| } | |
| bool in_str = false; | |
| bool esc = false; | |
| while (end < json.size() && depth > 0) { | |
| char c = json[end]; | |
| if (in_str) { | |
| if (esc) { | |
| esc = false; | |
| } else if (c == '\\') { | |
| esc = true; | |
| } else if (c == '"') { | |
| in_str = false; | |
| } | |
| } else { | |
| if (c == '"') { | |
| in_str = true; | |
| } else if (c == '[') { | |
| depth++; | |
| } else if (c == ']') { | |
| depth--; | |
| } | |
| } | |
| end++; | |
| } | |
| if (depth != 0) return "[]"; |
| pos = json.find("\"cloud_timeout_ms\""); | ||
| if (pos != std::string::npos) { | ||
| pos = json.find(':', pos) + 1; | ||
| *cloud_timeout_ms = std::stoul(json.substr(pos)); |
There was a problem hiding this comment.
The std::stoul call could throw an exception if the JSON value is malformed (e.g., non-numeric characters, out of range). While the caller has exception handling, it would be better to add validation or use a safer parsing approach here. Consider checking if the substring starts with a digit before calling std::stoul, or wrapping this in a try-catch block to provide a more specific error message.
| *cloud_timeout_ms = std::stoul(json.substr(pos)); | |
| // Skip any whitespace before the numeric value | |
| size_t num_start = pos; | |
| while (num_start < json.length() && std::isspace(static_cast<unsigned char>(json[num_start]))) { | |
| ++num_start; | |
| } | |
| // Extract consecutive digits as the numeric substring | |
| size_t num_end = num_start; | |
| while (num_end < json.length() && std::isdigit(static_cast<unsigned char>(json[num_end]))) { | |
| ++num_end; | |
| } | |
| if (num_start == num_end) { | |
| throw std::invalid_argument("Invalid cloud_timeout_ms value in JSON: expected digits"); | |
| } | |
| std::string timeout_str = json.substr(num_start, num_end - num_start); | |
| try { | |
| *cloud_timeout_ms = std::stoul(timeout_str); | |
| } catch (const std::exception &e) { | |
| throw std::invalid_argument(std::string("Invalid cloud_timeout_ms value in JSON: ") + e.what()); | |
| } |
| @@ -333,7 +339,7 @@ int cactus_complete( | |||
|
|
|||
| if (entropy.rolling_confidence() < confidence_threshold) { | |||
| entropy.spike_handoff = true; | |||
There was a problem hiding this comment.
The spike_handoff field is set but never used after the cloud handoff refactoring. Previously this field was used to determine cloud_handoff in the response, but now the handoff_succeeded variable (line 413) is determined by cloud_used instead. Consider removing the spike_handoff field from EntropyState and this assignment to clean up dead code.
| entropy.spike_handoff = true; |
| const std::string& fallback_text, | ||
| long timeout_seconds) { | ||
| #ifdef CACTUS_USE_CURL | ||
| std::string endpoint = "https://104.198.76.3/api/v1/transcribe"; |
There was a problem hiding this comment.
The cloud API endpoints are hardcoded with IP address "https://104.198.76.3". Consider making these configurable via environment variables (similar to CACTUS_CLOUD_TEXT_MODEL and CACTUS_CLOUD_VLM_MODEL) to allow easier testing, development with different environments, and migration to different cloud infrastructure without code changes. This would improve operational flexibility and testability.
| std::vector<std::string> primary_function_calls = function_calls; | ||
|
|
||
| if (cloud_future_started) { | ||
| auto status = cloud_future.wait_for(std::chrono::milliseconds(cloud_timeout_ms)); |
There was a problem hiding this comment.
The wait_for timeout here uses the same duration as the cloud_timeout_ms passed to the cloud request. This means if the cloud request takes the full timeout duration to timeout internally via CURLOPT_TIMEOUT_MS, this wait_for might not give enough buffer time to handle the async task completion overhead. Consider using a slightly longer timeout here (e.g., cloud_timeout_ms + 500) to account for async task overhead and ensure graceful timeout handling.
| auto status = cloud_future.wait_for(std::chrono::milliseconds(cloud_timeout_ms)); | |
| auto wait_timeout_ms = cloud_timeout_ms + 500; | |
| auto status = cloud_future.wait_for(std::chrono::milliseconds(wait_timeout_ms)); |
| oss << "\"stop_sequences\":[\"<|im_end|>\",\"<end_of_turn>\"],"; | ||
| oss << "\"telemetry_enabled\":false,"; | ||
| oss << "\"confidence_threshold\":1.1,"; | ||
| oss << "\"cloud_timeout_ms\":2500,"; |
There was a problem hiding this comment.
The test uses a very short timeout of 2500ms for cloud_timeout_ms, while the default value in the code is 15000ms (line 342 in cactus_utils.h). This short timeout in tests might lead to frequent timeout-related test failures in slower CI environments or when the cloud service is under load. Consider using a longer timeout value that better represents realistic usage, or documenting why the test intentionally uses a reduced timeout.
| oss << "\"cloud_timeout_ms\":2500,"; | |
| // Match the default cloud timeout (15000 ms) used in cactus_utils.h to avoid flaky tests. | |
| oss << "\"cloud_timeout_ms\":15000,"; |
No description provided.