added cloud fallback and documentation+tests by kar-m · Pull Request #369 · cactus-compute/cactus

kar-m · 2026-02-19T05:28:59Z

No description provided.

Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>

…est more robust Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>

Copilot

Pull request overview

This PR implements cloud fallback functionality for the Cactus inference engine, allowing the system to seamlessly hand off complex queries to cloud models when local model confidence is low. The implementation includes comprehensive test coverage and updated documentation.

Changes:

Added new cloud integration module with async fallback support via futures
Refactored handoff semantics: success now always true on completion, cloud_handoff indicates cloud usage
Enhanced test suite with matrix-based handoff scenarios covering text-only, vision, and tool-calling cases

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
cactus/ffi/cactus_cloud.h	New header defining cloud API structures (CloudResponse, CloudCompletionRequest, CloudCompletionResult) and function declarations
cactus/ffi/cactus_cloud.cpp	New implementation with cloud endpoint calls, base64 encoding, WAV building, and JSON parsing utilities
cactus/ffi/cactus_complete.cpp	Integrated async cloud handoff with confidence-based triggering and fallback logic
cactus/ffi/cactus_stream.cpp	Updated transcription to use refactored cloud functions from new module
cactus/ffi/cactus_utils.h	Added new options parsing (auto_handoff, cloud_timeout_ms, handoff_with_images), removed obsolete construct_cloud_handoff_json, changed success field semantics
tests/test_utils.h	Added handoff test helper function declarations
tests/test_utils.cpp	Implemented json_string with proper escape handling, added build_handoff_options and run_handoff_mode_case helpers
tests/test_engine.cpp	Replaced simple cloud_handoff test with comprehensive test_cloud_handoff_matrix_suite covering multiple scenarios, updated tool test to check both response sources
README.md	Updated documentation clarifying success field and cloud_handoff behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-19T23:36:00Z

cactus/ffi/cactus_cloud.cpp

+    while (end < json.size() && depth > 0) {
+        if (json[end] == '[') depth++;
+        else if (json[end] == ']') depth--;
+        end++;
+    }


The json_array_field function doesn't account for brackets that appear inside JSON strings. If a JSON value in the response contains '[' or ']' characters (e.g., "text": "Use [this] method"), the depth counting will be incorrect and may extract a malformed JSON array. Consider tracking whether we're inside a string literal (similar to how split_top_level_json_array does at lines 121-133) to avoid counting brackets within strings.

Suggested change

while (end < json.size() && depth > 0) {

if (json[end] == '[') depth++;

else if (json[end] == ']') depth--;

end++;

}

bool in_str = false;

bool esc = false;

while (end < json.size() && depth > 0) {

char c = json[end];

if (in_str) {

if (esc) {

esc = false;

} else if (c == '\\') {

esc = true;

} else if (c == '"') {

in_str = false;

}

} else {

if (c == '"') {

in_str = true;

} else if (c == '[') {

depth++;

} else if (c == ']') {

depth--;

}

}

end++;

}

if (depth != 0) return "[]";

Copilot · 2026-02-19T23:36:01Z

cactus/ffi/cactus_utils.h

+        pos = json.find("\"cloud_timeout_ms\"");
+        if (pos != std::string::npos) {
+            pos = json.find(':', pos) + 1;
+            *cloud_timeout_ms = std::stoul(json.substr(pos));


The std::stoul call could throw an exception if the JSON value is malformed (e.g., non-numeric characters, out of range). While the caller has exception handling, it would be better to add validation or use a safer parsing approach here. Consider checking if the substring starts with a digit before calling std::stoul, or wrapping this in a try-catch block to provide a more specific error message.

Suggested change

*cloud_timeout_ms = std::stoul(json.substr(pos));

// Skip any whitespace before the numeric value

size_t num_start = pos;

while (num_start < json.length() && std::isspace(static_cast<unsigned char>(json[num_start]))) {

++num_start;

}

// Extract consecutive digits as the numeric substring

size_t num_end = num_start;

while (num_end < json.length() && std::isdigit(static_cast<unsigned char>(json[num_end]))) {

++num_end;

}

if (num_start == num_end) {

throw std::invalid_argument("Invalid cloud_timeout_ms value in JSON: expected digits");

}

std::string timeout_str = json.substr(num_start, num_end - num_start);

try {

*cloud_timeout_ms = std::stoul(timeout_str);

} catch (const std::exception &e) {

throw std::invalid_argument(std::string("Invalid cloud_timeout_ms value in JSON: ") + e.what());

}

Copilot · 2026-02-19T23:36:02Z

cactus/ffi/cactus_complete.cpp

@@ -333,7 +339,7 @@ int cactus_complete(

                if (entropy.rolling_confidence() < confidence_threshold) {
                    entropy.spike_handoff = true;


The spike_handoff field is set but never used after the cloud handoff refactoring. Previously this field was used to determine cloud_handoff in the response, but now the handoff_succeeded variable (line 413) is determined by cloud_used instead. Consider removing the spike_handoff field from EntropyState and this assignment to clean up dead code.

Suggested change

entropy.spike_handoff = true;

Copilot · 2026-02-19T23:36:02Z

cactus/ffi/cactus_cloud.cpp

+                                       const std::string& fallback_text,
+                                       long timeout_seconds) {
+#ifdef CACTUS_USE_CURL
+    std::string endpoint = "https://104.198.76.3/api/v1/transcribe";


The cloud API endpoints are hardcoded with IP address "https://104.198.76.3". Consider making these configurable via environment variables (similar to CACTUS_CLOUD_TEXT_MODEL and CACTUS_CLOUD_VLM_MODEL) to allow easier testing, development with different environments, and migration to different cloud infrastructure without code changes. This would improve operational flexibility and testability.

Copilot · 2026-02-19T23:36:02Z

cactus/ffi/cactus_complete.cpp

+        std::vector<std::string> primary_function_calls = function_calls;
+
+        if (cloud_future_started) {
+            auto status = cloud_future.wait_for(std::chrono::milliseconds(cloud_timeout_ms));


The wait_for timeout here uses the same duration as the cloud_timeout_ms passed to the cloud request. This means if the cloud request takes the full timeout duration to timeout internally via CURLOPT_TIMEOUT_MS, this wait_for might not give enough buffer time to handle the async task completion overhead. Consider using a slightly longer timeout here (e.g., cloud_timeout_ms + 500) to account for async task overhead and ensure graceful timeout handling.

Suggested change

auto status = cloud_future.wait_for(std::chrono::milliseconds(cloud_timeout_ms));

auto wait_timeout_ms = cloud_timeout_ms + 500;

auto status = cloud_future.wait_for(std::chrono::milliseconds(wait_timeout_ms));

Copilot · 2026-02-19T23:36:02Z

tests/test_utils.cpp

+    oss << "\"stop_sequences\":[\"<|im_end|>\",\"<end_of_turn>\"],";
+    oss << "\"telemetry_enabled\":false,";
+    oss << "\"confidence_threshold\":1.1,";
+    oss << "\"cloud_timeout_ms\":2500,";


The test uses a very short timeout of 2500ms for cloud_timeout_ms, while the default value in the code is 15000ms (line 342 in cactus_utils.h). This short timeout in tests might lead to frequent timeout-related test failures in slower CI environments or when the cloud service is under load. Consider using a longer timeout value that better represents realistic usage, or documenting why the test intentionally uses a reduced timeout.

Suggested change

oss << "\"cloud_timeout_ms\":2500,";

// Match the default cloud timeout (15000 ms) used in cactus_utils.h to avoid flaky tests.

oss << "\"cloud_timeout_ms\":15000,";

kar-m added 3 commits February 18, 2026 21:25

added cloud fallback and documentation+tests

beb6076

Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>

rolled back confidence threshold

799a689

Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>

reverted response json changes, added new semantics, made tool call t…

e1c9faa

…est more robust Signed-off-by: Karen Mosoyan <karen.mossoyan@gmail.com>

kar-m marked this pull request as ready for review February 19, 2026 23:30

Copilot AI review requested due to automatic review settings February 19, 2026 23:30

Copilot started reviewing on behalf of kar-m February 19, 2026 23:30 View session

Copilot AI reviewed Feb 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

added cloud fallback and documentation+tests#369

added cloud fallback and documentation+tests#369
kar-m wants to merge 3 commits intomainfrom
karen/llm-handoff

kar-m commented Feb 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 19, 2026

Uh oh!

Copilot AI Feb 19, 2026

Uh oh!

Copilot AI Feb 19, 2026

Uh oh!

Copilot AI Feb 19, 2026

Uh oh!

Copilot AI Feb 19, 2026

Uh oh!

Copilot AI Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-    while (end < json.size() && depth > 0) {
-        if (json[end] == '[') depth++;
-        else if (json[end] == ']') depth--;
-        end++;
-    }
+    bool in_str = false;
+    bool esc = false;
+    while (end < json.size() && depth > 0) {
+        char c = json[end];
+        if (in_str) {
+            if (esc) {
+                esc = false;
+            } else if (c == '\\') {
+                esc = true;
+            } else if (c == '"') {
+                in_str = false;
+            }
+        } else {
+            if (c == '"') {
+                in_str = true;
+            } else if (c == '[') {
+                depth++;
+            } else if (c == ']') {
+                depth--;
+            }
+        }
+        end++;
+    }
+    if (depth != 0) return "[]";

-            *cloud_timeout_ms = std::stoul(json.substr(pos));
+            // Skip any whitespace before the numeric value
+            size_t num_start = pos;
+            while (num_start < json.length() && std::isspace(static_cast<unsigned char>(json[num_start]))) {
+                ++num_start;
+            }
+            // Extract consecutive digits as the numeric substring
+            size_t num_end = num_start;
+            while (num_end < json.length() && std::isdigit(static_cast<unsigned char>(json[num_end]))) {
+                ++num_end;
+            }
+            if (num_start == num_end) {
+                throw std::invalid_argument("Invalid cloud_timeout_ms value in JSON: expected digits");
+            }
+            std::string timeout_str = json.substr(num_start, num_end - num_start);
+            try {
+                *cloud_timeout_ms = std::stoul(timeout_str);
+            } catch (const std::exception &e) {
+                throw std::invalid_argument(std::string("Invalid cloud_timeout_ms value in JSON: ") + e.what());
+            }

		@@ -333,7 +339,7 @@ int cactus_complete(

		if (entropy.rolling_confidence() < confidence_threshold) {
		entropy.spike_handoff = true;

	auto status = cloud_future.wait_for(std::chrono::milliseconds(cloud_timeout_ms));
	auto wait_timeout_ms = cloud_timeout_ms + 500;
	auto status = cloud_future.wait_for(std::chrono::milliseconds(wait_timeout_ms));

	oss << "\"cloud_timeout_ms\":2500,";
	// Match the default cloud timeout (15000 ms) used in cactus_utils.h to avoid flaky tests.
	oss << "\"cloud_timeout_ms\":15000,";

Comments

Conversation

kar-m commented Feb 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant