Skip to content

Conversation

@hoffmang9
Copy link
Member

@hoffmang9 hoffmang9 commented Jan 31, 2026

  • Add ARM C++ fallback for gcd_unsigned via asm_arm_fallback_impl.inc
  • Dispatch to ARM fallback in threading.h when ARCH_ARM; declare in asm_main.h
  • Define ARCH_ARM for aarch64 in parameters.h
  • Guard x86-only code: get_time_cycles (rdtsc) and x86intrin in vdf.h
  • Fix mpz_cmp_si callers in threading.h to use _() for GMP pointer
  • Makefile: detect arm64/aarch64, use C++ fallback instead of x86 asm; add Homebrew paths on macOS
  • Skip TEST_ASM verification blocks on ARM in gcd_unsigned.h, gcd_128.h, gcd_base_continued_fractions.h
  • Include ARM fallback .inc from vdf_client, prover_test, 1weso_test, 2weso_test, vdf_bench
  • README: add Running tests section (C++ and Python, ARM note)

Note

Medium Risk
Touches the fast-path GCD implementation and platform-specific dispatch/build logic, which can affect correctness/performance on ARM and potentially alter behavior if ARCH_* detection or linking is wrong.

Overview
Adds an ARM (aarch64/arm64) C++ fallback for the optimized gcd_unsigned path by implementing asm_arm_func_gcd_unsigned in asm_arm_fallback_impl.inc and dispatching to it from threading.h when ARCH_ARM is set.

Updates build and portability: Makefile.vdf-client now detects ARM vs x86 (dropping x86 asm objects on ARM and adding macOS Homebrew include/lib hints), ARCH_ARM is defined for __aarch64__, and x86-only includes/rdtsc timing are guarded; ASM self-test blocks are skipped on ARM and binaries (vdf_client, tests, vdf_bench) include the ARM fallback implementation.

Written by Cursor Bugbot for commit eae35d1. This will update automatically on new commits. Configure here.

- Add ARM C++ fallback for gcd_unsigned via asm_arm_fallback_impl.inc
- Dispatch to ARM fallback in threading.h when ARCH_ARM; declare in asm_main.h
- Define ARCH_ARM for __aarch64__ in parameters.h
- Guard x86-only code: get_time_cycles (rdtsc) and x86intrin in vdf.h
- Fix mpz_cmp_si callers in threading.h to use _() for GMP pointer
- Makefile: detect arm64/aarch64, use C++ fallback instead of x86 asm; add Homebrew paths on macOS
- Skip TEST_ASM verification blocks on ARM in gcd_unsigned.h, gcd_128.h, gcd_base_continued_fractions.h
- Include ARM fallback .inc from vdf_client, prover_test, 1weso_test, 2weso_test, vdf_bench
- README: add Running tests section (C++ and Python, ARM note)
- gcd_unsigned_arm_uv_callback: make thread_local in gcd_unsigned.h and
  asm_arm_fallback_impl.inc so concurrent threads do not overwrite it.
- vdf_bench: include vdf.h and define gcd_base_bits/gcd_128_max_iter
  unconditionally; only asm_arm_fallback_impl.inc stays under #if ARCH_ARM.
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.


if (a < b) {
std::swap(a, b);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARM fallback swap does not track input reordering

Medium Severity

When a < b, the ARM fallback swaps them to satisfy gcd_unsigned's precondition (ab[0] >= ab[1]), but doesn't track that a swap occurred. After computing, results are stored back to data->a_2/data->a and data->b_2/data->b in their original positions, which is incorrect when a swap happened. Additionally, the UV matrices captured via gcd_unsigned_arm_uv_callback correspond to the swapped inputs rather than the original order, corrupting the transformation matrices used by callers.

Additional Locations (1)

Fix in Cursor Fix in Web

data->iter = iter;

if (data->out_uv_counter_addr) {
*data->out_uv_counter_addr = data->uv_counter_start + iter;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter value off-by-one in ARM fallback

Medium Severity

The ARM fallback sets *data->out_uv_counter_addr to data->uv_counter_start + iter, but the x86 assembly specification and test assertion both expect uv_counter_start + iter - 1. The comment in asm_gcd_unsigned.h states the counter is set to start + iter after each iteration, meaning after N iterations (indices 0 to N-1), the final counter is start + (N-1). The ARM version produces a value one higher than the x86 version, creating inconsistent cross-platform behavior that could affect synchronization or future code relying on the exact counter value.

Fix in Cursor Fix in Web

data->out_uv_addr[(iter - 1) * 8 + 5] = 1;
}

return 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARM fallback missing error detection for internal failures

Medium Severity

When gcd_unsigned<gcd_size>() internally sets valid=false and falls back to the slow algorithm (which can happen when gcd_128 fails to make progress), the ARM fallback still returns 0 (success). The comment in gcd_unsigned.h line 221 states "can just make the gcd fail if this happens in the asm code" - indicating x86 asm returns an error in this case. However, the ARM fallback has no way to detect this internal failure, so it returns success with incomplete UV matrix data captured via callbacks, while iter only reflects iterations before the fallback occurred.

Fix in Cursor Fix in Web

@hoffmang9 hoffmang9 marked this pull request as draft January 31, 2026 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants