-
Notifications
You must be signed in to change notification settings - Fork 78
Port optimized VDF GCD path to ARM (aarch64/arm64) #297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add ARM C++ fallback for gcd_unsigned via asm_arm_fallback_impl.inc - Dispatch to ARM fallback in threading.h when ARCH_ARM; declare in asm_main.h - Define ARCH_ARM for __aarch64__ in parameters.h - Guard x86-only code: get_time_cycles (rdtsc) and x86intrin in vdf.h - Fix mpz_cmp_si callers in threading.h to use _() for GMP pointer - Makefile: detect arm64/aarch64, use C++ fallback instead of x86 asm; add Homebrew paths on macOS - Skip TEST_ASM verification blocks on ARM in gcd_unsigned.h, gcd_128.h, gcd_base_continued_fractions.h - Include ARM fallback .inc from vdf_client, prover_test, 1weso_test, 2weso_test, vdf_bench - README: add Running tests section (C++ and Python, ARM note)
- gcd_unsigned_arm_uv_callback: make thread_local in gcd_unsigned.h and asm_arm_fallback_impl.inc so concurrent threads do not overwrite it. - vdf_bench: include vdf.h and define gcd_base_bits/gcd_128_max_iter unconditionally; only asm_arm_fallback_impl.inc stays under #if ARCH_ARM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
|
|
||
| if (a < b) { | ||
| std::swap(a, b); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARM fallback swap does not track input reordering
Medium Severity
When a < b, the ARM fallback swaps them to satisfy gcd_unsigned's precondition (ab[0] >= ab[1]), but doesn't track that a swap occurred. After computing, results are stored back to data->a_2/data->a and data->b_2/data->b in their original positions, which is incorrect when a swap happened. Additionally, the UV matrices captured via gcd_unsigned_arm_uv_callback correspond to the swapped inputs rather than the original order, corrupting the transformation matrices used by callers.
Additional Locations (1)
src/asm_arm_fallback_impl.inc
Outdated
| data->iter = iter; | ||
|
|
||
| if (data->out_uv_counter_addr) { | ||
| *data->out_uv_counter_addr = data->uv_counter_start + iter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counter value off-by-one in ARM fallback
Medium Severity
The ARM fallback sets *data->out_uv_counter_addr to data->uv_counter_start + iter, but the x86 assembly specification and test assertion both expect uv_counter_start + iter - 1. The comment in asm_gcd_unsigned.h states the counter is set to start + iter after each iteration, meaning after N iterations (indices 0 to N-1), the final counter is start + (N-1). The ARM version produces a value one higher than the x86 version, creating inconsistent cross-platform behavior that could affect synchronization or future code relying on the exact counter value.
| data->out_uv_addr[(iter - 1) * 8 + 5] = 1; | ||
| } | ||
|
|
||
| return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARM fallback missing error detection for internal failures
Medium Severity
When gcd_unsigned<gcd_size>() internally sets valid=false and falls back to the slow algorithm (which can happen when gcd_128 fails to make progress), the ARM fallback still returns 0 (success). The comment in gcd_unsigned.h line 221 states "can just make the gcd fail if this happens in the asm code" - indicating x86 asm returns an error in this case. However, the ARM fallback has no way to detect this internal failure, so it returns success with incomplete UV matrix data captured via callbacks, while iter only reflects iterations before the fallback occurred.


Note
Medium Risk
Touches the fast-path GCD implementation and platform-specific dispatch/build logic, which can affect correctness/performance on ARM and potentially alter behavior if
ARCH_*detection or linking is wrong.Overview
Adds an ARM (aarch64/arm64) C++ fallback for the optimized
gcd_unsignedpath by implementingasm_arm_func_gcd_unsignedinasm_arm_fallback_impl.incand dispatching to it fromthreading.hwhenARCH_ARMis set.Updates build and portability:
Makefile.vdf-clientnow detects ARM vs x86 (dropping x86 asm objects on ARM and adding macOS Homebrew include/lib hints),ARCH_ARMis defined for__aarch64__, and x86-only includes/rdtsctiming are guarded; ASM self-test blocks are skipped on ARM and binaries (vdf_client, tests,vdf_bench) include the ARM fallback implementation.Written by Cursor Bugbot for commit eae35d1. This will update automatically on new commits. Configure here.