Improve sqrt with SoftFloat-style lookup table and integer arithmetic#1331
Improve sqrt with SoftFloat-style lookup table and integer arithmetic#1331justinzhuguangwen wants to merge 32 commits intoboostorg:developfrom
Conversation
Convert decimal to double, use std::sqrt, convert back. Added baseline, benchmarks, and build scripts. Updated .gitignore.
- Add decimal lookup table for 1/sqrt(x) approximation - Implement SoftFloat-style remainder elimination - Add safety checks to prevent inf/NaN results - Maintain C++14 compatibility with out-of-line definitions Performance: ~4-5x speedup over baseline implementation
Replace Padé approximation with 32-entry lookup table approach, reducing Newton-Raphson iterations from 3-5 to 1-3 based on precision. - Add sqrt lookup tables for [0.1, 1.0) range with interpolation - Reduce iterations: decimal32 (1), decimal64 (2), decimal128 (3) - Maintain existing normalization and exponent handling Performance: Fewer divisions per sqrt call with faster convergence. Known issues: Some precision gaps remain, particularly for decimal32. Further optimization needed for interpolation and edge cases.
Replace Padé approximation with 1/sqrt(x) lookup table and linear interpolation, reducing Newton iterations from 3-5 to 1-2.
Replace Padé + Newton-Raphson with decimal-native lookup table algorithm. Uses 128-entry table with linear interpolation for initial approximation, then remainder elimination refinement using multiplication only (no division). Reduces iterations from 3-5 to 1-3. Experimental: precision tuning may be needed.
…nd sqrt optimization
…nd remove sqrt table
… remainder mapping
…red tables, C++14 tag dispatch
- Replace 128-entry table with 90-entry table (step 0.1) - Use integer remainder for decimal32 (uint64) and decimal64 (uint128) - Add 1/2/4 Newton iterations for decimal32/64/128 - Add generate_sqrt_tables.py for table generation - Update decimal_float_and_sqrt.md with algorithm details
- Rewrite approx_recip_sqrt32/64 to match SoftFloat algorithm - Use working scale and Newton iterations to avoid precision loss - Switch sqrt32_impl and sqrt64_impl to integer approx_recip_sqrt - Fix sqrt32 and sqrt64 test failures
- approx_recip_sqrt32/64: rewrite with integer-only Newton iterations - sqrt32_impl: use approx_recip_sqrt32 for integer-based sqrt - sqrt64_impl: use approx_recip_sqrt64 with __int128 when available - sqrt64_impl: fallback path for platforms without __int128 (MSVC, 32-bit) - sqrt128_impl: u256-based integer sqrt with frexp10 and round-to-nearest Performance (ops/s): - decimal32_t: 1.63M -> 9.55M (5.86x) - decimal64_t: 0.74M -> 4.39M (5.97x) - decimal128_t: 0.31M -> 0.75M (2.41x) - decimal_fast32_t: 1.81M -> 8.63M (4.76x) - decimal_fast64_t: 0.80M -> 3.74M (4.69x) - decimal_fast128_t: 0.22M -> 0.69M (3.13x) All sqrt tests pass for decimal32/64/128 and sqrt64 fallback path.
- sqrt_tables.hpp -> sqrt_lookup.hpp - approx_recip_sqrt.hpp -> approx_recip_sqrt_impl.hpp - remove approx_recip_sqrt_1 - approxRecipSqrt_1k0s/k1s -> recip_sqrt_k0s/k1s
- Add lookup table algorithm for sqrt optimization - decimal32/64/128: 4.72x, 5.66x, 2.85x speedup respectively - Remove temporary test files
- Add lookup table algorithm for sqrt optimization - decimal32/64/128: 4.72x, 5.66x, 2.85x speedup respectively - Remove temporary test files
|
An automated preview of the documentation is available at https://1331.decimal.prtest3.cppalliance.org/libs/decimal/doc/html/index.html If more commits are pushed to the pull request, the docs will rebuild at the same URL. 2026-02-05 06:36:02 UTC |
|
Hi @justinzhuguangwen I just approved your workflow run. In this repo, first-time contributors require workflow run approval. Looks like you're getting good perf results now. CI will be interesting. You're getting hit by some compiler warnings. Decimal runs high warning levels with So don't be initially shocked if you get a lot of failing runs in the first try. Usually, these matters can be cleared up in a few trials and everything gets OK. Our CI is setup to run faster runs on GHA and a whole bunch of slow tests on Drone. Some of the drone tests already failed on the pesky In fact, I just got hit by Cc: @mborland |
|
Hi @ckormanyos, thanks for approving the workflow and for the heads-up about the strict warning levels. I've pushed a fix for the I've started the workflow on my fork to verify the fix. Once it passes, I'll trigger a re-run here if needed. Thanks again for the guidance. |
ckormanyos
left a comment
There was a problem hiding this comment.
You might prefer:
std::uint64_t base_sig = static_cast<std::uint64_t>((static_cast<unsigned int>(index) + 10U) * UINT64_C(100000000000000));
I hope this is syntactically correct and lets the compiler do unsigned * u64 -> u64 multiply
Nice. Oh also, when we commit to a branch having a running PR with a running CI/CD, Matt (@mborland) has set it up to halt the running CI/CD and restart it. This is done both on GHA as well as on Drone. So it is not uncommon for us to commit, re-commmit, and so on until it gets right. You don't need to be, let's say, ultra-conservative about that. |
|
@justinzhuguangwen thank you for your amazing algorithmic and coding work. It looks like this thing is going to go green soon. Hi Matt (@mborland) would you like to look over this work regarding matters of style and Boost/Decimal consistency? Upon going green, I'd like to get this into 1.91. |
|
Hi @ckormanyos, thanks for the kind words and for the review. The CI checks triggered by the latest commit have all passed. Happy to address any follow-up if needed. Note: This PR has many commits from incremental updates. When merging, I'd suggest using squash merge and using the PR description as the squash commit message to keep the history clean. Thanks again for the guidance. |
mborland
left a comment
There was a problem hiding this comment.
This looks really good! My comments are really around other potential points of optimization, and I think they could apply a few different places in the code. Let me know if you have questions.
|
CI has been started for these next improvements. |
|
Hi Justin @justinzhuguangwen, I think a trivial warning has crept in. One of the CI logs shows: It's annayong, ... we know, but probably worth a lot of quality in the end. |
Fixed. Running CI in my fork to confirm. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #1331 +/- ##
=========================================
- Coverage 98.8% 98.8% -0.0%
=========================================
Files 278 282 +4
Lines 18034 18185 +151
Branches 1918 1916 -2
=========================================
+ Hits 17812 17950 +138
- Misses 222 235 +13
... and 4 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
- Add decimal_fast32_t, decimal128_t test_sqrt_edge, decimal_fast128_t - Add perfect squares (sqrt(4), sqrt(9)) to cover rem==0 branch - Add non-perfect squares (sqrt(2), sqrt(5)) to cover Newton correction - Add dense sampling [1.01, 9.99] to exercise rem<0 and other edge paths
Description
Summary
Replace Padé + Newton-Raphson sqrt with a SoftFloat-inspired lookup table and integer-only Newton refinement. All arithmetic stays in integers until final conversion, reducing rounding errors and improving throughput.
Changes
New files:
doc/decimal_float_and_sqrt.md: Documentation on decimal vs binary float representation, bit layout, and sqrt algorithminclude/boost/decimal/detail/cmath/impl/sqrt_lookup.hpp: 90-entry k0/k1 lookup tableinclude/boost/decimal/detail/cmath/impl/approx_recip_sqrt_impl.hpp:approx_recip_sqrt32/approx_recip_sqrt64, integer-only 1/√x approximation via table lookup + Newton refinementinclude/boost/decimal/detail/cmath/impl/sqrt32_impl.hpp: Integer sqrt for decimal32 usingapprox_recip_sqrt32and Newton correctioninclude/boost/decimal/detail/cmath/impl/sqrt64_impl.hpp: Integer sqrt for decimal64 usingapprox_recip_sqrt64and Newton correction; fallback path for targets without__int128(MSVC, 32-bit)include/boost/decimal/detail/cmath/impl/sqrt128_impl.hpp: u256-based integer sqrt for decimal128,frexp10for exact significand, round-to-nearestModified files:
sqrt.hpp: Switched to new impl headers, replacing Padé + Newton-Raphson implementationtest/test_cmath.cpp: Updated sqrt-related teststest/test_sqrt.cpp: Updated sqrt-related testsPerformance (sqrt_bench.py, baseline vs current)
Note:
sqrt_bench.pyand./run_srqt_test.share not part of this PR. To run them, use commitd54af195e45f1207c7d55fcdb26f5890d9aafbbd.Tests
./run_srqt_test.shpasses at commitd54af195e45f1207c7d55fcdb26f5890d9aafbbd(decimal32, decimal64, decimal128, github_issue_1107, github_issue_1110).Documentation
doc/decimal_float_and_sqrt.md: New document describing current algorithm and implementation detailsRefs #1311