Improve sqrt with SoftFloat-style lookup table and integer arithmetic by justinzhuguangwen · Pull Request #1331 · boostorg/decimal

justinzhuguangwen · 2026-02-04T03:40:43Z

Description

Summary

Replace Padé + Newton-Raphson sqrt with a SoftFloat-inspired lookup table and integer-only Newton refinement. All arithmetic stays in integers until final conversion, reducing rounding errors and improving throughput.

Changes

New files:

doc/decimal_float_and_sqrt.md: Documentation on decimal vs binary float representation, bit layout, and sqrt algorithm
include/boost/decimal/detail/cmath/impl/sqrt_lookup.hpp: 90-entry k0/k1 lookup table
include/boost/decimal/detail/cmath/impl/approx_recip_sqrt_impl.hpp: approx_recip_sqrt32 / approx_recip_sqrt64, integer-only 1/√x approximation via table lookup + Newton refinement
include/boost/decimal/detail/cmath/impl/sqrt32_impl.hpp: Integer sqrt for decimal32 using approx_recip_sqrt32 and Newton correction
include/boost/decimal/detail/cmath/impl/sqrt64_impl.hpp: Integer sqrt for decimal64 using approx_recip_sqrt64 and Newton correction; fallback path for targets without __int128 (MSVC, 32-bit)
include/boost/decimal/detail/cmath/impl/sqrt128_impl.hpp: u256-based integer sqrt for decimal128, frexp10 for exact significand, round-to-nearest

Modified files:

sqrt.hpp: Switched to new impl headers, replacing Padé + Newton-Raphson implementation
test/test_cmath.cpp: Updated sqrt-related tests
test/test_sqrt.cpp: Updated sqrt-related tests

Performance (sqrt_bench.py, baseline vs current)

Type	Baseline	Current	Speedup
decimal32_t	1.59M ops/s	8.90M ops/s	✓ 5.59x (+458.9%)
decimal64_t	0.78M ops/s	4.17M ops/s	✓ 5.33x (+432.9%)
decimal128_t	0.24M ops/s	0.77M ops/s	✓ 3.15x (+214.8%)
decimal_fast32_t	1.80M ops/s	8.51M ops/s	✓ 4.72x (+372.3%)
decimal_fast64_t	0.81M ops/s	3.74M ops/s	✓ 4.64x (+363.9%)
decimal_fast128_t	0.22M ops/s	0.72M ops/s	✓ 3.21x (+221.2%)

Note: sqrt_bench.py and ./run_srqt_test.sh are not part of this PR. To run them, use commit d54af195e45f1207c7d55fcdb26f5890d9aafbbd.

Tests

CI covers sqrt-related tests. For local runs, ./run_srqt_test.sh passes at commit d54af195e45f1207c7d55fcdb26f5890d9aafbbd (decimal32, decimal64, decimal128, github_issue_1107, github_issue_1110).

Documentation

doc/decimal_float_and_sqrt.md: New document describing current algorithm and implementation details

Refs #1311

Convert decimal to double, use std::sqrt, convert back. Added baseline, benchmarks, and build scripts. Updated .gitignore.

- Add decimal lookup table for 1/sqrt(x) approximation - Implement SoftFloat-style remainder elimination - Add safety checks to prevent inf/NaN results - Maintain C++14 compatibility with out-of-line definitions Performance: ~4-5x speedup over baseline implementation

Replace Padé approximation with 32-entry lookup table approach, reducing Newton-Raphson iterations from 3-5 to 1-3 based on precision. - Add sqrt lookup tables for [0.1, 1.0) range with interpolation - Reduce iterations: decimal32 (1), decimal64 (2), decimal128 (3) - Maintain existing normalization and exponent handling Performance: Fewer divisions per sqrt call with faster convergence. Known issues: Some precision gaps remain, particularly for decimal32. Further optimization needed for interpolation and edge cases.

Replace Padé approximation with 1/sqrt(x) lookup table and linear interpolation, reducing Newton iterations from 3-5 to 1-2.

Replace Padé + Newton-Raphson with decimal-native lookup table algorithm. Uses 128-entry table with linear interpolation for initial approximation, then remainder elimination refinement using multiplication only (no division). Reduces iterations from 3-5 to 1-3. Experimental: precision tuning may be needed.

…nd sqrt optimization

…nd remove sqrt table

… remainder mapping

…red tables, C++14 tag dispatch

- Replace 128-entry table with 90-entry table (step 0.1) - Use integer remainder for decimal32 (uint64) and decimal64 (uint128) - Add 1/2/4 Newton iterations for decimal32/64/128 - Add generate_sqrt_tables.py for table generation - Update decimal_float_and_sqrt.md with algorithm details

- Rewrite approx_recip_sqrt32/64 to match SoftFloat algorithm - Use working scale and Newton iterations to avoid precision loss - Switch sqrt32_impl and sqrt64_impl to integer approx_recip_sqrt - Fix sqrt32 and sqrt64 test failures

- approx_recip_sqrt32/64: rewrite with integer-only Newton iterations - sqrt32_impl: use approx_recip_sqrt32 for integer-based sqrt - sqrt64_impl: use approx_recip_sqrt64 with __int128 when available - sqrt64_impl: fallback path for platforms without __int128 (MSVC, 32-bit) - sqrt128_impl: u256-based integer sqrt with frexp10 and round-to-nearest Performance (ops/s): - decimal32_t: 1.63M -> 9.55M (5.86x) - decimal64_t: 0.74M -> 4.39M (5.97x) - decimal128_t: 0.31M -> 0.75M (2.41x) - decimal_fast32_t: 1.81M -> 8.63M (4.76x) - decimal_fast64_t: 0.80M -> 3.74M (4.69x) - decimal_fast128_t: 0.22M -> 0.69M (3.13x) All sqrt tests pass for decimal32/64/128 and sqrt64 fallback path.

- sqrt_tables.hpp -> sqrt_lookup.hpp - approx_recip_sqrt.hpp -> approx_recip_sqrt_impl.hpp - remove approx_recip_sqrt_1 - approxRecipSqrt_1k0s/k1s -> recip_sqrt_k0s/k1s

- Add lookup table algorithm for sqrt optimization - decimal32/64/128: 4.72x, 5.66x, 2.85x speedup respectively - Remove temporary test files

cppalliance-bot · 2026-02-04T03:47:06Z

An automated preview of the documentation is available at https://1331.decimal.prtest3.cppalliance.org/libs/decimal/doc/html/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-02-05 06:36:02 UTC

ckormanyos · 2026-02-04T08:39:39Z

Hi @justinzhuguangwen I just approved your workflow run. In this repo, first-time contributors require workflow run approval.

Looks like you're getting good perf results now. CI will be interesting.

You're getting hit by some compiler warnings. Decimal runs high warning levels with -Werror. I should have warned you a bit more on this up front.

So don't be initially shocked if you get a lot of failing runs in the first try. Usually, these matters can be cleared up in a few trials and everything gets OK.

Our CI is setup to run faster runs on GHA and a whole bunch of slow tests on Drone. Some of the drone tests already failed on the pesky -Werror. But again, these matters can usually be cleared up with a few edits.

In fact, I just got hit by -Werror on my recent frexp PR too.

Cc: @mborland

justinzhuguangwen · 2026-02-04T08:54:18Z

Hi @ckormanyos, thanks for approving the workflow and for the heads-up about the strict warning levels.

I've pushed a fix for the -Wsign-conversion error in approx_recip_sqrt64 (in approx_recip_sqrt_impl.hpp). The change is in the latest commit on this PR.

I've started the workflow on my fork to verify the fix. Once it passes, I'll trigger a re-run here if needed.

Thanks again for the guidance.

ckormanyos

You might prefer:

std::uint64_t base_sig = static_cast<std::uint64_t>((static_cast<unsigned int>(index) + 10U) * UINT64_C(100000000000000));

I hope this is syntactically correct and lets the compiler do unsigned * u64 -> u64 multiply

ckormanyos · 2026-02-04T09:46:15Z

I've started the workflow on my fork to verify the fix. Once it passes, I'll trigger a re-run here if needed.

Nice.

Oh also, when we commit to a branch having a running PR with a running CI/CD, Matt (@mborland) has set it up to halt the running CI/CD and restart it. This is done both on GHA as well as on Drone.

So it is not uncommon for us to commit, re-commmit, and so on until it gets right. You don't need to be, let's say, ultra-conservative about that.

ckormanyos · 2026-02-04T13:19:47Z

@justinzhuguangwen thank you for your amazing algorithmic and coding work. It looks like this thing is going to go green soon.

Hi Matt (@mborland) would you like to look over this work regarding matters of style and Boost/Decimal consistency?

Upon going green, I'd like to get this into 1.91.

justinzhuguangwen · 2026-02-04T14:13:11Z

Hi @ckormanyos, thanks for the kind words and for the review.

The CI checks triggered by the latest commit have all passed. Happy to address any follow-up if needed.

Note: This PR has many commits from incremental updates. When merging, I'd suggest using squash merge and using the PR description as the squash commit message to keep the history clean.

Thanks again for the guidance.

mborland

This looks really good! My comments are really around other potential points of optimization, and I think they could apply a few different places in the code. Let me know if you have questions.

doc/decimal_float_and_sqrt.md

…_impl

ckormanyos · 2026-02-04T17:11:59Z

CI has been started for these next improvements.

ckormanyos · 2026-02-04T18:02:59Z

Hi Justin @justinzhuguangwen, I think a trivial warning has crept in.

One of the CI logs shows:

2026-02-04T17:16:24.2135712Z ./boost/decimal/detail/cmath/impl/sqrt128_impl.hpp:72:29: error: unused variable 'scale15' [-Werror,-Wunused-variable]
2026-02-04T17:16:24.2136834Z     constexpr std::uint64_t scale15 = 1000000000000000ULL;    // 10^15
2026-02-04T17:16:24.2137441Z                             ^
2026-02-04T17:16:24.2137816Z 1 error generated.

It's annayong, ... we know, but probably worth a lot of quality in the end.

justinzhuguangwen · 2026-02-05T00:04:06Z

Hi Justin @justinzhuguangwen, I think a trivial warning has crept in.

One of the CI logs shows:
2026-02-04T17:16:24.2135712Z ./boost/decimal/detail/cmath/impl/sqrt128_impl.hpp:72:29: error: unused variable 'scale15' [-Werror,-Wunused-variable]
2026-02-04T17:16:24.2136834Z     constexpr std::uint64_t scale15 = 1000000000000000ULL;    // 10^15
2026-02-04T17:16:24.2137441Z                             ^
2026-02-04T17:16:24.2137816Z 1 error generated.
It's annayong, ... we know, but probably worth a lot of quality in the end.

Fixed. Running CI in my fork to confirm.

…plicate

codecov · 2026-02-05T06:13:06Z

Codecov Report

❌ Patch coverage is 92.70833% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.8%. Comparing base (e72fd34) to head (b1f37c5).

Files with missing lines	Patch %	Lines
...e/boost/decimal/detail/cmath/impl/sqrt128_impl.hpp	91.4%	6 Missing ⚠️
...de/boost/decimal/detail/cmath/impl/sqrt32_impl.hpp	86.7%	4 Missing ⚠️
...de/boost/decimal/detail/cmath/impl/sqrt64_impl.hpp	82.7%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##           develop   #1331     +/-   ##
=========================================
- Coverage     98.8%   98.8%   -0.0%     
=========================================
  Files          278     282      +4     
  Lines        18034   18185    +151     
  Branches      1918    1916      -2     
=========================================
+ Hits         17812   17950    +138     
- Misses         222     235     +13

Files with missing lines	Coverage Δ
...cimal/detail/cmath/impl/approx_recip_sqrt_impl.hpp	`100.0% <100.0%> (ø)`
include/boost/decimal/detail/cmath/sqrt.hpp	`100.0% <100.0%> (ø)`
test/test_cmath.cpp	`100.0% <ø> (ø)`
test/test_sqrt.cpp	`100.0% <100.0%> (ø)`
...de/boost/decimal/detail/cmath/impl/sqrt32_impl.hpp	`86.7% <86.7%> (ø)`
...de/boost/decimal/detail/cmath/impl/sqrt64_impl.hpp	`82.7% <82.7%> (ø)`
...e/boost/decimal/detail/cmath/impl/sqrt128_impl.hpp	`91.4% <91.4%> (ø)`

... and 4 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e72fd34...b1f37c5. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add decimal_fast32_t, decimal128_t test_sqrt_edge, decimal_fast128_t - Add perfect squares (sqrt(4), sqrt(9)) to cover rem==0 branch - Add non-perfect squares (sqrt(2), sqrt(5)) to cover Newton correction - Add dense sampling [1.01, 9.99] to exercise rem<0 and other edge paths

justinzhu@home added 23 commits January 25, 2026 03:05

Optimize sqrt using hardware binary floating-point sqrt

9963866

Convert decimal to double, use std::sqrt, convert back. Added baseline, benchmarks, and build scripts. Updated .gitignore.

fix: remove unused file

1b8b6cb

Optimize sqrt() using SoftFloat-style lookup table

06d6d09

Replace Padé approximation with 1/sqrt(x) lookup table and linear interpolation, reducing Newton iterations from 3-5 to 1-2.

fix : delete commit message

cacc4c5

docs(doc): add decimal vs binary float representation and bit layout

ca437bd

perf(cmath): sqrt table + remainder elimination; doc: decimal float a…

6406099

…nd sqrt optimization

fix(sqrt): use gx×recip_sqrt for initial value per doc; correct doc a…

c52ee49

…nd remove sqrt table

fix(docs): remove unused doc

f28ab2c

feat(sqrt): use SoftFloat-style two-table (k0/k1), 128 bins, 2^e→10^e…

e1d6233

… remainder mapping

feat(doc):rebuild

6b4b1ff

refactor(sqrt): split into SoftFloat-style sqrt32/64/128 impl and sha…

70e81d8

…red tables, C++14 tag dispatch

refactor(sqrt): rename files, drop dead code, use snake_case

969d8b4

- sqrt_tables.hpp -> sqrt_lookup.hpp - approx_recip_sqrt.hpp -> approx_recip_sqrt_impl.hpp - remove approx_recip_sqrt_1 - approxRecipSqrt_1k0s/k1s -> recip_sqrt_k0s/k1s

Merge upstream/develop, keep our sqrt implementation

d54af19

perf(sqrt): optimize sqrt and remove test files for PR

9bb1345

- Add lookup table algorithm for sqrt optimization - decimal32/64/128: 4.72x, 5.66x, 2.85x speedup respectively - Remove temporary test files

perf(sqrt): optimize sqrt and remove test files for PR

883987e

- Add lookup table algorithm for sqrt optimization - decimal32/64/128: 4.72x, 5.66x, 2.85x speedup respectively - Remove temporary test files

chore: trigger CI

63e2cb2

fix(sqrt128): remove unused scale15 variable

608be71

Fix -Wsign-conversion in approx_recip_sqrt64

9bf8402

ckormanyos reviewed Feb 4, 2026

View reviewed changes

fix(sqrt): resolve MSVC C4189 unused scale16 in sqrt64_impl

5284cb4

mborland reviewed Feb 4, 2026

View reviewed changes

doc/decimal_float_and_sqrt.md Outdated Show resolved Hide resolved

doc/decimal_float_and_sqrt.md Outdated Show resolved Hide resolved

justinzhu@home added 2 commits February 5, 2026 00:40

refactor(sqrt): use portable int128 and umul256, remove sqrt64 fallback

a65e20e

fix(cmath): restore scale16 constant accidentally removed from sqrt64…

702da16

…_impl

fix(cmath): remove unused variable scale15 in sqrt128_impl.hpp

fd56afc

zhuguangwen added 3 commits February 5, 2026 10:14

perf(sqrt128): use 64×64→128 for scale33; docs: fix accuracy and dedu…

585567f

…plicate

docs(sqrt): add performance benchmark section

8feb583

docs(sqrt): add performance benchmark section

b1f37c5

Conversation

justinzhuguangwen commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary

Changes

Performance (sqrt_bench.py, baseline vs current)

Tests

Documentation

Uh oh!

cppalliance-bot commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ckormanyos commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

justinzhuguangwen commented Feb 4, 2026

Uh oh!

ckormanyos left a comment

Choose a reason for hiding this comment

Uh oh!

ckormanyos commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ckormanyos commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

justinzhuguangwen commented Feb 4, 2026

Uh oh!

mborland left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ckormanyos commented Feb 4, 2026

Uh oh!

ckormanyos commented Feb 4, 2026

Uh oh!

justinzhuguangwen commented Feb 5, 2026

Uh oh!

codecov bot commented Feb 5, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

justinzhuguangwen commented Feb 4, 2026 •

edited

Loading

cppalliance-bot commented Feb 4, 2026 •

edited

Loading

ckormanyos commented Feb 4, 2026 •

edited

Loading

ckormanyos commented Feb 4, 2026 •

edited

Loading

ckormanyos commented Feb 4, 2026 •

edited

Loading