Skip to content

Conversation

@ChALkeR
Copy link

@ChALkeR ChALkeR commented Nov 28, 2025

Summary

See #1850

This assumes that rhs is canonical


There is another low-hanging fruit after this, which is result size calculation
+1 is really useless there and results in a lot of unoptimal operations

Test Plan

@meta-cla
Copy link

meta-cla bot commented Nov 28, 2025

Hi @ChALkeR!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@meta-cla
Copy link

meta-cla bot commented Nov 28, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla meta-cla bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Nov 28, 2025
@tmikov
Copy link
Contributor

tmikov commented Nov 28, 2025

Can you please make this request against the static_h branch? The main branch is legacy Hermes and is frozen.

@ChALkeR ChALkeR force-pushed the chalker/perf/0/bigint-div branch from cf112db to c7b30da Compare November 28, 2025 22:59
@ChALkeR ChALkeR changed the base branch from main to static_h November 28, 2025 23:04
@ChALkeR
Copy link
Author

ChALkeR commented Nov 28, 2025

@tmikov Rebased

This PR only partially optimizes things, most of the improvement is elsewhere

@ChALkeR
Copy link
Author

ChALkeR commented Nov 28, 2025

Updated
This is now ~3x-10x faster for most division operations (and even faster for small / large division)

Will investigate more

@ChALkeR ChALkeR force-pushed the chalker/perf/0/bigint-div branch from 33bc747 to 3ba8ab7 Compare November 29, 2025 01:16
@ChALkeR
Copy link
Author

ChALkeR commented Nov 29, 2025

Ops/sec in thousands, 3408 means 3_408_000 divisions on an M3 (Release build)
N is 32-byte, s is 16-byte.

Before:

0 / 100 x 3710 ops/sec @ 269μs/op (0ns..1000μs)
0 % 100 x 3897 ops/sec @ 256μs/op (0ns..1999μs)
0 / N x 2874 ops/sec @ 347μs/op (0ns..1000μs)
0 % N x 2906 ops/sec @ 344μs/op (0ns..1000μs)

21 / 100 x 3408 ops/sec @ 293μs/op (0ns..1000μs)
21 % 100 x 3621 ops/sec @ 276μs/op (0ns..1000μs)
100 / 21 x 3437 ops/sec @ 290μs/op (0ns..1000μs)
100 % 21 x 3047 ops/sec @ 328μs/op (0ns..1000μs)

s / (s + 1) x 3559 ops/sec @ 280μs/op (0ns..1000μs)
s % (s + 1) x 3607 ops/sec @ 277μs/op (0ns..1000μs)
(s + 1) / s x 3272 ops/sec @ 305μs/op (0ns..4ms)
(s + 1) % s x 3217 ops/sec @ 310μs/op (0ns..1000μs)

s / N x 2920 ops/sec @ 342μs/op (0ns..1000μs)
s % N x 2959 ops/sec @ 337μs/op (0ns..1000μs)
N / s x 1031 ops/sec @ 969μs/op (0ns..2ms)
N % s x 1036 ops/sec @ 965μs/op (0ns..2ms)

1 / 64 x 3440 ops/sec @ 290μs/op (0ns..1000μs)
1 / N x 2969 ops/sec @ 336μs/op (0ns..1000μs)
64 / 1 x 3308 ops/sec @ 302μs/op (0ns..1000μs)
N / 1 x 659 ops/sec @ 1516μs/op (999μs..2ms)

2 / 64 x 3700 ops/sec @ 270μs/op (0ns..1000μs)
2 / N x 2936 ops/sec @ 340μs/op (0ns..1000μs)
64 / 2 x 3132 ops/sec @ 319μs/op (0ns..1000μs)
N / 2 x 684 ops/sec @ 1462μs/op (999μs..2ms)

Both fast path and size fix (this PR):

0 / 100 x 45351 ops/sec @ 22μs/op (0ns..1000μs)
0 % 100 x 51578 ops/sec @ 19μs/op (0ns..1000μs)
0 / N x 46462 ops/sec @ 21μs/op (0ns..1000μs)
0 % N x 50065 ops/sec @ 19μs/op (0ns..1000μs)

21 / 100 x 12044 ops/sec @ 83μs/op (0ns..1000μs)
21 % 100 x 12481 ops/sec @ 80μs/op (0ns..1000μs)
100 / 21 x 11375 ops/sec @ 87μs/op (0ns..1000μs)
100 % 21 x 12414 ops/sec @ 80μs/op (0ns..1000μs)

s / (s + 1) x 20412 ops/sec @ 48μs/op (0ns..1000μs)
s % (s + 1) x 20212 ops/sec @ 49μs/op (0ns..1000μs)
(s + 1) / s x 21130 ops/sec @ 47μs/op (0ns..1000μs)
(s + 1) % s x 20716 ops/sec @ 48μs/op (0ns..1000μs)

s / N x 46955 ops/sec @ 21μs/op (0ns..1000μs)
s % N x 36831 ops/sec @ 27μs/op (0ns..1000μs)
N / s x 1595 ops/sec @ 626μs/op (0ns..1000μs)
N % s x 1584 ops/sec @ 631μs/op (0ns..1000μs)

1 / 64 x 11580 ops/sec @ 86μs/op (0ns..1000μs)
1 / N x 45354 ops/sec @ 22μs/op (0ns..7ms)
64 / 1 x 11340 ops/sec @ 88μs/op (0ns..1000μs)
N / 1 x 844 ops/sec @ 1184μs/op (999μs..14ms)

2 / 64 x 11754 ops/sec @ 85μs/op (0ns..7ms)
2 / N x 46642 ops/sec @ 21μs/op (0ns..1000μs)
64 / 2 x 11125 ops/sec @ 89μs/op (0ns..1000μs)
N / 2 x 915 ops/sec @ 1092μs/op (999μs..2ms)

Just size fix (#1852):

0 / 100 x 12090 ops/sec @ 82μs/op (0ns..1000μs)
0 % 100 x 12148 ops/sec @ 82μs/op (0ns..5ms)
0 / N x 13027 ops/sec @ 76μs/op (0ns..1000μs)
0 % N x 13961 ops/sec @ 71μs/op (0ns..1000μs)

21 / 100 x 12117 ops/sec @ 82μs/op (0ns..1000μs)
21 % 100 x 12796 ops/sec @ 78μs/op (0ns..1000μs)
100 / 21 x 11611 ops/sec @ 86μs/op (0ns..4ms)
100 % 21 x 11698 ops/sec @ 85μs/op (0ns..2ms)

s / (s + 1) x 20154 ops/sec @ 49μs/op (0ns..1000μs)
s % (s + 1) x 20397 ops/sec @ 49μs/op (0ns..1000μs)
(s + 1) / s x 19873 ops/sec @ 50μs/op (0ns..1000μs)
(s + 1) % s x 21124 ops/sec @ 47μs/op (0ns..1000μs)

s / N x 13008 ops/sec @ 76μs/op (0ns..1000μs)
s % N x 13987 ops/sec @ 71μs/op (0ns..1000μs)
N / s x 1603 ops/sec @ 623μs/op (0ns..1000μs)
N % s x 1601 ops/sec @ 624μs/op (0ns..1000μs)

1 / 64 x 12476 ops/sec @ 80μs/op (0ns..1000μs)
1 / N x 12883 ops/sec @ 77μs/op (0ns..1000μs)
64 / 1 x 11541 ops/sec @ 86μs/op (0ns..1000μs)
N / 1 x 908 ops/sec @ 1101μs/op (999μs..2ms)

2 / 64 x 12249 ops/sec @ 81μs/op (0ns..1000μs)
2 / N x 12934 ops/sec @ 77μs/op (0ns..1000μs)
64 / 2 x 11827 ops/sec @ 84μs/op (0ns..1000μs)
N / 2 x 907 ops/sec @ 1102μs/op (999μs..2ms)

@ChALkeR
Copy link
Author

ChALkeR commented Nov 29, 2025

I will file a separate PR with just the size fix
Upd: #1852

@meta-codesync
Copy link

meta-codesync bot commented Nov 30, 2025

@tmikov has imported this pull request. If you are a Meta employee, you can view this in D88049977.

return OperationStatus::DIVISION_BY_ZERO;
}

if (lhs.numDigits < rhs.numDigits) {
Copy link
Author

@ChALkeR ChALkeR Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption is incorrect on -(2n**63n)/(2n**63n)
I need to fix that

Copy link
Author

@ChALkeR ChALkeR Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest way would be just do lhs.numDigits + 1 < rhs.numDigits in both places
And then consider improving the check in a separate PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to publish an updated one?

Copy link
Author

@ChALkeR ChALkeR Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a fix
I recommend to go with #1852 first though

@facebook-github-bot
Copy link
Contributor

@ChALkeR has updated the pull request. You must reimport the pull request before landing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed Do not delete this pull request or issue due to inactivity.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants