Hi,
the last-digit computations for 32-bit and 64-bit values seem slow to me,
I would like to use this for 128-bit values too,
and speedups are always welcome ...
Is it a feasible/sensible idea to split the value into smaller chunks
in a first step? For example, into chunks of at most 9 digits, in order
to remain within 32-bit arithmetic?
Each chunk would then be written to its proper memory location, with
smaller values left-padded with zeros, to form a complete string in the end.
As most machines today are "vectorized", perhaps it would also
be possible to convert these in parallel, since they are independent?
E.g.:
340,282,366,920,938,463,463,374,607,431,768,211,455 split in:
 340  | 282,366,920 | 938,463,463 | 374,607,431 | 768,211,455
  v          v             v             v             v
 p-4       proc-3        proc-2        proc-1        proc-0
  v          v             v             v             v
"340" | "282366920" | "938463463" | "374607431" | "768211455"
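To make the arithmetic concrete, here is a minimal sketch of that split. It is in Python only because its integers are arbitrarily wide; the names `split_chunks` and `chunks_to_string` are made up for illustration and do not come from this library. Each chunk is below 10^9 and therefore fits in 32 bits.

```python
CHUNK_DIGITS = 9
CHUNK_BASE = 10 ** CHUNK_DIGITS  # 1_000_000_000, which is below 2**32


def split_chunks(value):
    """Split a non-negative integer into base-10**9 chunks, most significant first."""
    chunks = []
    while True:
        value, low = divmod(value, CHUNK_BASE)  # peel off the lowest 9 digits
        chunks.append(low)
        if value == 0:
            break
    chunks.reverse()
    return chunks


def chunks_to_string(chunks):
    """Join chunks: the leading chunk as is, all others left zero padded to 9 digits."""
    head = str(chunks[0])
    tail = "".join(f"{c:0{CHUNK_DIGITS}d}" for c in chunks[1:])
    return head + tail


value = 2**128 - 1
chunks = split_chunks(value)
text = chunks_to_string(chunks)
```

Note that in this sketch only the per-chunk digit generation is independent. The split itself is still a chain of wide divisions by 10^9, so that part would remain sequential unless it is restructured.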
Or compute a chunk size that gives chunks of similar length, 7/8/8/8/8 in the example above:
340,282,366,920,938,463,463,374,607,431,768,211,455 split in:
3,402,823 | 66,920,938 | 46,346,337 | 46,074,317 | 68,211,455
    v           v            v            v            v
  proc-4      proc-3       proc-2       proc-1       proc-0
    v           v            v            v            v
"3402823" | "66920938" | "46346337" | "46074317" | "68211455"
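A possible way to compute the "similar length" split, again as a Python sketch with hypothetical names (`balanced_lengths`, `split_balanced`, `join_chunks`): take the smallest number of chunks that keeps each at 9 digits or fewer, then spread the digits evenly, with the shorter chunks in front. The digit count is taken via `len(str(value))` purely as a stand-in for a real digit-count routine.

```python
MAX_CHUNK_DIGITS = 9  # largest digit count that always fits in 32 bits


def balanced_lengths(num_digits, max_digits=MAX_CHUNK_DIGITS):
    """Chunk lengths of similar size, most significant first, none above max_digits."""
    count = -(-num_digits // max_digits)     # ceiling division: number of chunks
    base, extra = divmod(num_digits, count)  # 'extra' chunks get one more digit
    return [base] * (count - extra) + [base + 1] * extra


def split_balanced(value):
    """Split a non-negative integer into chunks with the lengths computed above."""
    num_digits = len(str(value))  # stand-in for a real digit-count routine
    lengths = balanced_lengths(num_digits)
    chunks = []
    for length in reversed(lengths):  # peel chunks off the low end first
        value, low = divmod(value, 10 ** length)
        chunks.append(low)
    chunks.reverse()
    return lengths, chunks


def join_chunks(lengths, chunks):
    """Left zero pad every chunk to its own length and concatenate."""
    return "".join(f"{c:0{n}d}" for n, c in zip(lengths, chunks))


lengths, chunks = split_balanced(2**128 - 1)
text = join_chunks(lengths, chunks)
```

Compared with the fixed 9-digit split, every chunk here has a known width up front, so each output position is fixed before any digit is generated. The cost is that the divisors vary with the digit count.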