@Rexicon226
Contributor

While poking around the sha256 implementation, I noticed some poor codegen around the message schedule updates. When targeting Zen 5, GCC 15 and LLVM 21 both seem to miss hoisting the schedule updates earlier (LLVM's Zen 5 scheduling model still just aliases the Zen 4 numbers), so we lose a bit of instruction-level parallelism (ILP).
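A portable, scalar sketch of the idea, assuming the standard FIPS 180-4 SHA-256 structure. This is not the fd_sha256 code (which may use SHA-NI or SIMD paths), and the helper names compress and sha256 are hypothetical. The schedule words for rounds t+16..t+19 are computed at the top of each four-round group, before the rounds that advance a..h. Because they depend only on older schedule words, they can overlap with the serial round chain. Note that this only fixes source order; the compiler is still free to reorder it, which is the codegen issue described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint32_t K[64] = {
  0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
  0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
  0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
  0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
  0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
  0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
  0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
  0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
};

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define BSIG0(x) (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
#define BSIG1(x) (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))
#define SSIG0(x) (ROTR(x, 7) ^ ROTR(x, 18) ^ ((x) >> 3))
#define SSIG1(x) (ROTR(x, 17) ^ ROTR(x, 19) ^ ((x) >> 10))

/* Hypothetical helper: one SHA-256 compression of a 64-byte block. */
static void compress(uint32_t s[8], const uint8_t blk[64]) {
  uint32_t w[64];
  for (int i = 0; i < 16; i++)  /* big-endian message words */
    w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
           (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
  uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
  uint32_t e = s[4], f = s[5], g = s[6], h = s[7];
  for (int t = 0; t < 64; t += 4) {
    /* Schedule update first: w[t+16..t+19] depend only on older w[],
       never on a..h, so they sit off the round dependency chain. */
    for (int j = t + 16; j < t + 20 && j < 64; j++)
      w[j] = SSIG1(w[j - 2]) + w[j - 7] + SSIG0(w[j - 15]) + w[j - 16];
    /* Then the four serially dependent rounds consuming w[t..t+3]. */
    for (int j = t; j < t + 4; j++) {
      uint32_t t1 = h + BSIG1(e) + ((e & f) ^ (~e & g)) + K[j] + w[j];
      uint32_t t2 = BSIG0(a) + ((a & b) ^ (a & c) ^ (b & c));
      h = g; g = f; f = e; e = d + t1;
      d = c; c = b; b = a; a = t1 + t2;
    }
  }
  s[0] += a; s[1] += b; s[2] += c; s[3] += d;
  s[4] += e; s[5] += f; s[6] += g; s[7] += h;
}

/* Hypothetical helper: full SHA-256 of msg[0..len) into out. */
static void sha256(const uint8_t *msg, size_t len, uint8_t out[32]) {
  uint32_t s[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
  size_t off = 0;
  for (; len - off >= 64; off += 64) compress(s, msg + off);
  size_t rem = len - off;
  assert(rem < 64);
  uint8_t tail[128] = { 0 };
  memcpy(tail, msg + off, rem);
  tail[rem] = 0x80;                          /* mandatory '1' bit */
  size_t tlen = rem < 56 ? 64 : 128;         /* room left for 8-byte length? */
  uint64_t bits = (uint64_t)len * 8;
  for (int i = 0; i < 8; i++) tail[tlen - 1 - i] = (uint8_t)(bits >> (8 * i));
  compress(s, tail);
  if (tlen == 128) compress(s, tail + 64);
  for (int i = 0; i < 8; i++) {
    out[4 * i]     = (uint8_t)(s[i] >> 24);
    out[4 * i + 1] = (uint8_t)(s[i] >> 16);
    out[4 * i + 2] = (uint8_t)(s[i] >> 8);
    out[4 * i + 3] = (uint8_t)s[i];
  }
}
```

As a sanity check, sha256((const uint8_t *)"abc", 3, out) should produce the standard test vector ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad.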
With the benchmark in test_sha256.c, this improves throughput by roughly 1M hashes/second per core on a Ryzen 5 9600X.
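Why schedule codegen matters so much here: assuming fd_sha256_hash_32_repeated chains SHA-256 over 32-byte values (as its name suggests), each step pads to exactly one 64-byte block, so every hash is a single compression. A sketch of that padding (pad_one_block is a hypothetical helper, not a Firedancer API):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the single padded SHA-256 block for a
   message of at most 55 bytes (message + 0x80 + 8-byte length <= 64). */
static void pad_one_block(const uint8_t *msg, size_t len, uint8_t blk[64]) {
  assert(len <= 55);
  memset(blk, 0, 64);
  memcpy(blk, msg, len);
  blk[len] = 0x80;                    /* mandatory '1' bit */
  uint64_t bits = (uint64_t)len * 8;  /* message length in bits */
  for (int i = 0; i < 8; i++)         /* big-endian in bytes 56..63 */
    blk[63 - i] = (uint8_t)(bits >> (8 * i));
}
```

For a 32-byte input, bytes 32..63 are identical on every iteration (0x80, zeros, then 0x0100 for 256 bits), so only the first eight schedule words change from one step to the next.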

LLVM 21.1.8:

NOTICE  01-02 02:07:37.666619 114269 f0   0    src/ballet/sha256/test_sha256.c(175): ~39.028 M poh hashes / sec / core with fd_sha256_hash_32_repeated
NOTICE  01-02 02:07:37.692786 114269 f0   0    src/ballet/sha256/test_sha256.c(185): ~38.221 M poh hashes / sec / core with fd_sha256_hash_32_repeated_old

GCC 15.2:

NOTICE  01-02 14:21:55.726875 679095 f0   0    src/ballet/sha256/test_sha256.c(175): ~39.057 M poh hashes / sec / core with fd_sha256_hash_32_repeated
NOTICE  01-02 14:21:55.753294 679095 f0   0    src/ballet/sha256/test_sha256.c(185): ~37.861 M poh hashes / sec / core with fd_sha256_hash_32_repeated_old
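Working the logged numbers through gives the per-compiler gain: +0.807M hashes/sec/core (about 2.1%) with LLVM and +1.196M (about 3.2%) with GCC. A minimal helper for that calculation (speedup_pct is a hypothetical name, not part of test_sha256.c):

```c
#include <assert.h>

/* Relative throughput gain of new_rate over old_rate, in percent. */
static double speedup_pct(double new_rate, double old_rate) {
  assert(old_rate > 0.0);
  return (new_rate - old_rate) / old_rate * 100.0;
}
```

For example, speedup_pct(39.028, 38.221) is about 2.11 and speedup_pct(39.057, 37.861) is about 3.16.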

(This also restores the missing #undef FOUR_ROUNDS, which had been accidentally dropped because of a typo.)
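The define/use/undef pattern the restored line completes, sketched with a toy body: toy_round and sixteen_rounds are hypothetical stand-ins (not the real SHA-256 round or the actual FOUR_ROUNDS definition); only the macro name comes from the PR. Undefining a file-local helper macro keeps it from leaking into, or clashing with, later code in the same translation unit.

```c
#include <stdint.h>

/* Hypothetical stand-in for one round; NOT the SHA-256 round function. */
static uint32_t toy_round(uint32_t x, uint32_t k) {
  return (x ^ k) * 0x9e3779b1u;
}

/* Helper macro meant to be visible only to the code right below it. */
#define FOUR_ROUNDS(x, k)          \
  do {                             \
    (x) = toy_round((x), (k));     \
    (x) = toy_round((x), (k) + 1); \
    (x) = toy_round((x), (k) + 2); \
    (x) = toy_round((x), (k) + 3); \
  } while (0)

static uint32_t sixteen_rounds(uint32_t x) {
  for (uint32_t k = 0; k < 16; k += 4) FOUR_ROUNDS(x, k);
  return x;
}

/* Drop the helper once its users are done. */
#undef FOUR_ROUNDS

/* Returns 1 if FOUR_ROUNDS is still defined at this point, else 0. */
static int four_rounds_leaked(void) {
#ifdef FOUR_ROUNDS
  return 1;
#else
  return 0;
#endif
}
```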
