Move CRC32C specific compilation options to only impact crc-impl.c++#6078
Move CRC32C specific compilation options to only impact crc-impl.c++#6078FlorentCollin wants to merge 1 commit intomainfrom
Conversation
|
Thank you for investigating this – I'll need to check if the conclusions here are correct, but that does sound plausible.
|
|
The best approach to avoid compiler-flag dependent behavior might be to always use CRC32C, since we'll be hashing at most 8 bytes we should be able to get acceptable performance with a small lookup table when hardware support is unavailable. |
|
A small correction: When c7ad54b was added, crc-impl.c++ was part of the io target which had the ISA extension flags added. Therefore optimizations were being applied correctly to that file, although they are not being applied now that we have a separate target, another thing to fix. |
|
Thank you for the incredibly quick response and for opening a pull request to implement the CRC32C algorithm.
I'll write a test on Monday that show the If your PR removes the foot gun, would it be reasonable for the workerd project to add the flags directly to the Something along the lines of diff --git a/build/wd_cc_library.bzl b/build/wd_cc_library.bzl
index 04cf4a9c1..6045f4db0 100644
--- a/build/wd_cc_library.bzl
+++ b/build/wd_cc_library.bzl
@@ -1,15 +1,20 @@
"""wd_cc_library definition"""
load("@rules_cc//cc:cc_library.bzl", "cc_library")
-def wd_cc_library(strip_include_prefix = "/src", **kwargs):
+def wd_cc_library(strip_include_prefix = "/src", copts = [], **kwargs):
"""Wrapper for cc_library that sets common attributes
"""
cc_library(
target_compatible_with = select({
"@//build/config:no_build": ["@platforms//:incompatible"],
"//conditions:default": [],
}),
strip_include_prefix = strip_include_prefix,
+ copts = select({
+ "@platforms//cpu:aarch64": ["-mcrc"],
+ "@platforms//cpu:x86_64": ["-msse4.2"],
+ "//conditions:default": [],
+ }) + copts,
**kwargs
) |
|
Yikes. Good find. It looks like this doesn't affect our production build because we use IMO we should probably set -march flags globally on workerd as well. We might not be able to set them as aggressively since we don't know what hardware people are using but we can probably do better than the default. Failing that I think we really need to set the CRC flags globally, at least. I don't want some translation units using hardware hashing while others use software -- there's no benefit to that. |
While investigating a D1 issue, I found that two translation units compile different implementations of
kj::_::intHash32(), which causes a bug insrc/workerd/api/sql.c++statement cache.Disclaimer: I used Claude Opus 4.6 to help me understand all the moving parts and to analyze the workerd binaries. I have checked everything it claims, but I'm not an expert in these sorts of low-level optimizations, so if anything is wrong please correct me so that I can learn along the way!
Description
This PR tries to fix these two issues:
Both issues report that executing large SQL files on Linux and Windows fails with:
However, the same command executes successfully on macOS ARM.
Investigation
After some testing, I found that the error only appears for SQL files with over 1 MB worth of SQL statements. This led me to the
SQL_STATEMENT_CACHE_MAX_SIZEconstantworkerd/src/workerd/api/sql.c++
Line 18 in 244803a
and the code that evicts statements from the cache.
workerd/src/workerd/api/sql.c++
Lines 84 to 92 in 244803a
Specifically, it is the call to
statementCache.map.eraseMatch(oldQuery)that produces the "HashIndex detected hash table inconsistency" error.The SQL statement cache uses a
kj::Tablewith akj::HashIndexworkerd/src/workerd/api/sql.h
Line 130 in 244803a
findOrCreate(JsString)callsJsString::hashCode()(defined injsvalue.c++)workerd/src/workerd/api/sql.c++
Lines 44 to 48 in 244803a
HashableV8Ref::hashCode()cached in theCachedStatementconstructor (defined insql.c++)workerd/src/workerd/api/sql.c++
Lines 87 to 90 in 244803a
The problem is a One Definition Rule violation:
kj::_::intHash32()is aninlinefunction inkj/hash.hthat compiles to different implementations depending on whether CRC32 compiler flags are set:sql.c++is compiled as part of//src/workerd/io, which has-mcrc(ARM) /-msse4.2(x86)copts, sointHash32()uses hardware CRC32.jsvalue.c++is compiled as part of//src/workerd/jsg:jsg-core, which has no CRC flags, sointHash32()uses the Thomas Wang software fallback.Both functions take the same input and produce different outputs. Entries are inserted into one bucket (Thomas Wang hash) but looked up during eviction from a different bucket (CRC32 hash).
Thanks to the amazing logging in the
kj::Table, thekj::HashIndexdetects the mismatch and logs the "hash table inconsistency" error.I'm not entirely sure, but I think that macOS ARM is unaffected because the default compiler flags already enable
__ARM_FEATURE_CRC32for all targets.I had Claude disassemble the relevant functions in the workerd binary distributed with the wrangler package, and it confirmed the mismatch in the Linux x86_64 binary:
I believe the compilation flag mismatch was introduced in c7ad54b. It seems the
coptswere intended forcrypto/crc-impl.c++, but that file is compiled under a separate bazel target (//src/workerd/api:crypto-crc-impl) that never received them. The flags ended up applying tosql.c++and every other file in//src/workerd/io.Changes
I'm not sure how to best fix the problem. This PR removes the CRC32 compilation options from
//src/workerd/ioand adds them to the//src/workerd/api:crypto-crc-impltarget. However, this means that the CRC32 optimization isn't used anymore in//src/workerd/io, and that's kind of sad.We could also remove the call to
kj::hashCodein theidentityHashimplementation, but Kenton wrote that it's not clear if V8'sGetIdentityHash()returns uniform results.workerd/src/workerd/jsg/jsg.h
Lines 959 to 966 in 3274cc9
What's missing from this PR / Questions
Is there a way to create a test?
I wanted to create a test to reproduce this, but the
hash table inconsistencyis only a log and doesn't crash or throw.What would be the best fix for this problem?
I'm not used to the codebase, so the change introduced in this PR is certainly not the best one. It does solve the problem, but it would be great to solve the problem and still use the crc32 compilation option for all the codebase.