Revert 4.0.11 threads normal-mutex debug implementation#26635
Revert 4.0.11 threads normal-mutex debug implementation#26635slowriot wants to merge 10 commits intoemscripten-core:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR reverts Emscripten’s 4.0.11-era debug behavior changes for PTHREAD_MUTEX_NORMAL in musl, restoring the pre-4.0.11 fast-path semantics to address the reported allocator regression under -pthread + -sWASM_WORKERS with assertions/debug-enabled system libraries.
Changes:
- Revert musl pthread normal-mutex debug behavior to always use the historical fast path (and remove the debug deadlock assertion path).
- Disable
test_pthread_mutex_deadlockwhile preserving the underlying test source for potential future re-enable. - Add a browser regression test for the allocator crash and update pthreads codesize baselines.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
system/lib/libc/musl/src/thread/pthread_mutex_trylock.c |
Restores fast-path behavior for normal mutex trylock in debug builds. |
system/lib/libc/musl/src/thread/pthread_mutex_lock.c |
Restores fast-path behavior for normal mutex lock in debug builds. |
system/lib/libc/musl/src/thread/pthread_mutex_timedlock.c |
Removes Emscripten debug deadlock assertion and restores normal fast-path. |
test/wasm_worker/pthread_mutex_debug_allocator_regression.cpp |
Adds a wasm-worker allocator churn repro/regression test. |
test/test_browser.py |
Wires the new browser regression test into the wasm worker test suite. |
test/test_other.py |
Disables the deadlock test while documenting the rollback intent. |
test/other/test_pthread_mutex_deadlock.c |
Updates comments to reflect that deadlock-assert behavior is reverted/disabled. |
test/codesize/test_codesize_minimal_pthreads.json |
Updates expected code size baselines after the revert. |
test/codesize/test_codesize_minimal_pthreads_memgrowth.json |
Updates expected code size baselines (memgrowth variant). |
AUTHORS |
Adds a new author entry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| static void main_loop() { | ||
| static unsigned ticks; | ||
| static bool reported; | ||
| new std::uint8_t{0}; |
There was a problem hiding this comment.
The the bug still reproduce if you convert this from .cpp to .c and use malloc and free?
(You might need to add --no-builtin maybe ?)
There was a problem hiding this comment.
Yes, this test is basically equivalent to the CPP and C repros I provided in the issue description at #26619. Only the C++ path is tested here because I understand them to be identical. --no-builtin shouldn't be necessary as the tests are built with -O0 by default which should prevent those calls being optimised away. Let me know if you'd like me to add a pure C test as well just to be belt-and-braces about it, but I wouldn't normally bother because from previous stack traces (as per the issue report) we can see that malloc and free are always the culprit calls behind the scenes when using new and delete.
There was a problem hiding this comment.
I would rather have just the C version of the test. We tend to prefer C tests for various reasons.
There was a problem hiding this comment.
@sbc100 C++ test replaced with the C equivalent in the latest version. Please let me know if you'd like any other changes.
There was a problem hiding this comment.
Are you able to reproduce the issue using that C version on the main branch. For me, it doesn't reproduce with the C version on main.
There was a problem hiding this comment.
@sbc100 just catching up on those now, will confirm fully shortly, but it seems that my original repros detailed in #26619 (comment) no longer crash on main, at least. It'll take me a while to test against the full version of my prod code, but at least those two small code samples now run cleanly.
There was a problem hiding this comment.
Can can still repro the crash with the C++ version on main:
$ ./emcc -g test.cc -pthread -sWASM_WORKERS && node ./a.out.js
start
i=0
i=1000000
i=2000000
i=3000000
i=4000000
i=5000000
i=6000000
i=7000000
i=8000000
i=9000000
i=10000000
i=11000000
i=12000000
i=13000000
i=14000000
i=15000000
Aborted(native code called abort())
/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:169
throw toThrow;
^
RuntimeError: Aborted(native code called abort())
at abort (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:887:11)
at __abort_js (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:1880:7)
at a.out.wasm.abort (wasm://wasm/a.out.wasm-001433aa:wasm-function[230]:0x657b)
at a.out.wasm.emscripten_builtin_malloc (wasm://wasm/a.out.wasm-001433aa:wasm-function[238]:0x7719)
at a.out.wasm.operator_new_impl(unsigned long) (wasm://wasm/a.out.wasm-001433aa:wasm-function[260]:0x9b26)
at a.out.wasm.operator new(unsigned long) (wasm://wasm/a.out.wasm-001433aa:wasm-function[259]:0x9afb)
at a.out.wasm.main::$_1::operator()() const (wasm://wasm/a.out.wasm-001433aa:wasm-function[31]:0xa47)
at a.out.wasm.main::$_1::__invoke() (wasm://wasm/a.out.wasm-001433aa:wasm-function[29]:0x94d)
at callUserCallback (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:1539:16)
at Object.runIter (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:2251:9)
Node.js v22.22.0
There was a problem hiding this comment.
#include <cstdint>
#include <cstdio>
#include <emscripten.h>
#include <emscripten/wasm_worker.h>
auto main() -> int {
emscripten_wasm_worker_post_function_v(emscripten_malloc_wasm_worker(1024 * 1024), []{
printf("start\n");
for (int i = 0; i < 70'000'000; ++i) {
if ((i % 1000000) == 0) {
printf("i=%d\n", i);
}
delete new uint8_t{0};
}
printf("done\n");
});
emscripten_set_main_loop([]{new uint8_t{0};}, 0, false);
return EXIT_SUCCESS;
}…://github.com/slowriot/emscripten into revert-pthread-mutex-debug-deadlock-detection
| static void main_loop(void) { | ||
| static unsigned ticks; | ||
| malloc(1); | ||
| if (++ticks == 120) { |
There was a problem hiding this comment.
I had to add emscripten_terminate_all_wasm_workers there to make the test actually return.
There was a problem hiding this comment.
Updated to use your C repro in latest (in case this branch is still potentially useful)
test/test_browser.py
Outdated
|
|
||
| def test_wasm_worker_pthread_mutex_debug_allocator_regression(self): | ||
| self.btest_exit('wasm_worker/pthread_mutex_debug_allocator_regression.c', | ||
| cflags=['-pthread', '-sWASM_WORKERS']) |
There was a problem hiding this comment.
Can you move this into test_other.py? I had to add -sEXIT_RUNTIME and emscripten_terminate_all_wasm_workers to make it work.
There was a problem hiding this comment.
Done in latest (in case this branch is still potentially useful)
test/test_browser.py
Outdated
| # Test that code does not crash in ASSERTIONS-disabled builds | ||
| self.btest('wasm_worker/proxied_function.c', expected='0', cflags=['--js-library', test_file('wasm_worker/proxied_function.js'), '-sWASM_WORKERS', '-sASSERTIONS=0']) | ||
|
|
||
| def test_wasm_worker_pthread_mutex_debug_allocator_regression(self): |
There was a problem hiding this comment.
Can you add a link to https://github.com/emscripten-core/emscripten/issues/26619 here?
(And also in the test source code).
There was a problem hiding this comment.
Done in latest (in case this branch is still potentially useful)
Resolves #26619 by reverting #24607
For motivation, please see description and discussion at #26622.
This PR simply reverts the changes between 4.0.10 and 4.0.11 around debug behaviour for
PTHREAD_MUTEX_NORMAL.This also disables the
test_pthread_mutex_deadlocktest which was testing this functionality, but preserves the code and test setup so it can be re-enabled in future if needed.Finally, this PR adds a test to detect the regression, as in #26622. The test passes with this fix.