Skip to content

Conversation

@byjtew
Copy link
Collaborator

@byjtew byjtew commented Jan 29, 2024

Closes #278

@byjtew byjtew requested a review from anyzelman January 29, 2024 15:48
@byjtew byjtew changed the base branch from master to develop January 29, 2024 15:49
@anyzelman anyzelman added this to the v0.8 milestone Feb 5, 2024
@byjtew byjtew requested a review from anyzelman February 26, 2024 14:09
@anyzelman
Copy link
Member

Ready to merge

@GiovaGa GiovaGa force-pushed the 664-missing-assertions-for-threaded-execution branch from 48a6c1a to 775ca77 Compare October 22, 2025 07:36
@GiovaGa
Copy link
Collaborator

GiovaGa commented Oct 22, 2025

@anyzelman this seems solved, I don't know if we want a clearer error message to be outputted instead of a possibly cryptic assertion error.
I will run unit tests on the internal CI just to be sure, but should be good to go

@anyzelman anyzelman force-pushed the 664-missing-assertions-for-threaded-execution branch from 775ca77 to 11b57b6 Compare January 13, 2026 15:42
@anyzelman
Copy link
Member

I rebased again, reviewed it, and running smoke + unit tests with LPF now (in addition to CI).

Changes during review: I made the check a function of the Coordinates class, as otherwise it is not easy to call for a check without a lot of code repetition. I also modified the internal::getCoordinates for const-coordinates to perform the check. I did not think the modification to config::OMP was needed and reverted that, instead relying on its already-established threads function call.

There is one annoyance that in the use of the coordinates class, a callee tends to call a third class for creating the buffer, and then passes that into Coordinates. This means that the number of threads is actually controlled by this third class. The current design in this MR is hence not totally robust if somehow the number of threads spawned during a parallel section changes between the time the third party class code executes and allocates the buffer for some T1 #threads , and the time the #threads is actually recorded when the coordinates instance is set while some other T2 #threads is active.

… sequential and parallel contexts. Refined it a bit and trying again
…re retrieved (and the instance is then inspected to check whether it is valid or not-- which is valid use). The new thread assertion check mechanism now copes with that
anyzelman
anyzelman previously approved these changes Jan 14, 2026
@anyzelman
Copy link
Member

anyzelman commented Jan 14, 2026

Concept release notes (all unit tests with LPF still running on the latest commit):

Benjamin (@byjtew) noticed, once upon a time, that should users manually modify the number of threads available to OpenMP while the application is running, the application could crash without a clear error message. This issue recently resurfaced and this MR fixes at least some instances of this by implicitly checking if thread-counts remain the same whenever the sparse accumulator (SPA) of vectors or sparse matrices is retrieved. This check, however, has non-trivial overhead and is only active in debug mode. The end-user advice hence, as always, is to enable debug mode should their application crash for unclear reasons.

This MR also includes a performance improvement to advancing iterators over ALP/GraphBLAS vectors, which was done in the context of this MR as otherwise the debug-mode unit tests took too much time. However, this performance improvement should also benefit performance-mode code as it holistically speeds up iteration. It also includes minor code style fixes and internal documentation improvements.

The implementation of the check is constrained in that it assumes thread counts will not change between allocating SPA buffer data and assigning that buffer to a SPA. While the ALP internals guarantee that no such change occurs from within ALP, it is conceivable that such a run-time thread-count change could be concurrently initiated by a user thread, which in turn could be timed unfortunately so that this assumption is indeed violated. We consider this a rather adversarial case that would not be worth the implementation effort to guard against, but do note this caveat here in the commit notes.

Thanks also to @GiovaGa for recently revisiting this MR and again to Benjamin for flagging the issue and providing the initial fix, now finally merged.

… of the new check caused an issue for any test using debug-mode iterators over large vectors. This commit hence includes a performance fix (which should also benefit the performance of vector extraction in performance mode).
@anyzelman
Copy link
Member

All tests OK, merging

@anyzelman anyzelman merged commit 5357549 into develop Jan 15, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing assertions for threaded execution

4 participants