E.g., in nonblocking/blas1.hpp:553. It should instead rely on the internal global buffer for such things. Otherwise it clashes with performance semantics, which typically prefer that no system calls, including allocations, be made. The current solution also does not use NUMA-aware allocation, which the global buffer does.
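
To illustrate the suggested pattern, here is a minimal, self-contained sketch. It is not the library's actual buffer API: `GlobalBuffer`, `globalBuffer()`, `ensureCapacity`, `get`, and `scaledSum` are all hypothetical names, and NUMA-aware placement is not modelled. The only point is that scratch space is reserved once, outside the hot path, and the primitive itself never allocates.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library-wide scratch buffer (assumption, not
// the real API). A real implementation would also place memory NUMA-aware.
class GlobalBuffer {
	std::vector< double > storage_;

public:
	// May allocate: meant to be called outside performance-critical code,
	// for example when containers are created or resized.
	void ensureCapacity( const std::size_t n ) {
		if( storage_.size() < n ) {
			storage_.resize( n );
		}
	}

	// Never allocates: returns nullptr if the reserved space is too small.
	double * get( const std::size_t n ) noexcept {
		return n <= storage_.size() ? storage_.data() : nullptr;
	}
};

// Single shared instance (hypothetical accessor).
inline GlobalBuffer & globalBuffer() {
	static GlobalBuffer buffer;
	return buffer;
}

// Hypothetical primitive that needs n elements of scratch space. It borrows
// them from the global buffer instead of allocating locally, so it makes no
// system calls; it reports failure if the buffer was not sized in advance.
inline bool scaledSum(
	const double * x, const std::size_t n,
	const double alpha, double & out
) noexcept {
	double * const tmp = globalBuffer().get( n );
	if( tmp == nullptr ) {
		return false;
	}
	for( std::size_t i = 0; i < n; ++i ) {
		tmp[ i ] = alpha * x[ i ];
	}
	out = 0.0;
	for( std::size_t i = 0; i < n; ++i ) {
		out += tmp[ i ];
	}
	return true;
}
```

In this sketch the caller would invoke `globalBuffer().ensureCapacity( n )` once up front; after that, `scaledSum` can run repeatedly without touching the allocator.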