
Native atomic operations without C dependency #26474

@RbPyer

Description


Introduction

I have implemented native atomic operations for V using inline assembly. The implementation is currently available as a proof-of-concept module at https://github.com/RbPyer/v-atomics, with comprehensive benchmarks in https://github.com/RbPyer/v-atomics/tree/master/benchmarks

The module provides atomic primitives implemented directly in V, without relying on C FFI or libatomic. Currently supported architectures are amd64 and i386. For i386, I deliberately chose to use MMX registers for 64-bit load/store operations, even though MMX is not available on all i386 processors. The reason is performance: an aligned 64-bit move through an MMX register is a single plain memory access (atomic on Pentium and later processors), whereas the alternative, cmpxchg8b, is a locked read-modify-write. For processors without MMX support, a cmpxchg8b fallback could be added in the future, at the cost of a performance penalty.
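To make the cost of the cmpxchg8b fallback concrete: without a plain 64-bit move, every 64-bit operation has to be expressed as compare-and-swap. The sketch below shows that idea in Rust (whose std::sync::atomic the proposal names as a model), not in V; `AtomicU64::compare_exchange` stands in for `lock cmpxchg8b`, and `load_via_cas`/`store_via_cas` are hypothetical names, not part of the v-atomics module. Because a locked cmpxchg8b writes memory back even when the comparison fails, even a plain load becomes a locked read-modify-write.

```rust
use std::sync::atomic::{AtomicU64, Ordering::SeqCst};

/// Read a 64-bit value using only compare-and-swap, as a cmpxchg8b-based
/// fallback would: try to replace 0 with 0. If the value is 0 nothing
/// changes; otherwise the CAS fails and reports the current value.
/// Either way the value is observed atomically.
fn load_via_cas(a: &AtomicU64) -> u64 {
    match a.compare_exchange(0, 0, SeqCst, SeqCst) {
        Ok(v) | Err(v) => v,
    }
}

/// Store by retrying CAS until the new value replaces whatever is there.
fn store_via_cas(a: &AtomicU64, new: u64) {
    let mut cur = load_via_cas(a);
    while let Err(actual) = a.compare_exchange_weak(cur, new, SeqCst, SeqCst) {
        // Another thread changed the value in between; retry with what it wrote.
        cur = actual;
    }
}
```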

The implementation includes the following atomic operations for i32, i64, u32, and u64 types: atomic load, atomic store, atomic add (returning the new value after addition), atomic swap (returning the old value), and compare-and-swap. All operations currently provide sequentially consistent semantics with full memory barriers.
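For readers who know Rust, the five operations map onto std::sync::atomic roughly as sketched below (i64 only; the free-function names are hypothetical and not the module's API). One difference worth noting: Rust's `fetch_add` returns the old value, while the add described above returns the new value, as Go's `atomic.AddInt64` does, so the sketch adds the delta back with wrapping arithmetic to match `fetch_add`'s own overflow behavior.

```rust
use std::sync::atomic::{AtomicI64, Ordering::SeqCst};

// Hypothetical free functions mirroring the five operations described above,
// all sequentially consistent. Names are illustrative only.

fn load_i64(a: &AtomicI64) -> i64 {
    a.load(SeqCst)
}

fn store_i64(a: &AtomicI64, v: i64) {
    a.store(v, SeqCst)
}

/// Returns the NEW value (fetch_add itself returns the old one).
fn add_i64(a: &AtomicI64, delta: i64) -> i64 {
    a.fetch_add(delta, SeqCst).wrapping_add(delta)
}

/// Returns the OLD value.
fn swap_i64(a: &AtomicI64, new: i64) -> i64 {
    a.swap(new, SeqCst)
}

/// Returns true if `a` held `old` and now holds `new`.
fn cas_i64(a: &AtomicI64, old: i64, new: i64) -> bool {
    a.compare_exchange(old, new, SeqCst, SeqCst).is_ok()
}
```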

The module has been tested extensively, with unit tests covering all operations, and it checks pointer alignment. However, the behavior on unaligned memory access differs between build modes: without the -prod flag the alignment check calls panic as intended, while with -prod execution immediately hits a ud2 instruction (invalid opcode trap). This is likely related to the O2/O3 optimizations used by -prod and to current compiler specifics, and it is an area that needs further investigation and improvement in the compiler.
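The alignment check can be illustrated with a small Rust sketch (the function name `load_u64_checked` is hypothetical, not the module's API): test the address before touching memory and panic if it is not a multiple of the 8-byte alignment that `AtomicU64` requires. This matters on i386 in particular, where the System V ABI aligns a plain 64-bit integer to only 4 bytes, so a u64 field inside a struct can legitimately be misaligned for atomic access.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical checked atomic load from a raw pointer: panic on
/// misalignment instead of performing the access.
///
/// # Safety
/// `p` must be valid for reads and only ever accessed atomically.
unsafe fn load_u64_checked(p: *const u64) -> u64 {
    // AtomicU64 is always aligned to its size (8), even where u64 is not.
    let align = std::mem::align_of::<AtomicU64>();
    if (p as usize) % align != 0 {
        panic!("atomic load from unaligned address {:p}", p);
    }
    // The check ran first, so only aligned pointers reach the actual load.
    unsafe { AtomicU64::from_ptr(p as *mut u64) }.load(Ordering::SeqCst)
}
```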

Benchmarks show that the native V atomics achieve equivalent or better performance than the current C FFI implementation across different operations and contention scenarios, while eliminating the dependency on libatomic.

Motivation and Benefits

Currently V implements atomic operations through calls to C functions, which requires linking against libatomic and maintaining FFI bindings. This creates several problems. First, it adds a mandatory dependency on the C toolchain and platform-specific C headers. Second, it makes the actual implementation opaque to V developers and harder to reason about. Third, it complicates cross-compilation because you need the appropriate C toolchain for each target platform.

Native atomic operations would solve these issues by making atomics a first-class feature of V itself. This is particularly important for systems programming use cases where developers want to avoid C dependencies entirely. It would also benefit scenarios like building lock-free data structures, implementing concurrent algorithms without mutexes, and writing high-performance multithreaded applications where understanding the exact memory ordering guarantees is critical.

Having atomics implemented in V also serves an educational purpose by making low-level concurrency primitives visible and understandable to V programmers.

The key benefit of integrating this into V would be removing the dependency on C's libatomic library without sacrificing performance. This would make V more self-contained and simplify cross-compilation scenarios where setting up the C toolchain can be cumbersome.

Proposed Solution

I propose integrating native atomic operations into V's standard library, either as part of the existing sync module or as a new sync.atomic submodule. The proof-of-concept implementation at https://github.com/RbPyer/v-atomics demonstrates that this is technically feasible and performs well.

The implementation strategy uses per-architecture V files (like atomics.amd64.v and atomics.i386.v) containing inline assembly for each platform. This approach gives precise control over the emitted instructions and allows for architecture-specific optimizations. Benchmarks included in the repository at https://github.com/RbPyer/v-atomics/tree/master/benchmarks show that performance is equivalent to C-based atomics, so there would be no performance regression from switching to native implementation.
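The closest Rust analogue of per-architecture files such as atomics.amd64.v is `#[cfg(target_arch = ...)]`. The sketch below, in Rust rather than V and not taken from the v-atomics module, shows the shape: an amd64 add built from `lock xadd` via inline assembly, plus a portable fallback standing in for the other architecture files. `lock xadd` writes old+delta to memory and leaves the old value in the register, so the new value is computed afterwards; the `lock` prefix also makes it a full memory barrier on x86.

```rust
use std::sync::atomic::AtomicU64;

/// Atomically add `delta` and return the NEW value (amd64 version).
#[cfg(target_arch = "x86_64")]
fn add_u64(a: &AtomicU64, delta: u64) -> u64 {
    let mut val = delta;
    // SAFETY: `a.as_ptr()` points to a live, 8-byte-aligned u64.
    // After `lock xadd`, memory holds old+delta and `val` holds the old value.
    unsafe {
        std::arch::asm!(
            "lock xadd qword ptr [{addr}], {val}",
            addr = in(reg) a.as_ptr(),
            val = inout(reg) val,
            options(nostack),
        );
    }
    val.wrapping_add(delta)
}

/// Portable fallback, standing in for the other per-architecture files.
#[cfg(not(target_arch = "x86_64"))]
fn add_u64(a: &AtomicU64, delta: u64) -> u64 {
    a.fetch_add(delta, std::sync::atomic::Ordering::SeqCst)
        .wrapping_add(delta)
}
```

Because `asm!` is assumed to read and write arbitrary memory unless told otherwise, the compiler will not reorder surrounding memory accesses across it.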

The repository includes comprehensive unit tests and benchmarks demonstrating correctness and performance characteristics.

Future Development Roadmap

The roadmap consists of four phases:

Phase 1: Extended atomic operations

  • Add atomic bitwise operations: AND, OR, XOR (commonly needed for lock-free algorithms)
  • Implement atomic operations on pointers for atomic reference counting and pointer-based concurrent data structures
  • Add proper atomic wrapper types that encapsulate the memory location and provide method-based APIs
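On x86, `lock and`, `lock or` and `lock xor` update memory but do not return the previous value, so a bitwise operation that must return it is usually built as a compare-and-swap retry loop. The Rust sketch below combines that technique with a method-based wrapper type in the spirit of Phase 1; `AtomicBits` and its methods are hypothetical names (Rust's own `AtomicU32::fetch_or` and friends already exist, and the loop here only illustrates the generic approach).

```rust
use std::sync::atomic::{AtomicU32, Ordering::SeqCst};

/// Hypothetical method-based wrapper around a 32-bit atomic.
pub struct AtomicBits(AtomicU32);

impl AtomicBits {
    pub const fn new(v: u32) -> Self {
        AtomicBits(AtomicU32::new(v))
    }

    pub fn load(&self) -> u32 {
        self.0.load(SeqCst)
    }

    /// Generic read-modify-write via a CAS loop; returns the PREVIOUS value.
    fn rmw(&self, f: impl Fn(u32) -> u32) -> u32 {
        let mut cur = self.0.load(SeqCst);
        loop {
            match self.0.compare_exchange_weak(cur, f(cur), SeqCst, SeqCst) {
                Ok(prev) => return prev,
                // Another thread won the race (or a spurious failure); retry.
                Err(actual) => cur = actual,
            }
        }
    }

    pub fn fetch_and(&self, m: u32) -> u32 { self.rmw(|x| x & m) }
    pub fn fetch_or(&self, m: u32) -> u32 { self.rmw(|x| x | m) }
    pub fn fetch_xor(&self, m: u32) -> u32 { self.rmw(|x| x ^ m) }
}
```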

Phase 2: Memory ordering support

  • Add support for different memory orderings beyond the current sequentially consistent default
  • Implement relaxed, acquire, release, and acquire-release orderings (similar to C11, C++11, and Rust)
  • This would allow performance optimizations in scenarios where full sequential consistency is not required
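The classic case where full sequential consistency is unnecessary is one-way message passing, sketched here with Rust's `Ordering` (the statics and function names are illustrative). The Release store on the flag publishes every write made before it, and an Acquire load that sees the flag is guaranteed to see those writes, so the payload itself can use Relaxed. On x86 acquire loads and release stores compile to plain `mov`, so the saving there mostly comes from avoiding the `xchg`/`mfence` that a SeqCst store needs; the difference is larger on weakly ordered targets such as ARM and RISC-V.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn producer() {
    DATA.store(42, Ordering::Relaxed); // payload: no ordering by itself
    READY.store(true, Ordering::Release); // publishes all earlier writes
}

fn consumer() -> u64 {
    // Spin until the flag is observed; Acquire pairs with the Release above.
    while !READY.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    DATA.load(Ordering::Relaxed) // guaranteed to observe 42
}
```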

Phase 3: Additional architecture support

  • ARM (32-bit and 64-bit)
  • RISC-V
  • Additional platforms as needed
  • Each architecture would get its own implementation file with appropriate inline assembly

Phase 4: i386 compatibility improvements

  • Add fallback path using cmpxchg8b for 64-bit operations on non-MMX processors
  • Implement runtime CPU feature detection to choose between MMX and cmpxchg8b code paths
  • This would ensure compatibility with older i386 processors at the cost of some performance
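Runtime detection would typically query CPUID once and cache the result. A minimal Rust sketch of that pattern, assuming CPUID is available: leaf 1 reports MMX in EDX bit 23 and CMPXCHG8B (CX8) in EDX bit 8. The enum `Path64` and the functions `detect`/`path64` are hypothetical names, not part of the module.

```rust
use std::sync::OnceLock;

/// Which 64-bit load/store path to use (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Path64 {
    Mmx,         // movq through an MMX register
    Cmpxchg8b,   // lock cmpxchg8b loop
    Unsupported, // neither feature reported
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect() -> Path64 {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::__cpuid;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::__cpuid;

    // Very old 32-bit CPUs lack the CPUID instruction entirely.
    #[cfg(target_arch = "x86")]
    if !std::arch::x86::has_cpuid() {
        return Path64::Unsupported;
    }

    // CPUID leaf 1: EDX bit 23 = MMX, EDX bit 8 = CX8 (cmpxchg8b).
    let edx = unsafe { __cpuid(1) }.edx;
    if edx & (1 << 23) != 0 {
        Path64::Mmx
    } else if edx & (1 << 8) != 0 {
        Path64::Cmpxchg8b
    } else {
        Path64::Unsupported
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect() -> Path64 {
    Path64::Unsupported
}

/// Run detection once and reuse the cached result on every later call.
fn path64() -> Path64 {
    static PATH: OnceLock<Path64> = OnceLock::new();
    *PATH.get_or_init(detect)
}
```

Caching keeps the per-operation cost to a single well-predicted branch (or an indirect call, if the choice is stored as a function pointer).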

The implementation follows similar patterns to atomic operations in Go's sync/atomic and Rust's std::sync::atomic, providing a familiar API for developers coming from those languages.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

Version used

V 0.5.0 bcb8260

Environment details (OS name and version, etc.)

|V full version      |V 0.5.0 ffc6eaf7865638ee3120befe85bc8059f0cfe3b5
|:-------------------|:-------------------
|OS                  |linux, "EndeavourOS Linux"
|Processor           |32 cpus, 64bit, little endian, AMD Ryzen 9 9950X3D 16-Core Processor
|Memory              |35.98GB/60.37GB
|                    |
|V executable        |/home/oswyndel/v/v
|V last modified time|2026-01-18 20:50:30
|                    |
|V home dir          |OK, value: /home/oswyndel/v
|VMODULES            |OK, value: /home/oswyndel/.vmodules
|VTMP                |OK, value: /tmp/v_1000
|                    |
|Git version         |git version 2.52.0
|V git status        |ffc6eaf7
|.git/config present |true
|                    |
|cc version          |cc (GCC) 15.2.1 20260103
|gcc version         |gcc (GCC) 15.2.1 20260103
|clang version       |clang version 21.1.6
|tcc version         |tcc version 0.9.28rc 2025-02-13 HEAD@f8bd136d (x86_64 Linux)
|tcc git status      |thirdparty-linux-amd64 696c1d84
|emcc version        |N/A
|glibc version       |ldd (GNU libc) 2.42
