Skip to content

DJ824/spsc-queue

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 

Repository files navigation

this is a single producer single consumer lock free queue

implementation is based on a ring buffer with certain performance optimizations:

  • keep the head and tail in different cache lines, without it, we know that only 1 core can have exlusive ownership of the entire cache line, this causes ping ponging back and forth between the 2 cores, as they would keep having to transfer ownership, invalidate the old state, update new state, and so forth

  • cache head and tail indices to reduce cache trafficking. in a standard enqueue op, we first need to check if the queue full, we do this be checking if the tail + 1 slot is the same as the head. when the producer core accesses the head, the state is stored in the shared cache. however, when a consumer tries to dequeue, it needs to hold the state of the head in its exclusive cache, as it means to write to that location. so, when this happens, one core holds an exclusive state, and the other holds a shared state, when the exclusive state is modified, the cpu has to update this state to all cores that also have access, whether it is shared or exclusive, which takes many clock cycles. to fix this, we store the state of the head and tail in temporary caches, and only update them when the queue is full/empty

example) head = 0, tail = 4 consumer dequeues -> tail_cache = 4, so we can keep dequeing until the head == tail_cache, and then we can check to see if there are more items to dequeue by updating the tail cache, without this, we would be doing tail.load() every deqeue operation,

  • another optimization technique is to add padding at the front and end of the queue to avoid false sharing with any adjacent memory allocations. the padding size is determined by how many queue slots fit into a single cache line, this avoids the head and tail from being in the same cache line as any other allocation your program might do, based on the original location the queue was allocated

performance: image

using perf, we can see how aligning the head and tail + caches greatly improves our performance, doubled throughput and reduced cache misses by over half

image

ref: https://rigtorp.se/ringbuffer/

About

spsc lock-free queue

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages