Description
Describe the bug
release_tls_block() and release_tls_block_chain() in the IOBuf TLS block caching layer do not guard against a block being returned to TLS when it is already the TLS list head. This can create a self-referencing cycle (b->portal_next == b), causing any subsequent traversal of the TLS chain — such as remove_tls_block_chain() (registered via thread_atexit) or share_tls_block() — to loop infinitely, hanging the thread permanently.
In release_tls_block() (src/butil/iobuf_inl.h), the returned block is linked in front of the current head with b->u.portal_next = tls_data->block_head. When b is already tls_data->block_head, that assignment becomes b->u.portal_next = b, forming a single-node cycle.
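The single-node cycle can be illustrated with a stripped-down model of a push-to-head free list. This is only a sketch: `Block`, `TLSData`, `release_unguarded` and `walk_terminates` are hypothetical stand-ins written for this example, not the brpc definitions, and reference counting and the per-thread block limit are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the brpc types (assumption, not the real code).
struct Block {
    Block* portal_next = nullptr;
};

struct TLSData {
    Block* block_head = nullptr;
    int num_blocks = 0;
};

// Unguarded push-to-head, modelled on the assignment described above.
void release_unguarded(TLSData& tls, Block* b) {
    assert(b != nullptr);
    b->portal_next = tls.block_head;  // if b is already the head: b->portal_next = b
    tls.block_head = b;
    ++tls.num_blocks;
}

// Follows portal_next for at most max_steps hops. Returns true if the walk
// reached nullptr, false if it was still on a node when the budget ran out.
bool walk_terminates(const TLSData& tls, std::size_t max_steps) {
    const Block* p = tls.block_head;
    for (std::size_t i = 0; i < max_steps && p != nullptr; ++i) {
        p = p->portal_next;
    }
    return p == nullptr;
}

// Returns true if returning the same block twice produced a self-loop.
bool double_release_forms_self_loop() {
    TLSData tls;
    Block b;
    release_unguarded(tls, &b);  // head == &b, b.portal_next == nullptr
    release_unguarded(tls, &b);  // head == &b, b.portal_next == &b
    return b.portal_next == &b && !walk_terminates(tls, 1000);
}
```

In this model an unbounded traversal of the list would never reach nullptr, which matches the hang described in this report.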
Similarly, in release_tls_block_chain() (src/butil/iobuf.cpp), if the chain being returned overlaps with the existing TLS head, last_b->portal_next can point back to first_b (which may be last_b itself), again forming a cycle that never terminates when traversed.
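The chain case can be modelled the same way. The sketch below assumes the chain return walks to the tail and then splices the whole chain in front of the TLS head; `release_chain_unguarded` and the types are hypothetical simplifications for illustration, not the code in iobuf.cpp.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the brpc types (assumption, not the real code).
struct Block {
    Block* portal_next = nullptr;
};

struct TLSData {
    Block* block_head = nullptr;
    int num_blocks = 0;
};

// Unguarded chain return: find the tail, then link it to the current head.
void release_chain_unguarded(TLSData& tls, Block* first_b) {
    assert(first_b != nullptr);
    int n = 1;
    Block* last_b = first_b;
    while (last_b->portal_next != nullptr) {
        last_b = last_b->portal_next;
        ++n;
    }
    last_b->portal_next = tls.block_head;  // points back at first_b if first_b is the head
    tls.block_head = first_b;
    tls.num_blocks += n;
}

// Two-block chain returned twice: x -> y -> x.
bool overlapping_chain_forms_cycle() {
    TLSData tls;
    Block x, y;
    x.portal_next = &y;                 // chain: x -> y -> nullptr
    release_chain_unguarded(tls, &x);   // TLS list: x -> y
    release_chain_unguarded(tls, &x);   // y.portal_next = head = &x
    return x.portal_next == &y && y.portal_next == &x;
}

// One-block chain returned twice: first_b == last_b, so the block points at itself.
bool single_block_chain_forms_self_loop() {
    TLSData tls;
    Block z;
    release_chain_unguarded(tls, &z);
    release_chain_unguarded(tls, &z);
    return z.portal_next == &z;
}
```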
How the Double-Return Happens
IOBufAsZeroCopyOutputStream::BackUp() calls iobuf::release_tls_block(_cur_block) to eagerly return the block to TLS so other code can reuse it.
After BackUp(), the block is tls_data.block_head. If a subsequent operation (e.g., _release_block() during destruction of IOBufAsZeroCopyOutputStream, or a BackUp in IOBufAsSnappySink) calls release_tls_block() again with the same block pointer (obtained from a still-live BlockRef), the block is returned a second time, which triggers the self-loop.
Impact
- Thread hangs permanently in remove_tls_block_chain() (called at thread exit via thread_atexit), or in share_tls_block() / release_tls_block_chain() during normal I/O.
- The hang is silent — no crash, no log, no error — making it extremely difficult to diagnose in production.
- Any brpc application using protobuf serialization over IOBuf (which internally uses IOBufAsZeroCopyOutputStream) is potentially affected.
To Reproduce
Expected behavior
Versions
OS:
Compiler:
brpc:
protobuf:
Additional context/screenshots
Suggested Fix
- Guard release_tls_block() against double-return
- Guard release_tls_block_chain() against self-loop after linking
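One possible shape for the two guards, sketched on a simplified model rather than on the actual brpc sources. The types and the `release_guarded` / `release_chain_guarded` names are hypothetical, and the chain check here runs before linking instead of after it. Reference counting is not modelled, so how a rejected block should be handled in real code (dropped, dec_ref'd, or logged) is left open.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the brpc types (assumption, not the real code).
struct Block {
    Block* portal_next = nullptr;
};

struct TLSData {
    Block* block_head = nullptr;
    int num_blocks = 0;
};

// Guarded single-block return: refuse a block that is already the TLS head.
// Returns true if the block was linked, false if the call was rejected.
bool release_guarded(TLSData& tls, Block* b) {
    if (b == nullptr || b == tls.block_head) {
        return false;  // double return detected, list left untouched
    }
    b->portal_next = tls.block_head;
    tls.block_head = b;
    ++tls.num_blocks;
    return true;
}

// Guarded chain return: refuse a chain that contains the current TLS head.
bool release_chain_guarded(TLSData& tls, Block* first_b) {
    if (first_b == nullptr) {
        return false;
    }
    int n = 0;
    Block* last_b = nullptr;
    for (Block* p = first_b; p != nullptr; p = p->portal_next) {
        if (p == tls.block_head) {
            return false;  // linking would make the tail point back into the chain
        }
        last_b = p;
        ++n;
    }
    last_b->portal_next = tls.block_head;
    tls.block_head = first_b;
    tls.num_blocks += n;
    return true;
}

// Returns true if a double return leaves the list acyclic with one block.
bool guarded_double_release_is_safe() {
    TLSData tls;
    Block b;
    bool first = release_guarded(tls, &b);
    bool second = release_guarded(tls, &b);  // rejected
    return first && !second && b.portal_next == nullptr && tls.num_blocks == 1;
}
```

A head-only comparison is cheap but only catches a block that is currently the head; a block sitting deeper in the TLS list would still slip through in this sketch.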
