[fix][broker] Clean up orphan ledger on concurrent initial schema creation in BookkeeperSchemaStorage by geniusjoe · Pull Request #25514 · apache/pulsar

geniusjoe · 2026-04-13T14:29:48Z

Motivation

When multiple requests concurrently create a schema for a brand-new topic (i.e., the schema locator z-node does not yet exist), each request first creates a BookKeeper ledger via addNewSchemaEntryToStore, then attempts to create the schema locator z-node via CAS (createSchemaLocator with expectedVersion = -1L).

Only one request succeeds; the others fail with BadVersionException (because ZK MetadataStore translates NODEEXISTS to BadVersionException when expectedVersion == -1). However, the ledgers created by the failed requests were never cleaned up, resulting in orphan ledgers (dirty data) in BookKeeper.

Note that the existing updateSchemaLocator method already has cleanup logic for this scenario (deleting the orphan ledger when CAS fails with BadVersionException) in #18701, but the createNewSchema method — which handles the initial schema creation path — was missing this cleanup.

Modifications

BookkeeperSchemaStorage.createNewSchema: Added a whenComplete callback after createSchemaLocator. When the CAS operation fails due to AlreadyExistsException or BadVersionException, the orphan BookKeeper ledger is asynchronously deleted, consistent with the existing cleanup logic in updateSchemaLocator. The method is also refactored into three clearly commented steps for better readability.
PulsarMockLedgerHandle: Added a new constructor that accepts Map<String, byte[]> customMetadata and passes it to LedgerMetadataBuilder.withCustomMetadata(), so that mock ledgers can retain custom metadata set during creation.
PulsarMockBookKeeper: Updated asyncCreateLedger to forward the properties parameter to the new PulsarMockLedgerHandle constructor, enabling tests to inspect ledger custom metadata.
SchemaTest: Added testConcurrentCreateSchemaNoOrphanLedger test that verifies orphan ledgers are cleaned up when 16 producers concurrently create schema on a brand-new topic. The test inspects surviving BK ledgers via PulsarMockBookKeeper.getLedgerMap() and asserts that only 1 ledger with matching pulsar/schemaId custom metadata exists.

Verifying this change

This change added tests and can be verified as follows:

Added testConcurrentCreateSchemaNoOrphanLedger in SchemaTest that concurrently creates 16 producers with the same AVRO schema on a brand-new topic, then verifies:
1. Only 1 schema version exists via admin.schemas().getAllSchemas()
2. Only 1 surviving BK ledger has customMetadata["pulsar/schemaId"] matching the topic's schema name (orphan ledgers from failed concurrent creations were deleted)

Does this pull request potentially affect one of the following parts:

…ation in BookkeeperSchemaStorage

[fix][broker] Clean up orphan ledger on concurrent initial schema cre…

ea61669

…ation in BookkeeperSchemaStorage

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix][broker] Clean up orphan ledger on concurrent initial schema creation in BookkeeperSchemaStorage#25514

[fix][broker] Clean up orphan ledger on concurrent initial schema creation in BookkeeperSchemaStorage#25514
geniusjoe wants to merge 1 commit intoapache:masterfrom
geniusjoe:bugfix/schema-concurrent-creation

geniusjoe commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

geniusjoe commented Apr 13, 2026

Motivation

Modifications

Verifying this change

Does this pull request potentially affect one of the following parts:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant