Skip to content

Conversation

@ever4Kenny
Copy link

@ever4Kenny ever4Kenny commented Jan 9, 2026

What changes were proposed in this pull request?

This PR converts the static initialization of columnar shuffle class constructors
and skew shuffle method to lazy initialization using the initialization-on-demand
holder idiom (static inner class pattern) in SparkUtils.java.

Specifically, the following changes were made:

  1. Introduced ColumnarHashBasedShuffleWriterConstructorHolder static inner class
    to lazily initialize the constructor for ColumnarHashBasedShuffleWriter

  2. Introduced ColumnarShuffleReaderConstructorHolder static inner class to lazily
    initialize the constructor for CelebornColumnarShuffleReader

  3. Introduced CelebornSkewShuffleMethodHolder static inner class to lazily
    initialize the isCelebornSkewedShuffle method reference

  4. Modified createColumnarHashBasedShuffleWriter(), createColumnarShuffleReader(),
    and isCelebornSkewShuffleOrChildShuffle() methods to use the holder pattern for
    lazy initialization

  5. Added JavaDoc comments explaining the lazy loading mechanism

Why are the changes needed?

The current implementation statically initializes columnar shuffle class constructors
and the skew shuffle method at SparkUtils class loading time, which means these
classes/methods are loaded regardless of whether they are actually used.

This lazy loading approach ensures that:

  • Columnar shuffle classes are only loaded when actually needed (when
    celeborn.columnarShuffle.enabled is true and the create methods are called)
  • CelebornShuffleState class is only loaded when skew shuffle detection is needed
  • Reduces unnecessary class loading overhead for users not using these features
  • Improves startup performance and memory footprint
  • Aligns with the conditional usage pattern already present in SparkShuffleManager

The static holder pattern (initialization-on-demand holder idiom) provides several
advantages:

  • Thread-safe without explicit synchronization (guaranteed by JVM class loading mechanism)
  • No synchronization overhead at runtime (no volatile reads or lock acquisition)
  • Simpler and more concise code compared to double-checked locking
  • Recommended by Effective Java (Item 83) for lazy initialization

Does this PR resolve a correctness bug?

No, this is a performance optimization.

Does this PR introduce any user-facing change?

No. This change only affects when certain classes are loaded internally.
The functionality and API remain unchanged.

How was this patch tested?

  • Code review to verify correct implementation of the initialization-on-demand holder pattern
  • Verified that JVM class loading guarantees thread safety
  • The changes are backward compatible and don't alter functionality, only initialization timing

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the initialization of optional columnar shuffle features by converting static initialization to lazy initialization using the initialization-on-demand holder idiom (static inner class pattern). This ensures that columnar shuffle classes and skew shuffle detection logic are only loaded when actually needed, reducing unnecessary class loading overhead for users not utilizing these features.

Changes:

  • Introduced three static inner holder classes to lazily initialize columnar shuffle constructors and skew shuffle method
  • Refactored createColumnarHashBasedShuffleWriter(), createColumnarShuffleReader(), and isCelebornSkewShuffleOrChildShuffle() to use the holder pattern
  • Added comprehensive JavaDoc comments explaining the lazy loading mechanism

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…nd skew shuffle method using static holder pattern

### What changes were proposed in this pull request?

This PR converts the static initialization of columnar shuffle class constructors
and skew shuffle method to lazy initialization using the initialization-on-demand
holder idiom (static inner class pattern) in SparkUtils.java.

Specifically, the following changes were made:

1. Introduced `ColumnarHashBasedShuffleWriterConstructorHolder` static inner class
   to lazily initialize the constructor for ColumnarHashBasedShuffleWriter

2. Introduced `ColumnarShuffleReaderConstructorHolder` static inner class to lazily
   initialize the constructor for CelebornColumnarShuffleReader

3. Introduced `CelebornSkewShuffleMethodHolder` static inner class to lazily
   initialize the `isCelebornSkewedShuffle` method reference

4. Modified `createColumnarHashBasedShuffleWriter()`, `createColumnarShuffleReader()`,
   and `isCelebornSkewShuffleOrChildShuffle()` methods to use the holder pattern for
   lazy initialization

5. Added JavaDoc comments explaining the lazy loading mechanism

### Why are the changes needed?

The current implementation statically initializes columnar shuffle class constructors
and the skew shuffle method at SparkUtils class loading time, which means these
classes/methods are loaded regardless of whether they are actually used.

This lazy loading approach ensures that:
- Columnar shuffle classes are only loaded when actually needed (when
  `celeborn.columnarShuffle.enabled` is true and the create methods are called)
- CelebornShuffleState class is only loaded when skew shuffle detection is needed
- Reduces unnecessary class loading overhead for users not using these features
- Improves startup performance and memory footprint
- Aligns with the conditional usage pattern already present in SparkShuffleManager

The static holder pattern (initialization-on-demand holder idiom) provides several
advantages:
- Thread-safe without explicit synchronization (guaranteed by JVM class loading mechanism)
- No synchronization overhead at runtime (no volatile reads or lock acquisition)
- Simpler and more concise code compared to double-checked locking
- Recommended by Effective Java (Item 83) for lazy initialization

### Does this PR resolve a correctness bug?

No, this is a performance optimization.

### Does this PR introduce any user-facing change?

No. This change only affects when certain classes are loaded internally.
The functionality and API remain unchanged.

### How was this patch tested?

- Code review to verify correct implementation of the initialization-on-demand holder pattern
- Verified that JVM class loading guarantees thread safety (JLS §12.4.2)
- Analyzed existing columnar shuffle and skew shuffle test coverage in the codebase
- The changes are backward compatible and don't alter functionality, only initialization timing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant