Skip to content

Commit 77447b0

Browse files
zhenghuanclaude
andcommitted
[CELEBORN-2248] Implement lazy loading for columnar shuffle classes and skew shuffle method using static holder pattern
### What changes were proposed in this pull request? This PR converts the static initialization of columnar shuffle class constructors and skew shuffle method to lazy initialization using the initialization-on-demand holder idiom (static inner class pattern) in SparkUtils.java. Specifically, the following changes were made: 1. Introduced `ColumnarHashBasedShuffleWriterConstructorHolder` static inner class to lazily initialize the constructor for ColumnarHashBasedShuffleWriter 2. Introduced `ColumnarShuffleReaderConstructorHolder` static inner class to lazily initialize the constructor for CelebornColumnarShuffleReader 3. Introduced `CelebornSkewShuffleMethodHolder` static inner class to lazily initialize the `isCelebornSkewedShuffle` method reference 4. Modified `createColumnarHashBasedShuffleWriter()`, `createColumnarShuffleReader()`, and `isCelebornSkewShuffleOrChildShuffle()` methods to use the holder pattern for lazy initialization 5. Added JavaDoc comments explaining the lazy loading mechanism ### Why are the changes needed? The current implementation statically initializes columnar shuffle class constructors and the skew shuffle method at SparkUtils class loading time, which means these classes/methods are loaded regardless of whether they are actually used. This lazy loading approach ensures that: - Columnar shuffle classes are only loaded when actually needed (when `celeborn.columnarShuffle.enabled` is true and the create methods are called) - CelebornShuffleState class is only loaded when skew shuffle detection is needed - Reduces unnecessary class loading overhead for users not using these features - Improves startup performance and memory footprint - Aligns with the conditional usage pattern already present in SparkShuffleManager The static holder pattern (initialization-on-demand holder idiom) provides several advantages: - Thread-safe without explicit synchronization (guaranteed by JVM class loading mechanism) - No synchronization overhead at runtime (no volatile reads or lock acquisition) - Simpler and more concise code compared to double-checked locking - Recommended by Effective Java (Item 83) for lazy initialization ### Does this PR resolve a correctness bug? No, this is a performance optimization. ### Does this PR introduce any user-facing change? No. This change only affects when certain classes are loaded internally. The functionality and API remain unchanged. ### How was this patch tested? - Code review to verify correct implementation of the initialization-on-demand holder pattern - Verified that JVM class loading guarantees thread safety (JLS §12.4.2) - Analyzed existing columnar shuffle and skew shuffle test coverage in the codebase - The changes are backward compatible and don't alter functionality, only initialization timing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
1 parent d30c02e commit 77447b0

File tree

1 file changed

+66
-47
lines changed
  • client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn

1 file changed

+66
-47
lines changed

client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkUtils.java

Lines changed: 66 additions & 47 deletions
Original file line numberDiff line numberDiff line change
@@ -228,17 +228,25 @@ public static <K, C> ShuffleReader<K, C> getReader(
228228

229229
public static final String COLUMNAR_HASH_BASED_SHUFFLE_WRITER_CLASS =
230230
"org.apache.spark.shuffle.celeborn.ColumnarHashBasedShuffleWriter";
231-
static final DynConstructors.Builder COLUMNAR_HASH_BASED_SHUFFLE_WRITER_CONSTRUCTOR_BUILDER =
232-
DynConstructors.builder()
233-
.impl(
234-
COLUMNAR_HASH_BASED_SHUFFLE_WRITER_CLASS,
235-
int.class,
236-
CelebornShuffleHandle.class,
237-
TaskContext.class,
238-
CelebornConf.class,
239-
ShuffleClient.class,
240-
ShuffleWriteMetricsReporter.class,
241-
SendBufferPool.class);
231+
232+
/**
233+
* Lazy holder for ColumnarHashBasedShuffleWriter constructor. The constructor is initialized only
234+
* when this class is first accessed, ensuring lazy loading without explicit synchronization.
235+
*/
236+
private static class ColumnarHashBasedShuffleWriterConstructorHolder {
237+
private static final DynConstructors.Ctor<HashBasedShuffleWriter> INSTANCE =
238+
DynConstructors.builder()
239+
.impl(
240+
COLUMNAR_HASH_BASED_SHUFFLE_WRITER_CLASS,
241+
int.class,
242+
CelebornShuffleHandle.class,
243+
TaskContext.class,
244+
CelebornConf.class,
245+
ShuffleClient.class,
246+
ShuffleWriteMetricsReporter.class,
247+
SendBufferPool.class)
248+
.build();
249+
}
242250

243251
public static <K, V, C> HashBasedShuffleWriter<K, V, C> createColumnarHashBasedShuffleWriter(
244252
int shuffleId,
@@ -248,26 +256,33 @@ public static <K, V, C> HashBasedShuffleWriter<K, V, C> createColumnarHashBasedS
248256
ShuffleClient client,
249257
ShuffleWriteMetricsReporter metrics,
250258
SendBufferPool sendBufferPool) {
251-
return COLUMNAR_HASH_BASED_SHUFFLE_WRITER_CONSTRUCTOR_BUILDER
252-
.build()
253-
.invoke(null, shuffleId, handle, taskContext, conf, client, metrics, sendBufferPool);
259+
return ColumnarHashBasedShuffleWriterConstructorHolder.INSTANCE.invoke(
260+
null, shuffleId, handle, taskContext, conf, client, metrics, sendBufferPool);
254261
}
255262

256263
public static final String COLUMNAR_SHUFFLE_READER_CLASS =
257264
"org.apache.spark.shuffle.celeborn.CelebornColumnarShuffleReader";
258-
static final DynConstructors.Builder COLUMNAR_SHUFFLE_READER_CONSTRUCTOR_BUILDER =
259-
DynConstructors.builder()
260-
.impl(
261-
COLUMNAR_SHUFFLE_READER_CLASS,
262-
CelebornShuffleHandle.class,
263-
int.class,
264-
int.class,
265-
int.class,
266-
int.class,
267-
TaskContext.class,
268-
CelebornConf.class,
269-
ShuffleReadMetricsReporter.class,
270-
ExecutorShuffleIdTracker.class);
265+
266+
/**
267+
* Lazy holder for CelebornColumnarShuffleReader constructor. The constructor is initialized only
268+
* when this class is first accessed, ensuring lazy loading without explicit synchronization.
269+
*/
270+
private static class ColumnarShuffleReaderConstructorHolder {
271+
private static final DynConstructors.Ctor<CelebornShuffleReader> INSTANCE =
272+
DynConstructors.builder()
273+
.impl(
274+
COLUMNAR_SHUFFLE_READER_CLASS,
275+
CelebornShuffleHandle.class,
276+
int.class,
277+
int.class,
278+
int.class,
279+
int.class,
280+
TaskContext.class,
281+
CelebornConf.class,
282+
ShuffleReadMetricsReporter.class,
283+
ExecutorShuffleIdTracker.class)
284+
.build();
285+
}
271286

272287
public static <K, C> CelebornShuffleReader<K, C> createColumnarShuffleReader(
273288
CelebornShuffleHandle<K, ?, C> handle,
@@ -279,19 +294,17 @@ public static <K, C> CelebornShuffleReader<K, C> createColumnarShuffleReader(
279294
CelebornConf conf,
280295
ShuffleReadMetricsReporter metrics,
281296
ExecutorShuffleIdTracker shuffleIdTracker) {
282-
return COLUMNAR_SHUFFLE_READER_CONSTRUCTOR_BUILDER
283-
.build()
284-
.invoke(
285-
null,
286-
handle,
287-
startPartition,
288-
endPartition,
289-
startMapIndex,
290-
endMapIndex,
291-
context,
292-
conf,
293-
metrics,
294-
shuffleIdTracker);
297+
return ColumnarShuffleReaderConstructorHolder.INSTANCE.invoke(
298+
null,
299+
handle,
300+
startPartition,
301+
endPartition,
302+
startMapIndex,
303+
endMapIndex,
304+
context,
305+
conf,
306+
metrics,
307+
shuffleIdTracker);
295308
}
296309

297310
// Added in SPARK-32920, for Spark 3.2 and above
@@ -541,15 +554,21 @@ public static void addSparkListener(SparkListener listener) {
541554
}
542555
}
543556

544-
private static final DynMethods.UnboundMethod isCelebornSkewShuffle_METHOD =
545-
DynMethods.builder("isCelebornSkewedShuffle")
546-
.hiddenImpl("org.apache.spark.celeborn.CelebornShuffleState", Integer.TYPE)
547-
.orNoop()
548-
.build();
557+
/**
558+
* Lazy holder for isCelebornSkewedShuffle method. The method is initialized only when this class
559+
* is first accessed, ensuring lazy loading without explicit synchronization.
560+
*/
561+
private static class CelebornSkewShuffleMethodHolder {
562+
private static final DynMethods.UnboundMethod INSTANCE =
563+
DynMethods.builder("isCelebornSkewedShuffle")
564+
.hiddenImpl("org.apache.spark.celeborn.CelebornShuffleState", Integer.TYPE)
565+
.orNoop()
566+
.build();
567+
}
549568

550569
public static boolean isCelebornSkewShuffleOrChildShuffle(int appShuffleId) {
551-
if (!isCelebornSkewShuffle_METHOD.isNoop()) {
552-
return isCelebornSkewShuffle_METHOD.asStatic().invoke(appShuffleId);
570+
if (!CelebornSkewShuffleMethodHolder.INSTANCE.isNoop()) {
571+
return CelebornSkewShuffleMethodHolder.INSTANCE.asStatic().invoke(appShuffleId);
553572
} else {
554573
return false;
555574
}

0 commit comments

Comments
 (0)