Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Runtime code generation is essential for achieving high performance in data processing systems. Instead of using reflection or generic implementations, generated code can:
- Eliminate virtual method dispatch: direct field access and type-specific operations
- Enable JIT optimization: the JVM's JIT compiler can optimize specialized generated code more effectively
- Reduce boxing/unboxing overhead: type-specific code avoids boxing primitives
- Support complex type comparisons: nested types (arrays, maps, rows) require specialized comparison logic
The initial use case is RecordEqualiser for comparing InternalRow instances, which is needed for:
- Change data capture (CDC) deduplication
- Aggregation state management
- Primary key table updates
Solution
Core Framework
| Component | Description |
|---|---|
| `CodeGeneratorContext` | Manages reusable code fragments, member variables, and class-level declarations |
| `JavaCodeBuilder` | Type-safe builder for constructing Java source code with a fluent API |
| `CompileUtils` | Compiles generated source code using Janino, with LRU caching |
| `GeneratedClass<T>` | Wrapper holding the generated source code and the compiled class |
| `CodeGenException` | Exception type for code generation failures |
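To illustrate the caching half of `CompileUtils`, here is a minimal sketch of an LRU cache of compiled classes keyed by generated source. It assumes the cache key is the full source string and swaps the Janino compiler for a caller-supplied `Function<String, Class<?>>` so it runs with only the JDK; `CompiledClassCache`, `getOrCompile`, and `isCached` are hypothetical names, not the proposed API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: LRU cache of compiled classes keyed by generated source code.
// The compiler is abstracted as a Function (standing in for Janino) so this runs on the JDK alone.
final class CompiledClassCache {
    private final Map<String, Class<?>> cache;
    private final Function<String, Class<?>> compiler;

    CompiledClassCache(int maxSize, Function<String, Class<?>> compiler) {
        this.compiler = compiler;
        // accessOrder = true orders entries from least to most recently accessed,
        // so removeEldestEntry evicts the least recently used compiled class.
        this.cache = new LinkedHashMap<String, Class<?>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Class<?>> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Synchronized because an access-ordered LinkedHashMap mutates its order on reads.
    synchronized Class<?> getOrCompile(String source) {
        return cache.computeIfAbsent(source, compiler);
    }

    // containsKey does not count as an access, so it does not change LRU order.
    synchronized boolean isCached(String source) {
        return cache.containsKey(source);
    }
}
```

With a maximum size of 2, compiling sources `a`, `b`, then reading `a` again and compiling `c` evicts `b`, because `a` was touched more recently. A `LinkedHashMap` in access order is the simplest JDK-only LRU; a production cache might prefer a concurrent cache library instead of a single lock.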
Type-Safe API
- `Modifier` enum: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, ABSTRACT, SYNCHRONIZED, VOLATILE, TRANSIENT
- `PrimitiveType` enum: BOOLEAN, BYTE, CHAR, SHORT, INT, LONG, FLOAT, DOUBLE, VOID
- `Param` class: type-safe method parameter representation
- Helper methods: `mods()`, `params()`, `typeOf()`, `arrayOf()`
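The following sketch shows how such type-safe pieces could render a method signature as source text. Only the type and helper names come from the list above; every signature here (`keyword()`, `mods(Modifier...)`, `typeOf(Class<?>)`, `arrayOf(String)`, the `Param` constructor) is an assumption made for illustration, and `CodeHelpers` is a hypothetical holder class.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch; names mirror the proposal, signatures are assumptions.
enum Modifier {
    PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, ABSTRACT, SYNCHRONIZED, VOLATILE, TRANSIENT;

    String keyword() { return name().toLowerCase(Locale.ROOT); }
}

enum PrimitiveType {
    BOOLEAN, BYTE, CHAR, SHORT, INT, LONG, FLOAT, DOUBLE, VOID;

    String keyword() { return name().toLowerCase(Locale.ROOT); }
}

final class Param {
    final String type;
    final String name;

    Param(String type, String name) {
        this.type = type;
        this.name = name;
    }

    @Override
    public String toString() { return type + " " + name; }
}

final class CodeHelpers {
    private CodeHelpers() {}

    // Deduplicates and renders modifiers in enum declaration order, e.g. "public static final".
    static String mods(Modifier... modifiers) {
        if (modifiers.length == 0) {
            return ""; // EnumSet.copyOf rejects an empty plain collection
        }
        return EnumSet.copyOf(Arrays.asList(modifiers)).stream()
                .map(Modifier::keyword)
                .collect(Collectors.joining(" "));
    }

    // Renders a comma-separated parameter list.
    static String params(Param... ps) {
        return Arrays.stream(ps).map(Param::toString).collect(Collectors.joining(", "));
    }

    static String typeOf(PrimitiveType t) { return t.keyword(); }

    // getCanonicalName yields source-level names such as "java.lang.String" or "int[]".
    static String typeOf(Class<?> c) { return c.getCanonicalName(); }

    static String arrayOf(String componentType) { return componentType + "[]"; }
}
```

For example, `mods(PUBLIC) + " " + typeOf(BOOLEAN) + " equals(" + params(new Param("InternalRow", "left"), new Param("InternalRow", "right")) + ")"` renders `public boolean equals(InternalRow left, InternalRow right)`. Using enums instead of raw strings rules out typos such as `pubic` at compile time.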
Code Generators
| Generator | Output Interface | Description |
|---|---|---|
| `EqualiserCodeGenerator` | `RecordEqualiser` | Generates code for comparing two `InternalRow` instances |
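As a rough picture of what `EqualiserCodeGenerator` output could look like, the sketch below hand-writes an equaliser for a two-field schema (INT, STRING). `Row` and `RowEqualiser` are hypothetical, minimal stand-ins for `InternalRow` and `RecordEqualiser`, and `String` stands in for Fluss's internal string type; the real generated code may differ in structure and naming.

```java
// Hypothetical stand-in for InternalRow (only the accessors this sketch needs).
interface Row {
    boolean isNullAt(int pos);
    int getInt(int pos);
    String getString(int pos);
}

// Hypothetical stand-in for the RecordEqualiser output interface.
interface RowEqualiser {
    boolean equals(Row left, Row right);
}

// Hand-written approximation of generated code for schema (INT, STRING): each field is
// unrolled with its own null check and typed accessor, so there is no per-field type
// dispatch and the INT comparison itself never boxes.
final class IntStringEqualiser implements RowEqualiser {
    @Override
    public boolean equals(Row left, Row right) {
        // Field 0: INT
        boolean leftNull0 = left.isNullAt(0);
        boolean rightNull0 = right.isNullAt(0);
        if (leftNull0 != rightNull0) {
            return false;
        }
        if (!leftNull0 && left.getInt(0) != right.getInt(0)) {
            return false;
        }
        // Field 1: STRING
        boolean leftNull1 = left.isNullAt(1);
        boolean rightNull1 = right.isNullAt(1);
        if (leftNull1 != rightNull1) {
            return false;
        }
        return leftNull1 || left.getString(1).equals(right.getString(1));
    }
}

// Simple array-backed Row used only to exercise the sketch.
final class ArrayRow implements Row {
    private final Object[] values;

    ArrayRow(Object... values) {
        this.values = values;
    }

    @Override
    public boolean isNullAt(int pos) { return values[pos] == null; }

    @Override
    public int getInt(int pos) { return (Integer) values[pos]; }

    @Override
    public String getString(int pos) { return (String) values[pos]; }
}
```

A generic equaliser would instead loop over the schema and switch on each field's type for every row; generating one class per schema moves that decision to code-generation time.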
Supported Data Types
The EqualiserCodeGenerator supports all Fluss data types:
- Primitive types: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE
- String types: CHAR, VARCHAR, STRING
- Binary types: BINARY, VARBINARY, BYTES
- Temporal types: DATE, TIME, TIMESTAMP, TIMESTAMP_LTZ
- Numeric types: DECIMAL (with precision/scale)
- Complex types: ARRAY, MAP, ROW (nested)
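Supporting many types largely comes down to emitting the right comparison expression per type. The sketch below shows one plausible dispatch over a hypothetical subset of type roots (`FieldType`), plus a helper that wraps each comparison in the null handling every field needs. The choice of `==`, `Arrays.equals`, `.equals()`, or a nested equaliser per type is an assumption, as is the idea that internal string, decimal, and timestamp objects implement value `equals()`; `EqualsCodeGen` and its methods are hypothetical names.

```java
// Hypothetical subset of type roots, enough to show the dispatch.
enum FieldType {
    BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE,
    STRING, BYTES, DECIMAL, TIMESTAMP, ARRAY, MAP, ROW
}

final class EqualsCodeGen {
    private EqualsCodeGen() {}

    // Returns a Java boolean expression comparing two non-null field values that the
    // generated code has already read into locals named l and r.
    static String fieldEqualsCode(FieldType type, String l, String r, String nestedEqualiser) {
        switch (type) {
            case BOOLEAN: case TINYINT: case SMALLINT: case INT: case BIGINT:
            case FLOAT: case DOUBLE:
                // Primitive locals compare directly without boxing. Note that == treats
                // NaN as unequal to itself; a real generator may choose other semantics.
                return l + " == " + r;
            case BYTES:
                // byte[] uses identity equals, so compare contents instead.
                return "java.util.Arrays.equals(" + l + ", " + r + ")";
            case STRING: case DECIMAL: case TIMESTAMP:
                // Assumes the internal object representations implement value equals().
                return l + ".equals(" + r + ")";
            case ARRAY: case MAP: case ROW:
                // Nested types delegate to another generated equaliser held as a
                // member variable; its name is supplied by the caller.
                return nestedEqualiser + ".equals(" + l + ", " + r + ")";
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Wraps a value comparison with null handling: differing nullness means unequal,
    // both null means equal, otherwise the value expression decides.
    static String nullSafeCheck(String leftIsNull, String rightIsNull, String valueEquals) {
        return "if (" + leftIsNull + " != " + rightIsNull + ") return false;\n"
                + "if (!" + leftIsNull + " && !(" + valueEquals + ")) return false;\n";
    }
}
```

Delegating ARRAY, MAP, and ROW to separately generated equalisers keeps each generated method small and lets nested types reuse the same dispatch recursively.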
Features
- Field projection support for partial row comparison
- Compiled class caching with configurable cache size
- Janino dependency shaded to `org.apache.fluss.shaded.org.codehaus.janino` to avoid classpath conflicts
- Comprehensive Javadoc and `package-info` documentation
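The semantics of field projection can be pinned down with a plain interpreted loop before any code is generated. This sketch is not code generation: it assumes rows are `Object[]` arrays, and `ProjectedEqualiser` and `fieldsEqual` are hypothetical names. It compares only the projected positions, for example the primary-key columns of a row, and a generated equaliser would unroll this loop into one typed block per projected field.

```java
import java.util.Objects;

// Hypothetical interpreted reference for projected equality: only the fields listed in
// the projection are compared; all other fields are ignored.
final class ProjectedEqualiser {
    private final int[] projection;

    ProjectedEqualiser(int[] projection) {
        this.projection = projection.clone(); // defensive copy
    }

    boolean fieldsEqual(Object[] left, Object[] right) {
        for (int pos : projection) {
            // deepEquals treats two nulls as equal and compares arrays (e.g. byte[]) by content.
            if (!Objects.deepEquals(left[pos], right[pos])) {
                return false;
            }
        }
        return true;
    }
}
```

With projection `{0, 2}`, the rows `{1L, "x", bytes(1, 2)}` and `{1L, "y", bytes(1, 2)}` compare equal because field 1 is ignored, while a projection covering all three fields would report them as different.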
Anything else?
No response
Willingness to contribute
- I'm willing to submit a PR!