
[codegen] Add fluss-codegen module for runtime code generation #2398

@platinumhamburg

Description


Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Runtime code generation is essential for achieving high performance in data processing systems. Instead of using reflection or generic implementations, generated code can:

  1. Eliminate virtual method dispatch - Generated code uses direct field access and type-specific operations instead of generic per-field calls
  2. Enable JIT optimization - Small, type-specialized methods are easier for the JVM's JIT compiler to inline and optimize
  3. Reduce boxing/unboxing overhead - Type-specific code avoids primitive boxing
  4. Support complex type comparisons - Nested types (arrays, maps, rows) require specialized comparison logic

The initial use case is RecordEqualiser for comparing InternalRow instances, which is needed for:

  • Change data capture (CDC) deduplication
  • Aggregation state management
  • Primary key table updates

Solution

Core Framework

| Component | Description |
|---|---|
| `CodeGeneratorContext` | Manages reusable code fragments, member variables, and class-level declarations |
| `JavaCodeBuilder` | Type-safe builder for constructing Java source code with a fluent API |
| `CompileUtils` | Compiles generated source code using Janino, with LRU caching |
| `GeneratedClass<T>` | Wrapper holding the generated source code and the compiled class |
| `CodeGenException` | Exception type for code generation failures |
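The LRU caching attributed to CompileUtils can be pictured with a small sketch. Everything in it is hypothetical: `CompiledClassCache`, `getOrCompile`, `cachedCount` and the `maxSize` parameter are illustrative names, and a plain `Function<String, T>` stands in for the real Janino compile step so the example runs without Janino on the classpath. Keying on the full generated source means two requests that produce identical code share one compiled result.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache keyed by generated source code (a sketch, not Fluss's CompileUtils). */
final class CompiledClassCache<T> {
    private final Map<String, T> cache;

    CompiledClassCache(int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently accessed,
        // so removeEldestEntry evicts the least recently used entry once the limit is exceeded.
        this.cache = new LinkedHashMap<String, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, T> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Returns the cached result for this source, invoking the compiler only on a cache miss. */
    synchronized T getOrCompile(String source, Function<String, T> compiler) {
        T compiled = cache.get(source); // a hit also marks the entry as most recently used
        if (compiled == null) {
            compiled = compiler.apply(source); // stands in for the actual Janino compilation
            cache.put(source, compiled);
        }
        return compiled;
    }

    /** Number of entries currently cached. */
    synchronized int cachedCount() {
        return cache.size();
    }
}
```

An access-ordered `LinkedHashMap` is the standard JDK idiom for LRU eviction; the issue does not say whether CompileUtils uses it or a separate cache library.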

Type-Safe API

  • Modifier enum: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, ABSTRACT, SYNCHRONIZED, VOLATILE, TRANSIENT
  • PrimitiveType enum: BOOLEAN, BYTE, CHAR, SHORT, INT, LONG, FLOAT, DOUBLE, VOID
  • Param class: Type-safe method parameter representation
  • Helper methods: mods(), params(), typeOf(), arrayOf()
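A minimal sketch of how typed modifiers and parameters can replace hand-concatenated strings. `CodeBuilderSketch`, `param`, `beginMethod`, `stmt` and `endMethod` are hypothetical names; only the `Modifier` constants and the `mods()`/`params()` helper names come from the list above, and the signatures given to them here are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical sketch of a type-safe source builder; not the actual JavaCodeBuilder API. */
final class CodeBuilderSketch {
    /** Subset of the modifiers listed in the issue. */
    enum Modifier {
        PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL;

        /** Renders the constant as its Java keyword, e.g. PUBLIC -> "public". */
        String keyword() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    /** A method parameter: a type name plus an identifier. */
    static final class Param {
        final String type;
        final String name;

        Param(String type, String name) {
            this.type = type;
            this.name = name;
        }

        @Override
        public String toString() {
            return type + " " + name;
        }
    }

    static List<Modifier> mods(Modifier... mods) { return Arrays.asList(mods); }

    static List<Param> params(Param... params) { return Arrays.asList(params); }

    static Param param(String type, String name) { return new Param(type, name); }

    private final StringBuilder sb = new StringBuilder();

    /** Opens a method declaration from typed modifiers and parameters. */
    CodeBuilderSketch beginMethod(
            List<Modifier> modifiers, String returnType, String name, List<Param> parameters) {
        for (Modifier m : modifiers) {
            sb.append(m.keyword()).append(' ');
        }
        String paramText = parameters.stream().map(Param::toString).collect(Collectors.joining(", "));
        sb.append(returnType).append(' ').append(name).append('(').append(paramText).append(") {\n");
        return this;
    }

    /** Appends one indented statement and its terminating semicolon. */
    CodeBuilderSketch stmt(String statement) {
        sb.append("    ").append(statement).append(";\n");
        return this;
    }

    /** Closes the current method body. */
    CodeBuilderSketch endMethod() {
        sb.append("}\n");
        return this;
    }

    String build() {
        return sb.toString();
    }
}
```

For example, `beginMethod(mods(PUBLIC, FINAL), "boolean", "equals", params(param("Object", "a"), param("Object", "b")))` followed by `.stmt("return a == b").endMethod()` emits `public final boolean equals(Object a, Object b) {`, one `return a == b;` line, and a closing brace. Because modifiers are enum constants, a misspelled keyword fails when the generator is compiled rather than when the generated code is.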

Code Generators

| Generator | Output Interface | Description |
|---|---|---|
| `EqualiserCodeGenerator` | `RecordEqualiser` | Generates code for comparing two `InternalRow` instances |
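To make the generator's output concrete, here is a hand-written example of the kind of class an equaliser generator could emit for a two-field schema (INT, STRING). It is an illustration under assumptions, not output captured from EqualiserCodeGenerator: `MiniRow` and `MiniRecordEqualiser` are simplified, hypothetical stand-ins for InternalRow and RecordEqualiser, `IntStringEqualiser` is an invented name, and `ArrayRow` exists only so the example can be tried.

```java
/** Hypothetical, simplified stand-in for Fluss's InternalRow. */
interface MiniRow {
    boolean isNullAt(int pos);

    int getInt(int pos);

    String getString(int pos);
}

/** Hypothetical, simplified stand-in for RecordEqualiser. */
interface MiniRecordEqualiser {
    boolean equals(MiniRow left, MiniRow right);
}

/** Hand-written example of generated code for the schema (INT, STRING): one straight-line block per field. */
final class IntStringEqualiser implements MiniRecordEqualiser {
    @Override
    public boolean equals(MiniRow left, MiniRow right) {
        // Field 0: INT. Null flags must match; values are compared only when both are non-null.
        boolean null0L = left.isNullAt(0);
        boolean null0R = right.isNullAt(0);
        if (null0L != null0R) return false;
        if (!null0L && left.getInt(0) != right.getInt(0)) return false;

        // Field 1: STRING.
        boolean null1L = left.isNullAt(1);
        boolean null1R = right.isNullAt(1);
        if (null1L != null1R) return false;
        if (!null1L && !left.getString(1).equals(right.getString(1))) return false;

        return true;
    }
}

/** Minimal array-backed row, only for trying the example. */
final class ArrayRow implements MiniRow {
    private final Object[] values;

    ArrayRow(Object... values) {
        this.values = values;
    }

    public boolean isNullAt(int pos) { return values[pos] == null; }

    public int getInt(int pos) { return (Integer) values[pos]; }

    public String getString(int pos) { return (String) values[pos]; }
}
```

Each field gets its own null check and type-specific comparison, so the comparison contains no per-field type switch and the INT values are compared as primitive `int`s.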

Supported Data Types

The EqualiserCodeGenerator supports all Fluss data types:

  • Primitive types: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE
  • String types: CHAR, VARCHAR, STRING
  • Binary types: BINARY, VARBINARY, BYTES
  • Temporal types: DATE, TIME, TIMESTAMP, TIMESTAMP_LTZ
  • Numeric types: DECIMAL (with precision/scale)
  • Complex types: ARRAY, MAP, ROW (nested)
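How per-type comparison logic might be emitted can be sketched as a function from field type to a Java expression string. `EqualsExprSketch`, `valueEquals`, `fieldCheck` and `equalsBody` are hypothetical names, and the row accessor names (`getBoolean`, `getInt`, `getLong`, `getDouble`, `getString`, `getBytes`) are assumptions used for illustration. The sketch covers only a few of the types listed above and omits DECIMAL and the nested ARRAY, MAP and ROW types, which need more specialized logic. Using `Double.compare` for DOUBLE (so that NaN equals NaN) is a choice made here; the issue does not say how floating-point values are compared.

```java
import java.util.List;

/** Hypothetical sketch of per-type code emission for an equaliser. */
final class EqualsExprSketch {
    /** A small subset of the supported types, enough to show different comparison strategies. */
    enum FieldType { BOOLEAN, INT, BIGINT, DOUBLE, STRING, BYTES }

    /** Java expression that is true when field `pos` is equal in `left` and `right` (both non-null). */
    static String valueEquals(FieldType type, int pos) {
        String l = "left.";
        String r = "right.";
        String p = "(" + pos + ")";
        switch (type) {
            case BOOLEAN:
                return l + "getBoolean" + p + " == " + r + "getBoolean" + p;
            case INT:
                return l + "getInt" + p + " == " + r + "getInt" + p;
            case BIGINT:
                return l + "getLong" + p + " == " + r + "getLong" + p;
            case DOUBLE:
                // Double.compare treats NaN as equal to itself, unlike ==.
                return "Double.compare(" + l + "getDouble" + p + ", " + r + "getDouble" + p + ") == 0";
            case STRING:
                return l + "getString" + p + ".equals(" + r + "getString" + p + ")";
            case BYTES:
                // Byte arrays must be compared by content, not by reference.
                return "java.util.Arrays.equals(" + l + "getBytes" + p + ", " + r + "getBytes" + p + ")";
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    /** Null-aware check for one field: null flags must match, then values are compared. */
    static String fieldCheck(FieldType type, int pos) {
        String nl = "left.isNullAt(" + pos + ")";
        String nr = "right.isNullAt(" + pos + ")";
        return "if (" + nl + " != " + nr + ") return false;\n"
                + "if (!" + nl + " && !(" + valueEquals(type, pos) + ")) return false;\n";
    }

    /** Body of an equals method covering every field, in schema order. */
    static String equalsBody(List<FieldType> types) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < types.size(); i++) {
            sb.append(fieldCheck(types.get(i), i));
        }
        return sb.append("return true;\n").toString();
    }
}
```

The type dispatch happens once, while the source is generated; the emitted body contains only the specialized checks.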

Features

  • Field projection support for partial row comparison
  • Compiled class caching with configurable cache size
  • Janino dependency shaded to org.apache.fluss.shaded.org.codehaus.janino to avoid classpath conflicts
  • Comprehensive Javadoc and package-info documentation
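The semantics of field projection can be stated with a small reference implementation that compares only the selected positions. This is not generated code: `ProjectedEquality`, `equalsOnFields`, and the use of `Object[]` as a row are hypothetical simplifications meant only to pin down the expected behaviour.

```java
import java.util.Objects;

/** Hypothetical reference semantics for projected comparison: only the listed positions are compared. */
final class ProjectedEquality {
    static boolean equalsOnFields(Object[] left, Object[] right, int[] fields) {
        for (int pos : fields) {
            // Objects.deepEquals handles nulls and compares byte[] values by content.
            if (!Objects.deepEquals(left[pos], right[pos])) {
                return false;
            }
        }
        return true;
    }
}
```

For instance, two versions of a row that differ only in a non-projected column (say, an update timestamp at position 2) compare equal under the projection {0, 1}, which is the kind of check the deduplication use case above calls for.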

Anything else?

No response

Willingness to contribute

  • I'm willing to submit a PR!
