@geoffreyclaude commented Nov 26, 2025

Which issue does this PR close?

Closes #32 and #35.

Rationale for this change

See linked issues. Users have no visibility into DataFusion's planning pipeline, making it hard to identify optimization bottlenecks or understand which rules actually affect the plan.

What changes are included in this PR?

This PR instruments the entire query planning pipeline: analyzer, logical optimizer, physical plan creation, and physical optimizer.

Each phase gets a parent span (analyze_logical_plan, optimize_logical_plan, optimize_physical_plan) that groups individual rule executions. Rules that actually modify the plan are marked with a (modified) suffix in their span name, and the phase span records which rules were effective. For the logical optimizer, pass numbers are tracked since it can run multiple iterations.
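To make the naming scheme concrete, here is a minimal, std-only sketch of how a phase could track which rules changed the plan. It is not the crate's implementation: `Rule`, `PhaseRecord`, `run_phase`, and the two toy rules are hypothetical names, plans are modelled as plain strings, and pass numbering for the logical optimizer is left out.

```rust
/// Hypothetical stand-in for a planner rule: a name plus a plan rewrite.
/// Plans are modelled as plain strings purely for illustration.
struct Rule {
    name: &'static str,
    apply: fn(&str) -> String,
}

/// What a phase span might record (illustrative, not the crate's types).
#[derive(Debug, Default)]
struct PhaseRecord {
    /// One entry per rule, with a " (modified)" suffix when the plan changed.
    rule_spans: Vec<String>,
    /// Names of the rules that actually changed the plan.
    effective_rules: Vec<String>,
}

/// Runs every rule in order and compares the plan before and after each one.
fn run_phase(rules: &[Rule], plan: &str) -> (String, PhaseRecord) {
    let mut record = PhaseRecord::default();
    let mut current = plan.to_string();
    for rule in rules {
        let next = (rule.apply)(&current);
        if next != current {
            record.rule_spans.push(format!("{} (modified)", rule.name));
            record.effective_rules.push(rule.name.to_string());
        } else {
            record.rule_spans.push(rule.name.to_string());
        }
        current = next;
    }
    (current, record)
}

/// Toy rule that leaves the plan untouched.
fn noop(plan: &str) -> String {
    plan.to_string()
}

/// Toy rule that drops an always-true filter from the plan text.
fn remove_trivial_filter(plan: &str) -> String {
    plan.replace("Filter(true) -> ", "")
}
```

Running `run_phase` with `noop` followed by `remove_trivial_filter` on the plan `"Filter(true) -> Scan(t)"` would yield the span names `noop` and `remove_trivial_filter (modified)`, with only the second rule listed as effective.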

The RuleInstrumentationOptions type controls verbosity: Disabled for no instrumentation, PhaseOnly for just the phase spans, or Full for phase spans plus individual rule spans. There's also an option to include unified diffs showing exactly what changed.
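The three verbosity levels can be pictured as a small enum. The sketch below is an assumption about the shape of the idea, not the actual definition of `RuleInstrumentationOptions`: the `Level` type and its two helper methods are hypothetical names.

```rust
/// Hypothetical model of the three verbosity levels described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    /// No instrumentation at all.
    Disabled,
    /// Only the per-phase parent spans.
    PhaseOnly,
    /// Phase spans plus one span per rule.
    Full,
}

impl Level {
    /// Phase spans are emitted for every level except `Disabled`.
    fn phase_spans(self) -> bool {
        self != Level::Disabled
    }

    /// Individual rule spans are emitted only at `Full`.
    fn rule_spans(self) -> bool {
        self == Level::Full
    }
}
```

Under this model, `PhaseOnly` keeps the grouping span for each phase while skipping the per-rule spans, which is what makes it the less verbose choice.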

Expensive operations like plan formatting and diffing are skipped when spans are disabled (lazy instrumentation).
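The lazy behaviour can be sketched with a closure that is only invoked when the span is enabled. This is a std-only illustration under assumed names (`record_plan`, `span_enabled`, `format_plan`); the PR itself presumably checks the state of the real tracing span rather than a plain boolean.

```rust
/// Returns the formatted plan only when the span is enabled.
/// The `format_plan` closure stands in for expensive plan formatting or
/// diffing and is never called when `span_enabled` is false.
fn record_plan<F>(span_enabled: bool, format_plan: F) -> Option<String>
where
    F: FnOnce() -> String,
{
    if span_enabled {
        Some(format_plan())
    } else {
        None
    }
}
```

Passing the formatting step as a closure, rather than as an already-built `String`, is what keeps the disabled path free of the formatting cost.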

New macros (instrument_rules_with_info_spans!, etc.) make it easy to instrument a SessionState. When physical optimizer instrumentation is enabled, the query planner is automatically wrapped to trace physical plan creation.

Example Trace

[Image: screenshot of an example trace]

Usage

use datafusion::execution::SessionStateBuilder;
use datafusion::prelude::SessionContext;
use datafusion_tracing::{instrument_rules_with_info_spans, RuleInstrumentationOptions};

// Build a session state with default features
let session_state = SessionStateBuilder::new()
    .with_default_features()
    .build();

// Configure rule instrumentation (full instrumentation with plan diffs)
let options = RuleInstrumentationOptions::full().with_plan_diff();

// Instrument all planning phases at INFO level
let session_state = instrument_rules_with_info_spans!(
    options: options,
    state: session_state
);

// Create the session context
let ctx = SessionContext::new_with_state(session_state);

For finer control, use the builder:

let options = RuleInstrumentationOptions::builder()
    .analyzer()              // Full instrumentation for analyzer
    .optimizer()             // Full instrumentation for logical optimizer
    .physical_optimizer()    // Full instrumentation for physical optimizer
    .plan_diff()             // Include unified diffs when rules modify plans
    .build();

Or for less verbose traces, use phase-only instrumentation:

let options = RuleInstrumentationOptions::phase_only(); // No individual rule spans

Are these changes tested?

Yes, the integration test suite now enables rule instrumentation and all snapshot tests have been updated to capture the new spans.

Are there any user-facing changes?

New public APIs: RuleInstrumentationOptions, RuleInstrumentationOptionsBuilder, and the instrument_rules_with_*_spans! macros. No breaking changes.

@geoffreyclaude force-pushed the feat/instrument_plan_creation branch 5 times, most recently from 4230ce8 to 94b89d6 on November 27, 2025 10:19
@geoffreyclaude changed the base branch from main to branch-52 on January 8, 2026 08:54
@geoffreyclaude force-pushed the branch-52 branch 2 times, most recently from c39af17 to 2f47af8 on January 8, 2026 10:01
@geoffreyclaude force-pushed the feat/instrument_plan_creation branch from 94b89d6 to bffc487 on January 8, 2026 10:08
@geoffreyclaude reopened this on Jan 8, 2026
@geoffreyclaude marked this pull request as ready for review on January 8, 2026 10:28
@geoffreyclaude force-pushed the feat/instrument_plan_creation branch from bffc487 to 5ea68e7 on January 8, 2026 12:37
Introduces TracingQueryPlanner and rule instrumentation for analyzer,
logical optimizer, and physical optimizer rules with tracing spans.

Phase spans group individual rule executions and capture plan diffs when
rules modify plans.
@geoffreyclaude force-pushed the feat/instrument_plan_creation branch from 5ea68e7 to 5280fa4 on January 8, 2026 12:58
@geoffreyclaude merged commit eaafcf0 into branch-52 on Jan 8, 2026
4 checks passed
@geoffreyclaude deleted the feat/instrument_plan_creation branch on January 8, 2026 14:11