Skip to content

Conversation

@kosiew
Copy link
Contributor

@kosiew kosiew commented Feb 10, 2026

Which issue does this PR close?

Rationale for this change

Extended CI runs have been taking 2+ hours, with sqllogictests identified as a major contributor to the regression (thanks to @nuno-faria for identifying the contributor). This PR adjusts the extended workflow to separate building the sqllogictests test binary from executing it, allowing the run step to execute the already-built test executable directly.

This helps reduce workflow overhead from re-invoking cargo test for execution (and makes it easier to distinguish compilation time from test runtime when investigating performance regressions).

What changes are included in this PR?

  • In the sqllogictests job:

    • Build the sqllogictests test binary once using cargo test build and capture the produced executable path.
    • Run the captured test binary directly from datafusion/sqllogictest with --include-sqlite.
    • Fail fast with a clear error if the test executable cannot be located.

Are these changes tested?

Yes.

Before fix

image

source: without current PR

After fix

image

source: current PR

(These changes are CI-only; functional behavior of DataFusion is unchanged.)

Are there any user-facing changes?

No. This PR only updates CI workflow behavior.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

@github-actions github-actions bot added the development-process Related to development process of DataFusion label Feb 10, 2026
- Set incremental to false for release-nonlto profile
- Upgrade optimization level to 3
- Retain debug information for flamegraphs by setting strip to false
@github-actions github-actions bot added the optimizer Optimizer rules label Feb 11, 2026
@kosiew kosiew force-pushed the extended-tests-20052 branch from fe8af19 to c5da7ca Compare February 11, 2026 02:17
@github-actions github-actions bot removed the optimizer Optimizer rules label Feb 11, 2026
@kosiew kosiew force-pushed the extended-tests-20052 branch from 990cf0e to c5da7ca Compare February 11, 2026 02:18
@github-actions github-actions bot added the optimizer Optimizer rules label Feb 11, 2026
@kosiew kosiew force-pushed the extended-tests-20052 branch from c5da7ca to 8d9bcb8 Compare February 11, 2026 02:22
@kosiew kosiew changed the title Enhance CI workflow and optimize release profile settings CI: build and run sqllogictests binary directly in extended workflow Feb 11, 2026
@github-actions github-actions bot removed the optimizer Optimizer rules label Feb 11, 2026
@kosiew kosiew marked this pull request as ready for review February 11, 2026 04:03
@kosiew kosiew requested a review from alamb February 11, 2026 05:55
@kosiew kosiew force-pushed the extended-tests-20052 branch from d6fa2ec to cd967ba Compare February 11, 2026 09:59
rust-version: stable
- name: Build sqllogictest binary
run: |
TEST_BIN=$(cargo test --features backtrace,parquet_encryption --profile release-nonlto --test sqllogictests --no-run --message-format=json | sed -n 's/.*"executable":"\([^"]*\)".*/\1/p' | head -n 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than cargo test another idea would be to use cargo build 🤔

Also, what is the reason to use --message-format=json?

Finally, what is the head command for? I didn't see any obvious reason in

https://github.com/apache/datafusion/actions/runs/21890300850/job/63194519053

Perhaps you could add a comment explaining what those commands are for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good suggestions. I agree cargo build is clearer here since this step only compiles the sqllogictests target and does not run it.

I used --message-format=json to capture the exact emitted test binary path because Cargo places test executables under target/.../deps with a hash suffix. The head -n 1 was a simple way to select the first extracted executable path from the stream.

I’ll simplify and document this more clearly in the workflow so the intent is explicit.

- name: Run sqllogictest
run: |
cargo test --features backtrace,parquet_encryption --profile release-nonlto --test sqllogictests -- --include-sqlite
(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the ( and )?
Also, why does it need to do cd when the current version doesn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cd is needed because sqllogictests resolves test data via relative paths (for example test_files/ and ../../datafusion-testing/data/ in datafusion/sqllogictest/bin/sqllogictests.rs), so running from repo root can point it at the wrong locations. I added it after an earlier version without cd errored with test_files/ not found.

The subshell (...) was only used to scope cd so it would not affect the subsequent cargo clean. It is not strictly necessary.

I’ll remove the subshell and switch to a cleaner step-level working-directory: datafusion/sqllogictest for the test execution step.

@alamb
Copy link
Contributor

alamb commented Feb 11, 2026

thanks for looking into this @kosiew

@github-actions github-actions bot added the optimizer Optimizer rules label Feb 12, 2026
@kosiew kosiew force-pushed the extended-tests-20052 branch from 014a28d to 882251a Compare February 12, 2026 06:07
@github-actions github-actions bot removed the optimizer Optimizer rules label Feb 12, 2026
@kosiew kosiew requested a review from alamb February 12, 2026 06:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

development-process Related to development process of DataFusion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extended tests now take over 2 hours to run

2 participants