[MicroBenchmarks] Add benchmark for control-flow-vectorization.#345

Open
ElvisWang123 wants to merge 3 commits into llvm:main from ElvisWang123:control-flow-vectorization

ElvisWang123 commented Feb 23, 2026

Adds benchmarks whose loops contain conditional code, so that they trigger control-flow vectorization.
These cases can be used to measure the performance impact of control-flow vectorization across targets.

DEF_COND_INC_LOOP(cond_inc_stride_128, 128)

// Conditional increment by value (sparse condition).
DEF_COND_INC_VALUE_LOOP(cond_inc_by_value, 42)

What percentage of lanes will be active here? Is this really worth tracking?

Author

Removed the small-stride cases to focus on larger strides.
With larger strides, most vector iterations have entirely inactive lanes, which helps test conditional vector block optimizations (control-flow vectorization) across different targets.
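
To make the "entirely inactive lanes" point concrete, here is a minimal sketch that counts how many full vector blocks contain no iteration satisfying the condition. The helper name `countFullyInactiveBlocks` and the vectorization factor used in the note below are assumptions for illustration, not part of the PR.

```cpp
#include <cassert>

// Hypothetical helper, not part of the PR: counts the full vector blocks of
// VF consecutive iterations in [0, N) where no lane satisfies i % Every == 0.
unsigned countFullyInactiveBlocks(unsigned N, unsigned VF, unsigned Every) {
  assert(VF != 0 && Every != 0 && "VF and Every must be non-zero");
  unsigned Inactive = 0;
  for (unsigned Base = 0; Base + VF <= N; Base += VF) {
    bool AnyActive = false;
    // A block is active if at least one of its lanes would take the branch.
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      AnyActive |= ((Base + Lane) % Every == 0);
    if (!AnyActive)
      ++Inactive;
  }
  return Inactive;
}
```

With N = 1024 and an assumed VF of 8, this counts 64 of 128 blocks as fully inactive when the branch is taken every 16 iterations, and 120 of 128 when it is taken every 128 iterations.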

// Define conditional increment loop with given stride.
#define DEF_COND_INC_LOOP(name, stride) \
template <typename T> \
__attribute__((noinline)) static void run_##name##_autovec(T *A, \

What is the purpose of this benchmark? Is it just to track the current state of control-flow vectorization (novec vs. autovec), or to help identify a better LMUL for vectorizing the loop? If the latter, it would make sense to add similar functions with forced vectorization for the default LMUL and for specified LMULs.
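
As a rough illustration of what a scalar baseline and a forced-vectorization variant could look like, here is a sketch using Clang loop pragmas. The function names, the loop body (an increment of `A[i]`), and the width of 8 are assumptions; the PR excerpt shows only the loop condition. The sketch does not attempt to select an LMUL, and `vectorize_width` is used only as a stand-in for "a specified vectorization setting".

```cpp
#include <cassert>
#include <vector>

// Scalar baseline: asks Clang not to vectorize or interleave this loop.
template <typename T>
__attribute__((noinline)) void condIncNoVec(T *A, unsigned N) {
#if defined(__clang__)
#pragma clang loop vectorize(disable) interleave(disable)
#endif
  for (unsigned i = 0; i < N; i++)
    if (i % 16 == 0)
      A[i] += 1; // Assumed loop body.
}

// Forced variant: asks Clang to vectorize with an assumed width of 8.
template <typename T>
__attribute__((noinline)) void condIncForcedVec(T *A, unsigned N) {
#if defined(__clang__)
#pragma clang loop vectorize(enable) vectorize_width(8)
#endif
  for (unsigned i = 0; i < N; i++)
    if (i % 16 == 0)
      A[i] += 1; // Assumed loop body.
}

// Runs both variants on identical input and checks that they agree.
void checkVariantsAgree(unsigned N) {
  std::vector<int> Ref(N, 0), Vec(N, 0);
  condIncNoVec(Ref.data(), N);
  condIncForcedVec(Vec.data(), N);
  assert(Ref == Vec && "forced-vectorized result must match scalar result");
}
```

The pragmas are hints that only Clang interprets, hence the `__clang__` guard; whether the loop is actually vectorized still depends on the target and compiler flags.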

Author

IIUC, this benchmark serves as a test suite for other targets to measure the performance impact of enabling control-flow vectorization.

I've updated the PR description to make this clearer.

Benchmarks comparing runs with and without autovec, for loops that
contain conditional code.
ElvisWang123 force-pushed the control-flow-vectorization branch from a493434 to 163936e on March 23, 2026 05:30
Author

Gentle ping :) @fhahn


#define ITERATIONS 100000

template <typename T> using CFVFunc = void (*)(T *, unsigned);
Contributor

It might be good to use a more descriptive name or add a comment.

Author

Updated to ControlFlowLoopFunc. Thanks!

__attribute__((noinline)) static void run_##name##_autovec(T *A, \
unsigned N) { \
for (unsigned i = 0; i < N; i++) { \
if (i % stride == 0) { \
Contributor

"Stride" may be a bit misleading here, since it could be confused with the stride by which pointers increment. Here it effectively controls how frequently the conditional code executes, right?

Author

Yes. Updated the naming to branch_every_N. Thanks!

// Define loops with different strides.
// Stride 16 is usually big enough to span a single vector, which can test
// whether control-flow vectorization is profitable on these loops.
DEF_COND_INC_LOOP(cond_inc_stride_16, 16)
Contributor

This variant roughly executes the conditional code every 16 iterations, right?

It would be good to also add variations where the conditional code executes more frequently, including the extreme cases (every iteration, every other iteration).

Author

Yes.

Added cases where the branch is taken every 1, 2, 4, and 8 iterations. Thanks!
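
For reference, the denser cases could be instantiated roughly as follows. This is a sketch under assumptions: the macro body (an increment of `A[i]`), the instance names `cond_inc_every_*`, and the `countTaken` helper are hypothetical, since the excerpts above show only part of the macro. The parameter name `branch_every_N` and the `ControlFlowLoopFunc` alias follow the renames mentioned in this thread.

```cpp
#include <cassert>
#include <vector>

template <typename T> using ControlFlowLoopFunc = void (*)(T *, unsigned);

// Assumption: the guarded statement is an increment of A[i].
#define DEF_COND_INC_LOOP(name, branch_every_N)                                \
  template <typename T>                                                        \
  __attribute__((noinline)) static void run_##name##_autovec(T *A,             \
                                                             unsigned N) {     \
    for (unsigned i = 0; i < N; i++) {                                         \
      if (i % branch_every_N == 0) {                                           \
        A[i]++;                                                                \
      }                                                                        \
    }                                                                          \
  }

// Hypothetical instance names for the denser cases.
DEF_COND_INC_LOOP(cond_inc_every_1, 1)
DEF_COND_INC_LOOP(cond_inc_every_2, 2)
DEF_COND_INC_LOOP(cond_inc_every_4, 4)
DEF_COND_INC_LOOP(cond_inc_every_8, 8)

// Hypothetical helper: runs Fn on a zeroed array and counts how many
// elements were incremented, i.e. how often the branch was taken.
unsigned countTaken(ControlFlowLoopFunc<int> Fn, unsigned N) {
  assert(Fn != nullptr && "expects a valid loop function");
  std::vector<int> A(N, 0);
  Fn(A.data(), N);
  unsigned Taken = 0;
  for (int V : A)
    Taken += static_cast<unsigned>(V);
  return Taken;
}
```

For N = 100, `countTaken` reports 100, 50, 25, and 13 taken branches for the four instances respectively, which spans the range from every lane active down to roughly one lane in eight.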

3 participants