[MicroBenchmarks] Add benchmark for control-flow-vectorization. #345
ElvisWang123 wants to merge 3 commits into llvm:main
800ee3d to b224f56
DEF_COND_INC_LOOP(cond_inc_stride_128, 128)

// Conditional increment by value (sparse condition).
DEF_COND_INC_VALUE_LOOP(cond_inc_by_value, 42)
What percentage of lanes will be active here? Is it really worth tracking?
Removed the small-stride cases to focus on larger strides.
With larger strides, most vector blocks have entirely inactive lanes, which helps test conditional vector block optimizations (control-flow vectorization) across different targets.
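To put a rough number on the active-lane question, here is a minimal sketch that counts how many vector-width blocks of the iteration space contain no active lane for a condition of the form `i % Every == 0`. The helper name `countInactiveBlocks` and the parameters `VF` and `Every` are hypothetical and not part of the PR; real vector widths depend on the target and the element type.

```cpp
#include <cstdio>

// Hypothetical helper (not from the PR): counts the VF-wide blocks of the
// iteration space [0, N) in which no iteration satisfies i % Every == 0.
// Such blocks are the ones a control-flow-aware vectorizer could skip.
static unsigned countInactiveBlocks(unsigned N, unsigned VF, unsigned Every) {
  unsigned Inactive = 0;
  for (unsigned Base = 0; Base + VF <= N; Base += VF) {
    bool AnyActive = false;
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      if ((Base + Lane) % Every == 0)
        AnyActive = true;
    if (!AnyActive)
      ++Inactive;
  }
  return Inactive;
}

// Prints the share of fully inactive blocks for one configuration.
static void printInactiveShare(unsigned N, unsigned VF, unsigned Every) {
  unsigned Blocks = N / VF;
  unsigned Inactive = countInactiveBlocks(N, VF, Every);
  std::printf("every=%u VF=%u: %u of %u blocks fully inactive\n", Every, VF,
              Inactive, Blocks);
}
```

Calling `printInactiveShare(1024, 4, 128)` reports how sparse the condition is for an assumed vector width of 4. The sparser the condition relative to the vector width, the more blocks have no active lane at all.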
// Define conditional increment loop with given stride.
#define DEF_COND_INC_LOOP(name, stride) \
  template <typename T> \
  __attribute__((noinline)) static void run_##name##_autovec(T *A, \
What is the purpose of this benchmark? Is it just to track the current state of control-flow vectorization (novec vs. autovec), or to help identify a better LMUL for vectorizing the loop? If the latter, it makes sense to add similar functions with forced vectorization for the default LMUL and for specified LMULs.
IIUC, this benchmark serves as a test suite for other targets to measure the performance impact of enabling control-flow vectorization.
I've updated the PR description to make this clearer.
Benchmarks for loops that contain conditional code, run with and without auto-vectorization.
a493434 to 163936e
Gentle ping :) @fhahn
#define ITERATIONS 100000

template <typename T> using CFVFunc = void (*)(T *, unsigned);
It might be good to use a more descriptive name or add a comment.
Updated to ControlFlowLoopFunc. Thanks!
  __attribute__((noinline)) static void run_##name##_autovec(T *A, \
                                                             unsigned N) { \
    for (unsigned i = 0; i < N; i++) { \
      if (i % stride == 0) { \
"Stride" may be a bit misleading here, since it could be confused with the stride by which pointers increment. Here, stride effectively controls how frequently the conditional code executes, right?
Yes. Renamed it to branch_every_N. Thanks!
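For illustration, the renamed macro might look roughly like the sketch below. Only the `_autovec` signature and the `i % ... == 0` condition appear in the diff above; the loop body (`A[i] += 1`), the `_novec` variant, and the `NO_VECTORIZE` helper are assumptions added here to make the example complete, and the actual PR may differ.

```cpp
#include <vector>

// Hypothetical helper: disable loop vectorization on Clang, no-op elsewhere.
#if defined(__clang__)
#define NO_VECTORIZE \
  _Pragma("clang loop vectorize(disable) interleave(disable)")
#else
#define NO_VECTORIZE
#endif

// branch_every_N controls how often the conditional code executes, not a
// memory stride. The increment in the loop body is an assumed placeholder.
#define DEF_COND_INC_LOOP(name, branch_every_N)                                \
  template <typename T>                                                        \
  __attribute__((noinline)) static void run_##name##_autovec(T *A,             \
                                                             unsigned N) {     \
    for (unsigned i = 0; i < N; i++) {                                         \
      if (i % (branch_every_N) == 0)                                           \
        A[i] += 1;                                                             \
    }                                                                          \
  }                                                                            \
  template <typename T>                                                        \
  __attribute__((noinline)) static void run_##name##_novec(T *A,               \
                                                           unsigned N) {       \
    NO_VECTORIZE                                                               \
    for (unsigned i = 0; i < N; i++) {                                         \
      if (i % (branch_every_N) == 0)                                           \
        A[i] += 1;                                                             \
    }                                                                          \
  }

DEF_COND_INC_LOOP(cond_inc_every_16, 16)

// Checks that both variants compute the same result on a zeroed array.
static bool variantsAgree(unsigned N) {
  std::vector<int> A(N, 0), B(N, 0);
  run_cond_inc_every_16_autovec(A.data(), N);
  run_cond_inc_every_16_novec(B.data(), N);
  return A == B;
}
```

Keeping a non-vectorized twin of each loop gives a baseline to time against and a simple correctness check for the vectorized version.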
// Define loops with different strides.
// Stride 16 is usually large enough to span a whole vector, which tests
// whether control-flow vectorization is profitable on these loops.
DEF_COND_INC_LOOP(cond_inc_stride_16, 16)
This variant executes the conditional code roughly every 16 iterations, right?
It would be good to also add variations where the conditional code executes more frequently, including the extreme cases (every iteration, every other iteration).
Yes.
Added cases where the branch is taken every 1, 2, 4, and 8 iterations. Thanks!
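As a rough picture of how the denser variants relate to one another, the sketch below uses a function template instead of the PR's macro and collects the kernels in a table typed with the `ControlFlowLoopFunc` alias mentioned in the thread. The names `condInc`, `Variant`, `Variants`, and `countTaken`, as well as the increment in the loop body, are hypothetical and chosen only for this example.

```cpp
#include <vector>

// Pointer to a benchmarked loop kernel taking (array, element count).
template <typename T> using ControlFlowLoopFunc = void (*)(T *, unsigned);

// Hypothetical template stand-in for the macro-generated loops: the
// conditional code runs once every `Every` iterations.
template <unsigned Every, typename T>
__attribute__((noinline)) static void condInc(T *A, unsigned N) {
  for (unsigned i = 0; i < N; i++)
    if (i % Every == 0)
      A[i] += 1;
}

struct Variant {
  unsigned Every;
  ControlFlowLoopFunc<int> Fn;
};

// From "branch always taken" down to the sparser every-16 case.
static const Variant Variants[] = {
    {1, condInc<1, int>}, {2, condInc<2, int>},   {4, condInc<4, int>},
    {8, condInc<8, int>}, {16, condInc<16, int>},
};

// Runs one variant on a zeroed array and returns how many elements the
// conditional code touched.
static unsigned countTaken(const Variant &V, unsigned N) {
  std::vector<int> A(N, 0);
  V.Fn(A.data(), N);
  unsigned Taken = 0;
  for (int X : A)
    Taken += static_cast<unsigned>(X);
  return Taken;
}
```

Iterating over `Variants` and timing each `Fn` would sweep the branch frequency from every iteration to every sixteenth, which is the range the review discussion asks the benchmark to cover.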
Included benchmarks with conditional loops to trigger control-flow vectorization.
These cases can be used for measuring the performance impact of control-flow vectorization across targets.