Add General Test for Taskflow Dialect #233

ShangkunLi · 2026-01-09T13:08:31Z

I am trying to implement the ConvertAffineToTaskflow pass this week. The problem I have run into is that this pass cannot exhaustively cover all affine structures. I have already written a 1.7K-line conversion pass that converts the following two cases (multi-nested and irregular-loop) successfully.

But when I try to add a new case, the conversion pass cannot process that structure, and more case-specific handling code needs to be added.

For the linalg dialect, implementing such a pass is a good idea because there are only data dependencies between different tasks. For affine.for, however (especially imperfectly nested loops), the nested structures are too hard for us to analyze.

So in this PR, I just put the transformed IR of multi-nested and irregular-loop in the tests, to make sure that the defined dialect is correct and can represent such structures.

More discussion is needed on converting from high-level representations.

tancheng

Can you show an unsupported example? I thought we could represent any example in a naive way anyway (without analyzing dependencies, i.e., assuming all data are dependent).

tancheng · 2026-01-09T19:30:59Z

test/multi-cgra/taskflow/irregular-loop/irregular-loop-taskflow.mlir

@@ -0,0 +1,73 @@
+// RUN: mlir-neura-opt %s | FileCheck %s


How is the test/multi-cgra/taskflow/irregular-loop/irregular-loop.cpp compiled using lit?

ShangkunLi

I just use mlir-neura-opt to parse the input IR, to make sure the syntax is correct.

ShangkunLi · 2026-01-10T04:50:58Z

> Can you show an unsupported example? I thought we could represent any example in a naive way anyway (without analyzing dependencies, i.e., assuming all data are dependent).

Here is a blocked_gemm example.

module attributes {} {
  func.func @_Z6bbgemmPiS_S_(%arg0: memref<?xi32>, %arg1: memref<?xi32>, %arg2: memref<?xi32>) attributes {llvm.linkage = #llvm.linkage<external>} {
    // Task 1
    affine.for %arg3 = 0 to 8 {  // Loop 1
      affine.for %arg4 = 0 to 64 {  // Loop 2
        affine.for %arg5 = 0 to 64 step 8 {  // Loop 3
          affine.for %arg6 = 0 to 64 step 8 {  // Loop 4
            %0 = affine.load %arg0[%arg6 + %arg3 + %arg4 * 64] : memref<?xi32>
            // Task 2
            affine.for %arg7 = 0 to 8 {  // Loop 5
              %1 = affine.load %arg1[%arg6 * 64 + %arg5 + %arg7 + %arg3 * 64] : memref<?xi32>
              %2 = arith.muli %0, %1 : i32
              %3 = affine.load %arg2[%arg5 + %arg7 + %arg4 * 64] : memref<?xi32>
              %4 = arith.addi %3, %2 : i32
              affine.store %4, %arg2[%arg5 + %arg7 + %arg4 * 64] : memref<?xi32>
            }
          }
        }
      }
    }
    return
  }
}

The expected output looks like this:

// Task 1
%ctrl_out, %data_out = taskflow.task (<ins>) {
  affine.for %arg3 = 0 to 8 {
    affine.for %arg4 = 0 to 64 {
      affine.for %arg5 = 0 to 64 step 8 {
        affine.for %arg6 = 0 to 64 step 8 {
          %0 = affine.load %arg0[%arg6 + %arg3 + %arg4 * 64] : memref<?xi32>
          taskflow.emit %arg6, %0
        }
      }
    }
  }
  taskflow.yield
}

%ctrl = taskflow.drive %ctrl_out
%data = taskflow.channel %data_out
// Task 2
taskflow.task (%ctrl, %data, <other_ins>) {
  affine.for %arg7 = 0 to 8 {
    %1 = affine.load %arg1[%arg6 * 64 + %arg5 + %arg7 + %arg3 * 64] : memref<?xi32>
    %2 = arith.muli %0, %1 : i32
    %3 = affine.load %arg2[%arg5 + %arg7 + %arg4 * 64] : memref<?xi32>
    %4 = arith.addi %3, %2 : i32
    affine.store %4, %arg2[%arg5 + %arg7 + %arg4 * 64] : memref<?xi32>
  }
}

Difficulties for automated conversion:

1. We need to automatically segment the master task (loops 1-4) and the slave task (loop 5).
2. We need to insert the taskflow.emit operation at the right place in the master task to trigger the slave task.
3. For tasks with RAW dependencies, we need to insert a taskflow.channel op to denote such dependencies (that is why I changed the trait of taskflow.task from NoMemoryEffect to IsolatedFromAbove, to denote the RAW dependencies explicitly).
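To make the segmentation concrete, here is a minimal Python sketch of the same split, not the pass itself. It models the channel as a plain FIFO queue, and the names `reference`, `task1`, `task2`, and `run_split` are made up for illustration. One assumption worth flagging: the loop 5 body reads %arg3, %arg4, and %arg5 as well as %arg6 and %0, so this sketch forwards all four induction variables with each emit, whereas the IR above only lists %arg6 and %0 explicitly.

```python
from collections import deque

SIZE = 64 * 64  # flattened 64x64 matrices, as implied by the index expressions


def reference(a, b, c):
    """Direct transcription of the original affine loop nest (loops 1-5)."""
    for i3 in range(0, 8):
        for i4 in range(0, 64):
            for i5 in range(0, 64, 8):
                for i6 in range(0, 64, 8):
                    v0 = a[i6 + i3 + i4 * 64]
                    for i7 in range(0, 8):
                        v1 = b[i6 * 64 + i5 + i7 + i3 * 64]
                        c[i5 + i7 + i4 * 64] += v0 * v1


def task1(a, channel):
    """Master task: loops 1-4 plus the load, one emit per iteration."""
    for i3 in range(0, 8):
        for i4 in range(0, 64):
            for i5 in range(0, 64, 8):
                for i6 in range(0, 64, 8):
                    v0 = a[i6 + i3 + i4 * 64]
                    # Assumption: every value loop 5 needs is forwarded.
                    channel.append((i3, i4, i5, i6, v0))


def task2(b, c, channel):
    """Slave task: runs loop 5 once per emitted tuple, in FIFO order."""
    while channel:
        i3, i4, i5, i6, v0 = channel.popleft()
        for i7 in range(0, 8):
            v1 = b[i6 * 64 + i5 + i7 + i3 * 64]
            c[i5 + i7 + i4 * 64] += v0 * v1


def run_split(a, b, c):
    """Run the two tasks back to back through a shared queue."""
    channel = deque()
    task1(a, channel)
    task2(b, c, channel)


if __name__ == "__main__":
    a = [(i * 7) % 5 for i in range(SIZE)]
    b = [(i * 3) % 4 for i in range(SIZE)]
    c_ref, c_split = [0] * SIZE, [0] * SIZE
    reference(a, b, c_ref)
    run_split(a, b, c_split)
    print("split matches reference:", c_ref == c_split)
```

Consuming the emits in FIFO order keeps the read-modify-write updates to the output buffer in the original program order. Python integers do not wrap like i32, so the sketch only checks that both versions perform the same updates.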

ShangkunLi · 2026-01-10T04:53:40Z

> Can you show an unsupported example? I thought we could represent any example in a naive way anyway (without analyzing dependencies, i.e., assuming all data are dependent).

Also, I don't understand what you mean by "assume all data are dependent". In that case, how can we guarantee the RAW dependencies in a taskflow (tasks in a dataflow) manner?

tancheng · 2026-01-10T05:12:17Z

Okay, let's discuss this later.

BTW, would task 1 run on one CGRA and task 2 on another CGRA? Or is task 1 on a controller?

ShangkunLi · 2026-01-10T05:14:50Z

> Okay, let's discuss this later.
>
> BTW, would task 1 run on one CGRA and task 2 on another CGRA? Or is task 1 on a controller?

Task 1 will run on one CGRA and task 2 will run on another. The controller only handles the perfectly nested part (like a counter).

tancheng · 2026-01-10T06:05:55Z

> > Okay, let's discuss this later.
> > BTW, would task 1 run on one CGRA and task 2 on another CGRA? Or is task 1 on a controller?
>
> Task 1 will run on one CGRA and task 2 will run on another. The controller only handles the perfectly nested part (like a counter).

So in this case, CGRA1 might be underutilized, since it only performs loads?

ShangkunLi · 2026-01-10T06:13:35Z

> > > Okay, let's discuss this later.
> > > BTW, would task 1 run on one CGRA and task 2 on another CGRA? Or is task 1 on a controller?
> >
> > Task 1 will run on one CGRA and task 2 will run on another. The controller only handles the perfectly nested part (like a counter).
>
> So in this case, CGRA1 might be underutilized, since it only performs loads?

Correct. And an automated conversion could be extremely complex in such a case.
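As a rough back-of-the-envelope check of the utilization concern, the trip counts in the blocked_gemm example can be multiplied out. The sketch below counts only the operations written in the loop bodies (loads, muli, addi, store). It ignores address arithmetic, loop control, and the emit itself, and treats every operation as equal cost, all of which are simplifying assumptions. The helper name `trip_count` is made up for illustration.

```python
def trip_count(lo, hi, step=1):
    """Number of iterations of `affine.for %i = lo to hi step step`."""
    return len(range(lo, hi, step))


# Loops 1-4 (task 1) and loop 5 (task 2), bounds taken from the example.
outer_iters = (
    trip_count(0, 8)
    * trip_count(0, 64)
    * trip_count(0, 64, 8)
    * trip_count(0, 64, 8)
)
inner_iters = trip_count(0, 8)

# Task 1 body: one affine.load per outer iteration.
task1_ops = outer_iters * 1
# Task 2 body: 2 loads + muli + addi + store per inner iteration.
task2_ops = outer_iters * inner_iters * 5

if __name__ == "__main__":
    print("task 1 ops:", task1_ops)
    print("task 2 ops:", task2_ops)
    print("ratio task2/task1:", task2_ops // task1_ops)
```

Under these assumptions task 2 performs 40 body operations for every load issued by task 1, which is consistent with the concern that the CGRA running task 1 would sit mostly idle.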

ShangkunLi added 2 commits January 9, 2026 20:07

add taskflow.emit op

5e3d07c

add more general affine tests

fb82c0e

ShangkunLi requested a review from guosran January 9, 2026 13:08

remove debug output

3924f2e

tancheng reviewed Jan 9, 2026


tancheng approved these changes Jan 10, 2026

