Conversation

@ShangkunLi (Collaborator)

This PR implements an edge-based control-flow-to-data-flow transform pass.

Basically, we can categorize all the edges in the CFG into the following eight cases:

  1. Backward cond_br edges with values.
  2. Backward br edges with values.
  3. Backward cond_br edges without values.
  4. Backward br edges without values.
  5. Forward cond_br edges with values.
  6. Forward br edges with values.
  7. Forward cond_br edges without values.
  8. Forward br edges without values.

Cases 3 and 4 do not appear in the current benchmarks; since they correspond to control-flow jumps such as goto statements, we do not consider them for now.

The transform is implemented for the remaining six edge kinds.
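As a rough illustration of the classification (a sketch only, not this PR's actual code; it assumes the neura terminators implement MLIR's BranchOpInterface, which may not match the real implementation):

    #include "mlir/Dialect/Func/IR/FuncOps.h"
    #include "mlir/IR/Dominance.h"
    #include "mlir/Interfaces/ControlFlowInterfaces.h"

    // Hypothetical sketch: classify every CFG edge of `func`. An edge
    // src -> dst is "backward" if dst dominates src (a loop back-edge);
    // it "has values" if the terminator forwards operands to dst's
    // block arguments along that edge.
    void classifyEdges(mlir::func::FuncOp func) {
      mlir::DominanceInfo domInfo(func);
      func.walk([&](mlir::BranchOpInterface branchOp) {
        mlir::Block *src = branchOp->getBlock();
        for (unsigned i = 0, e = branchOp->getNumSuccessors(); i < e; ++i) {
          mlir::Block *dst = branchOp->getSuccessor(i);
          bool isBackward = domInfo.dominates(dst, src);
          bool hasValues =
              !branchOp.getSuccessorOperands(i).getForwardedOperands().empty();
          // Dispatch to the per-case rewrite based on (isBackward, hasValues,
          // conditional vs. unconditional branch), i.e., cases 1-8 above.
        }
      });
    }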

For the target block of the edge in case 7, we chose to grant_predicate all results in that block based on the branch condition to ensure correctness. For example:

Target block ^bb2 (shown with its predecessor ^bb1):

^bb1(%4: !neura.data<i64, i1>):  // 2 preds: ^bb0, ^bb5
    %5 = "neura.cast"(%4) <{cast_type = "int_to_index"}> : (!neura.data<i64, i1>) -> !neura.data<index, i1>
    %6 = "neura.icmp"(%5, %1) <{cmpType = "slt"}> : (!neura.data<index, i1>, !neura.data<index, i1>) -> !neura.data<i1, i1>
    neura.cond_br %6 : !neura.data<i1, i1> then to ^bb2 else to ^bb6
^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (!neura.data<index, i1>) -> !neura.data<i64, i1>
    neura.br %7 : !neura.data<i64, i1> to ^bb3

Transformed IR:

%11 = "neura.icmp"(%10, %3) <{cmpType = "slt"}> : (!neura.data<index, i1>, !neura.data<index, i1>) -> !neura.data<i1, i1>
%12 = "neura.not"(%11) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%13 = "neura.cast"(%5) <{cast_type = "index_to_int"}> : (!neura.data<index, i1>) -> !neura.data<i64, i1>
%14 = neura.grant_predicate %13, %11 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>

We grant_predicate the result of ^bb2 (%13) with the condition (%11) of its predecessor block.

@ShangkunLi ShangkunLi marked this pull request as ready for review June 29, 2025 15:14
@ShangkunLi ShangkunLi requested a review from tancheng June 29, 2025 15:14
@tancheng (Contributor)

Excellent summarization, @ShangkunLi.

For the target block of the edge in case 7, we chose to grant_predicate all results in that block based on the branch condition to ensure correctness.

Shouldn't we grant_predicate all live-out results of bb1? Your example shows that bb2's live-out is predicated on bb1's condition. Isn't this somewhat indirect? WDYT?

@tancheng (Contributor)

Ah, I got it. bb2 is dominated by bb1, but its live-in comes from another block. In that case, shouldn't we grant_predicate bb2's live-ins?

@ShangkunLi (Collaborator, Author)

Ah, I got it. bb2 is dominated by bb1, but its live-in comes from another block. In that case, shouldn't we grant_predicate bb2's live-ins?

You mean grant_predicate all the live-in values used in the block? But in many benchmarks, most live-ins are granted always, like %2 in bb2, and %2 and %0 in bb4. So do we need to grant_predicate such grant_always values? Or just grant_predicate each operation result in such a block?

func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{value = 1 : index}> : () -> index
    %1 = "neura.constant"() <{value = 128 : index}> : () -> index
    %2 = "neura.constant"() <{value = 0 : index}> : () -> index
    %3 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %3 : i64 to ^bb1
^bb1(%4: i64):  // 2 preds: ^bb0, ^bb5
    %5 = "neura.cast"(%4) <{cast_type = "int_to_index"}> : (i64) -> index
    %6 = "neura.icmp"(%5, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %6 : i1 then to ^bb2 else to ^bb6
^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %7 : i64 to ^bb3
^bb3(%8: i64):  // 2 preds: ^bb2, ^bb4
    %9 = "neura.cast"(%8) <{cast_type = "int_to_index"}> : (i64) -> index
    %10 = "neura.icmp"(%9, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %10 : i1 then to ^bb4 else to ^bb5
^bb4:  // pred: ^bb3
    %11 = neura.load_indexed %arg0[%2, %2, %2, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x1x1x1x128xi8> : i8
    neura.store_indexed %11 to %arg1[%2, %2, %5, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x128x1x1x128xi8> : i8
    %12 = "neura.add"(%9, %0) : (index, index) -> index
    %13 = "neura.cast"(%12) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %13 : i64 to ^bb3
^bb5:  // pred: ^bb3
    %14 = "neura.add"(%5, %0) : (index, index) -> index
    %15 = "neura.cast"(%14) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %15 : i64 to ^bb1
^bb6:  // pred: ^bb1
    "neura.return"() : () -> ()
}

@tancheng (Contributor) commented Jun 29, 2025

Ah, I got it. bb2 is dominated by bb1, but its live-in comes from another block. In that case, shouldn't we grant_predicate bb2's live-ins?

You mean grant_predicate all the live-in values used in the block? But in many benchmarks, most live-ins are granted always, like %2 in bb2, and %2 and %0 in bb4. So do we need to grant_predicate such grant_always values? Or just grant_predicate each operation result in such a block?

I believe we can apply grant_predicate on top of the already grant_always'd values, i.e., on all the live-ins of bb2, rather than on the results of bb2.

  • I think the rule is (using bb1 with its cond_br as an example): grant_predicate all of bb1's live-outs and all of the true-successor bb's (i.e., bb2's) live-ins, WDYT? (And grant_predicate with the negated condition on the false-successor, i.e., bb6, though there is nothing to predicate there since it has no arguments in this IR example.) A sketch follows this list.
  • If one const/grant_always value is used by multiple BBs dominated by different conditions, we can grant_predicate it based on each specific condition.
  • For now, we blindly emit grant_always for all the constants inside the entry block; later we can fuse grant_always -> grant_predicate.
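In pass code, that rule might look roughly like this (a sketch only; the C++ op classes neura::CondBrOp, neura::NotOp, neura::GrantPredicateOp and the helpers getLiveIns / replaceUsesInBlock are hypothetical names, not the actual neura API):

    // Hypothetical sketch of the proposed rule.
    void predicateAcrossCondBr(neura::CondBrOp condBr, mlir::OpBuilder &builder) {
      builder.setInsertionPoint(condBr);
      mlir::Location loc = condBr.getLoc();
      mlir::Value cond = condBr.getCondition();
      mlir::Value notCond = builder.create<neura::NotOp>(loc, cond);

      std::pair<mlir::Block *, mlir::Value> edges[] = {
          {condBr.getTrueDest(), cond},     // true-successor: predicate on cond
          {condBr.getFalseDest(), notCond}, // false-successor: predicate on !cond
      };
      for (auto &[succ, pred] : edges) {
        // Predicate every live-in of the successor, including values that
        // were previously grant_always'd (e.g., hoisted constants).
        for (mlir::Value liveIn : getLiveIns(succ)) {
          mlir::Value granted =
              builder.create<neura::GrantPredicateOp>(loc, liveIn, pred);
          replaceUsesInBlock(liveIn, granted, succ); // only uses inside succ
        }
      }
    }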

How does this sound?

@ShangkunLi (Collaborator, Author)

Sure, this sounds more robust!

@ShangkunLi ShangkunLi closed this Jun 29, 2025
@ShangkunLi ShangkunLi reopened this Jun 29, 2025
@ShangkunLi
Copy link
Collaborator Author

Fix the transform logic for forward cond_br edges without values.

@ShangkunLi ShangkunLi force-pushed the refactor-ctrl2data branch from 1ed1e8c to 47ca8f9 on July 1, 2025 11:23
@ShangkunLi ShangkunLi merged commit 1d30fcf into coredac:main Jul 1, 2025
1 check passed
Contributor

Hi @ShangkunLi, I just noticed you disabled these tests, can you please help restore them?

Collaborator (Author)

Sure, I will re-enable them in the next PR. I am working on the ctrl-flow fusion now.

Collaborator (Author)

Hmmm, --insert-data-mov works and generates correct intermediate IR, but --map-to-accelerator doesn't work now... I'll try to solve this.

Contributor

Thanks for the investigation.

  • We'd better avoid disabling tests, especially when there is more than one contributor and others are using the repo to ramp up.
  • We prefer "tiny" PRs. One PR should target a single functionality or function; it might touch a lot of tests, which is fine, but functionality-wise it should be "tiny".
    • If it is a large project, you can make a chain of branches and submit PRs gradually.

--insert-data-mov works and generates correct intermediate IR

You mean the intermediate IR is exactly the same with or without your changes (this PR vs. your current implementation)?

but --map-to-accelerator doesn't work now

Is it hanging, crashing, or some other issue?

Collaborator (Author)

  • We'd better avoid disabling tests, especially when there is more than one contributor and others are using the repo to ramp up.

Got it! Sorry for the trouble caused by my unorthodox development process.

The intermediate IR now generated by --insert-data-mov is:

func.func @loop_test() -> f32 attributes {accelerator = "neura"} {
    %0 = "neura.constant"() <{predicate = true, value = 10 : i64}> : () -> !neura.data<i64, i1>
    %1 = "neura.data_mov"(%0) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %2 = "neura.grant_always"(%1) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> !neura.data<i64, i1>
    %4 = "neura.data_mov"(%3) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %5 = "neura.grant_once"(%4) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %6 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> !neura.data<i64, i1>
    %7 = "neura.data_mov"(%6) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %8 = "neura.grant_always"(%7) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %9 = "neura.constant"() <{predicate = true, value = 3.000000e+00 : f32}> : () -> !neura.data<f32, i1>
    %10 = "neura.data_mov"(%9) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %11 = "neura.grant_always"(%10) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %12 = "neura.constant"() <{predicate = true, value = 0.000000e+00 : f32}> : () -> !neura.data<f32, i1>
    %13 = "neura.data_mov"(%12) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %14 = "neura.grant_once"(%13) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %15 = neura.reserve : !neura.data<f32, i1>
    %16 = "neura.data_mov"(%14) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %17 = "neura.phi"(%15, %16) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
    %18 = neura.reserve : !neura.data<i64, i1>
    %19 = "neura.data_mov"(%5) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %20 = "neura.phi"(%18, %19) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
    %21 = "neura.data_mov"(%17) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %22 = "neura.data_mov"(%11) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %23 = "neura.fadd"(%21, %22) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
    %24 = "neura.data_mov"(%20) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %25 = "neura.data_mov"(%8) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %26 = "neura.add"(%24, %25) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
    %27 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %28 = "neura.data_mov"(%2) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %29 = "neura.icmp"(%27, %28) <{cmpType = "slt"}> : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i1, i1>
    %30 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
    %31 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
    %32 = neura.grant_predicate %30, %31 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>
    neura.ctrl_mov %32 -> %18 : !neura.data<i64, i1> !neura.data<i64, i1>
    %33 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %34 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
    %35 = neura.grant_predicate %33, %34 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
    neura.ctrl_mov %35 -> %15 : !neura.data<f32, i1> !neura.data<f32, i1>
    %36 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
    %37 = "neura.not"(%36) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
    %38 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    %39 = "neura.data_mov"(%37) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
    %40 = neura.grant_predicate %38, %39 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
    %41 = "neura.data_mov"(%40) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
    "neura.return"(%41) : (!neura.data<f32, i1>) -> ()
  }

compared with the former one:

// MOV:      func.func @loop_test() -> f32 attributes {accelerator = "neura"} {
// MOV-NEXT:   %0 = "neura.constant"() <{predicate = true, value = 10 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT:   %1 = "neura.data_mov"(%0) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %2 = "neura.grant_always"(%1) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT:   %4 = "neura.data_mov"(%3) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %5 = "neura.grant_once"(%4) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %6 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT:   %7 = "neura.data_mov"(%6) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %8 = "neura.grant_always"(%7) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %9 = "neura.constant"() <{predicate = true, value = 3.000000e+00 : f32}> : () -> !neura.data<f32, i1>
// MOV-NEXT:   %10 = "neura.data_mov"(%9) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %11 = "neura.grant_always"(%10) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %12 = "neura.constant"() <{predicate = true, value = 0.000000e+00 : f32}> : () -> !neura.data<f32, i1>
// MOV-NEXT:   %13 = "neura.data_mov"(%12) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %14 = "neura.grant_once"(%13) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %15 = neura.reserve : !neura.data<i64, i1>
// MOV-NEXT:   %16 = "neura.data_mov"(%5) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %17 = "neura.phi"(%16, %15) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %18 = neura.reserve : !neura.data<f32, i1>
// MOV-NEXT:   %19 = "neura.data_mov"(%14) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %20 = "neura.phi"(%19, %18) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %21 = "neura.data_mov"(%20) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %22 = "neura.data_mov"(%11) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %23 = "neura.fadd"(%21, %22) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %24 = "neura.data_mov"(%17) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %25 = "neura.data_mov"(%8) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %26 = "neura.add"(%24, %25) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %27 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %28 = "neura.data_mov"(%2) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %29 = "neura.icmp"(%27, %28) <{cmpType = "slt"}> : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i1, i1>
// MOV-NEXT:   %30 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT:   %31 = "neura.not"(%30) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT:   %32 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %33 = "neura.data_mov"(%31) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT:   %34 = neura.grant_predicate %32, %33 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
// MOV-NEXT:   %35 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   %36 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT:   %37 = neura.grant_predicate %35, %36 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
// MOV-NEXT:   neura.ctrl_mov %37 -> %18 : !neura.data<f32, i1> !neura.data<f32, i1>
// MOV-NEXT:   %38 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT:   %39 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT:   %40 = neura.grant_predicate %38, %39 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>
// MOV-NEXT:   neura.ctrl_mov %40 -> %15 : !neura.data<i64, i1> !neura.data<i64, i1>
// MOV-NEXT:   %41 = "neura.data_mov"(%34) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT:   "neura.return"(%41) : (!neura.data<f32, i1>) -> ()
// MOV-NEXT: }

The difference is that the two phi operations and the corresponding reserve operations appear in different orders, but from the IR's point of view this does not affect the DFG dependencies.
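If the reordering is the only difference, the golden test could tolerate it by using FileCheck's DAG form for the reordered region instead of strict MOV-NEXT lines, e.g. (illustrative patterns, not the actual test):

    // MOV-DAG: neura.reserve : !neura.data<i64, i1>
    // MOV-DAG: neura.reserve : !neura.data<f32, i1>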

And now, when I try to use --map-to-accelerator, it reaches the maxII and exits without generating a legal mapping result.

Contributor

Got it! Sorry for the trouble caused by my unorthodox development process.

No worries :-)

it reaches the maxII and exits without generating a legal mapping result.

It is probably due to the unoptimized mapping strategy:

  • Only pick the lowest-cost (highest-award) tile to map, ignoring all other candidate tiles:

    MappingLoc target_loc = sorted_locs.front();
    if (placeAndRoute(op, target_loc, mapping_state)) {
      llvm::errs() << "[DEBUG] Successfully scheduled op: " << *op
                   << " at loc: " << target_loc.resource->getType()
                   << "#" << target_loc.resource->getId()
                   << " @t=" << target_loc.time_step << "\n";
      continue;
    } else {
      llvm::errs() << "[DEBUG] Failed to schedule op: " << *op
                   << "; target loc: " << target_loc.resource->getType()
                   << "#" << target_loc.resource->getId()
                   << " @t=" << target_loc.time_step << "\n";
    }
    // TODO: Optimization -- backtrack a few times if failed to schedule the op.
    // https://github.com/coredac/dataflow/issues/59
    return false;
  • No backtracking, i.e., if that placement or routing fails, it just returns false for that specific II, without backtracking to another tile and retrying placement and routing (see the sketch after this list).
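A minimal version of that backtracking could simply try each candidate in cost order instead of only the front; a sketch reusing the names from the snippet above (sorted_locs, placeAndRoute, mapping_state), not a tested patch:

    // Sketch only: fall back to the next-best candidates before giving up
    // on this II, instead of trying just sorted_locs.front().
    bool scheduled = false;
    for (const MappingLoc &candidate : sorted_locs) {
      if (placeAndRoute(op, candidate, mapping_state)) {
        scheduled = true;
        break;  // Placed and routed; move on to the next op.
      }
      // This tile/time failed; retry with the next candidate (simple
      // backtracking in the spirit of issue #59).
    }
    if (!scheduled)
      return false;  // Every candidate failed; give up on this II.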

Are you able to make a clean PR that just makes the backtrackable mapping work, so we can restore this test? It doesn't need to be super sophisticated, see #59.

Collaborator (Author)

Sure! I will do it ASAP.
