Enable nested loop ctrl2data flow transforms #57
Conversation
```cpp
// operation.
Location loc =
    block->empty() ? block->getParent()->getLoc() : block->front().getLoc();
if (has_block_args) {
```
What are we specializing here using `has_block_args`?
Because we need to handle blocks that do not have block arguments, like `bb4` in this example:
```mlir
module {
  func.func @_Z10bert_node1PA1_A1_A1_A1_A128_bPA1_A128_S1_(%arg0: memref<?x1x1x1x1x128xi8>, %arg1: memref<?x1x128x1x1x128xi8>) attributes {accelerator = "neura", llvm.linkage = #llvm.linkage<external>} {
    %0 = "neura.constant"() <{value = 1 : index}> : () -> index
    %1 = "neura.constant"() <{value = 128 : index}> : () -> index
    %2 = "neura.constant"() <{value = 0 : index}> : () -> index
    %3 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %3 : i64 to ^bb1
  ^bb1(%4: i64):  // 2 preds: ^bb0, ^bb5
    %5 = "neura.cast"(%4) <{cast_type = "int_to_index"}> : (i64) -> index
    %6 = "neura.icmp"(%5, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %6 : i1 then to ^bb2 else to ^bb6
  ^bb2:  // pred: ^bb1
    %7 = "neura.cast"(%2) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %7 : i64 to ^bb3
  ^bb3(%8: i64):  // 2 preds: ^bb2, ^bb4
    %9 = "neura.cast"(%8) <{cast_type = "int_to_index"}> : (i64) -> index
    %10 = "neura.icmp"(%9, %1) <{cmpType = "slt"}> : (index, index) -> i1
    neura.cond_br %10 : i1 then to ^bb4 else to ^bb5
  ^bb4:  // pred: ^bb3
    %11 = neura.load_indexed %arg0[%2, %2, %2, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x1x1x1x128xi8> : i8
    neura.store_indexed %11 to %arg1[%2, %2, %5, %2, %2, %9 : index, index, index, index, index, index] memref<?x1x128x1x1x128xi8> : i8
    %12 = "neura.add"(%9, %0) : (index, index) -> index
    %13 = "neura.cast"(%12) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %13 : i64 to ^bb3
  ^bb5:  // pred: ^bb3
    %14 = "neura.add"(%5, %0) : (index, index) -> index
    %15 = "neura.cast"(%14) <{cast_type = "index_to_int"}> : (index) -> i64
    neura.br %15 : i64 to ^bb1
  ^bb6:  // pred: ^bb1
    "neura.return"() : () -> ()
  }
}
```
The pred block of `bb4` is `bb3`, and we can jump from `bb3` to `bb4` through the `cond_br`. So in this implementation, we grant-predicate each result in `bb4` with the condition of `bb3` (i.e., `%10`). The transformed code looks like:
```mlir
%18 = "neura.icmp"(%17, %3) <{cmpType = "slt"}> : (!neura.data<index, i1>, !neura.data<index, i1>) -> !neura.data<i1, i1>
%19 = "neura.not"(%18) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%20 = neura.load_indexed %arg0[%5, %5, %5, %5, %5, %17 : !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>] memref<?x1x1x1x1x128xi8> : !neura.data<i8, i1>
%21 = neura.grant_predicate %20, %18 : !neura.data<i8, i1>, !neura.data<i1, i1> -> !neura.data<i8, i1>
neura.store_indexed %21 to %arg1[%5, %5, %10, %5, %5, %17 : !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>, !neura.data<index, i1>] memref<?x1x128x1x1x128xi8> : !neura.data<i8, i1>
%22 = "neura.add"(%17, %1) : (!neura.data<index, i1>, !neura.data<index, i1>) -> !neura.data<index, i1>
%23 = neura.grant_predicate %22, %18 : !neura.data<index, i1>, !neura.data<i1, i1> -> !neura.data<index, i1>
%24 = "neura.cast"(%23) <{cast_type = "index_to_int"}> : (!neura.data<index, i1>) -> !neura.data<i64, i1>
%25 = neura.grant_predicate %24, %18 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>
```
- Your test contains `store_indexed`, which is derived from the `bert_nodexx.mlir`, right? We didn't have a test with `store_indexed` (except those bert xxx). Is `has_block_args` robust enough? What about a BB that has block args and also has non-block-arg live-ins?
- Why is there `%23 = neura.grant_predicate %22` -> `neura.cast(%23)`? The dataflow within a BB shouldn't need that `grant_predicate`, right?
Hmmm, I see the problem. Will fix it soon.
One additional line is enough.
Can you use an example to explain this in the PR description?
Sure! In the previous implementation, the transformed IR was … You can see that … In the new implementation, the transformed IR looks like … We grant predicate the result …
In this PR:
- … `grant_predicate` operation. (Specifically, in the previous code, it adds the `grant_predicate` operation on the `grant_always` value.)
- … CMakeLists.txt