Refactor the Ctrl to Data Flow Implementation Logic #60
Conversation
Excellent summarization @ShangkunLi.
Shouldn't we grant_predicate all live-out results of …
Ah, I got it.
You mean grant_predicate all the live-in values used in the block? But in many benchmarks, most live-ins are granted always, like …
I believe we can …
How does this sound?
Sure, this sounds more robust!
Fix the transform logic for forward cond_br edges without values. |
Force-pushed from 1ed1e8c to 47ca8f9.
Hi @ShangkunLi, I just noticed you disabled these tests. Can you please help restore them?
Sure, I will re-enable them in the next PR. I am working on the ctrl-flow fusion now.
Hmmm, the --insert-data-mov pass works and generates correct intermediate IR, but the --map-to-accelerator pass doesn't work now... I will try to solve this.
Thanks for the investigation.
- We'd better avoid disabling tests, especially when there is more than one contributor, and some others are using the repo to ramp up.
- We prefer "tiny" PRs. One PR should only target a single functionality or function; it might touch a lot of tests, which is fine, but functionality-wise it should be "tiny".
- If it is a large project, you could make a chain of branches and open PRs gradually.
> --insert-data-mov can work and can generate correct intermediate ir

You mean the intermediate IR is exactly the same with or without your changes (this PR vs. your current implementation)?

> but the --map-to-accelerator doesn't work now

Is it hanging, crashing, or something else?
> We'd better avoid disabling tests, especially when there are more than one contributor, and some others are using the repo to ramp up
Got it! Sorry for the problem caused by my unorthodox development process.
The intermediate IR generated by --insert-data-mov is now:
```mlir
func.func @loop_test() -> f32 attributes {accelerator = "neura"} {
  %0 = "neura.constant"() <{predicate = true, value = 10 : i64}> : () -> !neura.data<i64, i1>
  %1 = "neura.data_mov"(%0) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %2 = "neura.grant_always"(%1) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> !neura.data<i64, i1>
  %4 = "neura.data_mov"(%3) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %5 = "neura.grant_once"(%4) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %6 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> !neura.data<i64, i1>
  %7 = "neura.data_mov"(%6) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %8 = "neura.grant_always"(%7) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %9 = "neura.constant"() <{predicate = true, value = 3.000000e+00 : f32}> : () -> !neura.data<f32, i1>
  %10 = "neura.data_mov"(%9) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %11 = "neura.grant_always"(%10) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %12 = "neura.constant"() <{predicate = true, value = 0.000000e+00 : f32}> : () -> !neura.data<f32, i1>
  %13 = "neura.data_mov"(%12) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %14 = "neura.grant_once"(%13) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %15 = neura.reserve : !neura.data<f32, i1>
  %16 = "neura.data_mov"(%14) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %17 = "neura.phi"(%15, %16) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
  %18 = neura.reserve : !neura.data<i64, i1>
  %19 = "neura.data_mov"(%5) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %20 = "neura.phi"(%18, %19) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
  %21 = "neura.data_mov"(%17) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %22 = "neura.data_mov"(%11) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %23 = "neura.fadd"(%21, %22) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
  %24 = "neura.data_mov"(%20) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %25 = "neura.data_mov"(%8) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %26 = "neura.add"(%24, %25) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
  %27 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %28 = "neura.data_mov"(%2) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %29 = "neura.icmp"(%27, %28) <{cmpType = "slt"}> : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i1, i1>
  %30 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
  %31 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
  %32 = neura.grant_predicate %30, %31 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>
  neura.ctrl_mov %32 -> %18 : !neura.data<i64, i1> !neura.data<i64, i1>
  %33 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %34 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
  %35 = neura.grant_predicate %33, %34 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
  neura.ctrl_mov %35 -> %15 : !neura.data<f32, i1> !neura.data<f32, i1>
  %36 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
  %37 = "neura.not"(%36) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
  %38 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  %39 = "neura.data_mov"(%37) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
  %40 = neura.grant_predicate %38, %39 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
  %41 = "neura.data_mov"(%40) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
  "neura.return"(%41) : (!neura.data<f32, i1>) -> ()
}
```
compared with the former one:
```mlir
// MOV: func.func @loop_test() -> f32 attributes {accelerator = "neura"} {
// MOV-NEXT: %0 = "neura.constant"() <{predicate = true, value = 10 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT: %1 = "neura.data_mov"(%0) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %2 = "neura.grant_always"(%1) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %3 = "neura.constant"() <{predicate = true, value = 0 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT: %4 = "neura.data_mov"(%3) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %5 = "neura.grant_once"(%4) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %6 = "neura.constant"() <{predicate = true, value = 1 : i64}> : () -> !neura.data<i64, i1>
// MOV-NEXT: %7 = "neura.data_mov"(%6) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %8 = "neura.grant_always"(%7) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %9 = "neura.constant"() <{predicate = true, value = 3.000000e+00 : f32}> : () -> !neura.data<f32, i1>
// MOV-NEXT: %10 = "neura.data_mov"(%9) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %11 = "neura.grant_always"(%10) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %12 = "neura.constant"() <{predicate = true, value = 0.000000e+00 : f32}> : () -> !neura.data<f32, i1>
// MOV-NEXT: %13 = "neura.data_mov"(%12) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %14 = "neura.grant_once"(%13) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %15 = neura.reserve : !neura.data<i64, i1>
// MOV-NEXT: %16 = "neura.data_mov"(%5) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %17 = "neura.phi"(%16, %15) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %18 = neura.reserve : !neura.data<f32, i1>
// MOV-NEXT: %19 = "neura.data_mov"(%14) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %20 = "neura.phi"(%19, %18) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %21 = "neura.data_mov"(%20) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %22 = "neura.data_mov"(%11) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %23 = "neura.fadd"(%21, %22) : (!neura.data<f32, i1>, !neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %24 = "neura.data_mov"(%17) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %25 = "neura.data_mov"(%8) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %26 = "neura.add"(%24, %25) : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %27 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %28 = "neura.data_mov"(%2) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %29 = "neura.icmp"(%27, %28) <{cmpType = "slt"}> : (!neura.data<i64, i1>, !neura.data<i64, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %30 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %31 = "neura.not"(%30) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %32 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %33 = "neura.data_mov"(%31) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %34 = neura.grant_predicate %32, %33 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
// MOV-NEXT: %35 = "neura.data_mov"(%23) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: %36 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %37 = neura.grant_predicate %35, %36 : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
// MOV-NEXT: neura.ctrl_mov %37 -> %18 : !neura.data<f32, i1> !neura.data<f32, i1>
// MOV-NEXT: %38 = "neura.data_mov"(%26) : (!neura.data<i64, i1>) -> !neura.data<i64, i1>
// MOV-NEXT: %39 = "neura.data_mov"(%29) : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
// MOV-NEXT: %40 = neura.grant_predicate %38, %39 : !neura.data<i64, i1>, !neura.data<i1, i1> -> !neura.data<i64, i1>
// MOV-NEXT: neura.ctrl_mov %40 -> %15 : !neura.data<i64, i1> !neura.data<i64, i1>
// MOV-NEXT: %41 = "neura.data_mov"(%34) : (!neura.data<f32, i1>) -> !neura.data<f32, i1>
// MOV-NEXT: "neura.return"(%41) : (!neura.data<f32, i1>) -> ()
// MOV-NEXT: }
```
The difference is that the two phi operations and the corresponding reserve operations appear in different orders. But from the IR's point of view, this does not affect the DFG dependencies.
And now, when I run --map-to-accelerator, it reaches the maxII and exits without generating a legal mapping result.
> Got it! Sorry for the problem caused by my unorthodox development process.
No worry :-)
> it will reach the maxII and exit without generating a legal mapping result.

It is probably due to the unoptimized mapping strategy:
- It only picks the lowest-cost (highest-award) tile to map, and ignores all other candidate tiles:

  dataflow/lib/NeuraDialect/Mapping/mapping_util.cpp
  Lines 378 to 390 in c58754e

  ```cpp
  MappingLoc target_loc = sorted_locs.front();
  if (placeAndRoute(op, target_loc, mapping_state)) {
    llvm::errs() << "[DEBUG] Successfully scheduled op: " << *op
                 << " at loc: " << target_loc.resource->getType() << "#"
                 << target_loc.resource->getId()
                 << " @t=" << target_loc.time_step << "\n";
    continue;
  } else {
    llvm::errs() << "[DEBUG] Failed to schedule op: " << *op
                 << "; target loc: " << target_loc.resource->getType() << "#"
                 << target_loc.resource->getId()
                 << " @t=" << target_loc.time_step << "\n";
  }
  // TODO: Optimization -- backtrack a few times if failed to schedule the op.
  // https://github.com/coredac/dataflow/issues/59
  return false;
  ```

- There is no backtracking, i.e., if that placement or route fails, it just returns false for that specific II, without backtracking to another tile and retrying placement and routing.
Are you able to make a clean PR that just makes the backtrackable mapping work, so this test can be restored? It doesn't need to be super sophisticated, see #59.
Sure! I will do it ASAP.
In this PR, an edge-based control flow to data flow transform pass is implemented.
Basically, we can categorize all the edges in CFG into the following 8 cases:
Cases 3 and 4 do not appear in the current benchmarks. Since they correspond to control-flow jumps for statements like goto, we do not consider these cases for now. The transform is implemented based on the remaining six cases.
For the target block of the edge in case 7, we chose to grant_predicate all results in this block based on the condition, to ensure correctness. For example, given the target block bb2 and its transformed IR, we grant_predicate the result of bb2 (%13) with the condition of its pred block.
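As a rough sketch of that pattern (the block and value names below are invented for illustration, not the actual bb2 IR from this PR), granting a block's result on its incoming edge condition looks like:

```mlir
// Hypothetical fragment: %cond is the branch predicate computed in the
// predecessor block; %val is a result produced in the target block.
// grant_predicate marks %val valid only when the edge was actually taken.
%granted = neura.grant_predicate %val, %cond
    : !neura.data<f32, i1>, !neura.data<i1, i1> -> !neura.data<f32, i1>
```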