-
Notifications
You must be signed in to change notification settings - Fork 22
Add more features #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ajakubowicz-canva
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So many great improvements in this PR!!! 🎉 I haven't gotten through everything but from my first pass I think the comments I've left are the major points. Anything else will probably be a nit or can be done in a followup.
| ("min", OpSig::Binary), | ||
| ("min_precise", OpSig::Binary), | ||
| ("madd", OpSig::Ternary), | ||
| ("msub", OpSig::Ternary), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to add msub as this should be something that LLVM can optimize into.
Excerpt from a DM to Raph:
As a guiding principle, I feel like anything that can be accessed through llvm optimizations doesn't need an intrinsic API method [...] and in some cases the right thing to do might be to file issues against llvm to get more optimizations. ~@raphlinus (via DM)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so? Isn't this the same as for fused multiply-add, where the semantics are not exactly the same as for add + mul, and thus the compiler cannot make this optimization by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need Raph to chime in to answer your question because I'm also unclear about why madd isn't a similar consideration. I won't block on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed in office hours. I am going to try my best to summarize the discussion. Essentially, we already have a fused multiply-add, and I believe the intent is that a consumer of fearless_simd can express a fused multiply-sub by negating the fused multiply-add. Then LLVM should be able to optimize the negated fused multiply-add into the right underlying SIMD intrinsic.
This is preferred as we can keep our API surface smaller. It's also preferred because there's a complexity of which underling implementations to use on different architectures, so it's much simpler to leverage LLVM in these cases.
Finally we discussed removing msub and replacing usages with negeted fused multiply-adds.
@LaurenzV Does that summary capture the office hours discussion? Also, than you for bearing with me on this one 😄
Just clarifying that this relates to the Vello test suite right? |
Yep. |
11341f6 to
4a90a45
Compare
|
Just needs a fmt to pass CI. |
ajakubowicz-canva
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After CI green LGTM.
This PR adds a bunch of additional features which already allow us to implement
in vello_cpu, using exclusively fearless_simd. The changes include:
fractandmsubmethodsTest cases pass with both, fallback and NEON.