Skip to content

Conversation

@LaurenzV
Copy link
Collaborator

This PR adds a bunch of additional features which already allow us to implement

  • Basic alpha compositing of fills/strips.
  • Rendering of blurred rectangles.
  • Rendering of gradients.

in vello_cpu, using exclusively fearless_simd. The changes include:

  • Widening and narrowing instructions for u8 and u16.
  • 512-bit vector types
  • I removed zip/unzip and instead added a zip1/zip2 instruction for zipping the upper/lower part of the vectors.
  • The fract and msub methods
  • The ability to reinterpret integer types as u8

Test cases pass with both, fallback and NEON.

Copy link
Collaborator

@ajakubowicz-canva ajakubowicz-canva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So many great improvements in this PR!!! 🎉 I haven't gotten through everything but from my first pass I think the comments I've left are the major points. Anything else will probably be a nit or can be done in a followup.

("min", OpSig::Binary),
("min_precise", OpSig::Binary),
("madd", OpSig::Ternary),
("msub", OpSig::Ternary),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to add msub as this should be something that LLVM can optimize into.

Excerpt from a DM to Raph:

As a guiding principle, I feel like anything that can be accessed through llvm optimizations doesn't need an intrinsic API method [...] and in some cases the right thing to do might be to file issues against llvm to get more optimizations. ~@raphlinus (via DM)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? Isn't this the same as for fused multiply-add, where the semantics are not exactly the same as for add + mul, and thus the compiler cannot make this optimization by default?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need Raph to chime in to answer your question because I'm also unclear about why madd isn't a similar consideration. I won't block on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed in office hours. I am going to try my best to summarize the discussion. Essentially, we already have a fused multiply-add, and I believe the intent is that a consumer of fearless_simd can express a fused multiply-sub by negating the fused multiply-add. Then LLVM should be able to optimize the negated fused multiply-add into the right underlying SIMD intrinsic.

This is preferred as we can keep our API surface smaller. It's also preferred because there's a complexity of which underling implementations to use on different architectures, so it's much simpler to leverage LLVM in these cases.

Finally we discussed removing msub and replacing usages with negeted fused multiply-adds.

@LaurenzV Does that summary capture the office hours discussion? Also, than you for bearing with me on this one 😄

@AndrewJakubowicz
Copy link

Test cases pass with both, fallback and NEON.

Just clarifying that this relates to the Vello test suite right?
Thank you!

@LaurenzV
Copy link
Collaborator Author

Just clarifying that this relates to the Vello test suite right?

Yep.

@ajakubowicz-canva
Copy link
Collaborator

Just needs a fmt to pass CI.

Copy link
Collaborator

@ajakubowicz-canva ajakubowicz-canva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After CI green LGTM.

@LaurenzV LaurenzV merged commit b07fbff into main Jun 23, 2025
6 checks passed
@LaurenzV LaurenzV deleted the vello_cpu_part1 branch June 23, 2025 10:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants