Skip to content

Conversation

@LaurenzV
Copy link
Collaborator

This adds a couple of more features required for the SIMD version of vello_cpu, in particular:

  • Support for the trunc instruction.
  • Support for the wrapping_sub instruction.
  • Supported for interleaved loading in 128-bit blocks.

@ajakubowicz-canva ajakubowicz-canva changed the title Add more features Add trunc, wrapping_sub, and interleaved loading. Jun 26, 2025
@ajakubowicz-canva ajakubowicz-canva changed the title Add trunc, wrapping_sub, and interleaved loading. Add trunc, wrapping_sub, and interleaved loading Jun 26, 2025
Copy link
Collaborator

@ajakubowicz-canva ajakubowicz-canva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments. I am happy with trunc and interleaved loading. However, I think wrapping_sub may be something we don't need to explicitly add. I assume that a followup PR would then need to implement wrapping_* variants of the other math operations.

Maybe it may be useful for @raphlinus to weigh in. Using Highway's design philosophy, they do not separate sub from wrapping_sub. excerpt:

  1. Ensuring signed integer overflow has well-defined semantics (wraparound).
    Link

My question is - do we want to provide explicitly different API's for wrapping and non wrapping operations?

Great work!

Edit: I also want to add a caveat that I'm not even close to a reliably source here, hence why I defer to Raph. I won't block review though. I'll try get a chance to review tomorrow.

@LaurenzV
Copy link
Collaborator Author

LaurenzV commented Jun 26, 2025

I'm just not sure that using wrapping_sub in all instances where you have a simple subtraction for fallback might lead to performance issues on some platforms where there isn't an explicit instruction for that. But if highway does it, it's probably fine? After all, the fallback version is usually not expected to be very performant.

Other than that, for me it would be fine to merge the two.

Copy link
Collaborator

@ajakubowicz-canva ajakubowicz-canva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick note that I've reviewed trunc and wrapping_sub and that work LGTM.

I just have interleaved to review but ran out of time. If there's urgency, you could split out those instructions and land incrementally. Otherwise I'll complete review after weekend!

Copy link
Collaborator

@ajakubowicz-canva ajakubowicz-canva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow review.

I'm happy with trunc and interleaved but on further thought I don't think we should have sub and wrapping_sub. I think there's a high chance if we add wrapping_sub in this PR, we'll want to remove it in the future (or add more explicit subtraction variants).

@ajakubowicz-canva ajakubowicz-canva changed the title Add trunc, wrapping_sub, and interleaved loading Add trunc and interleaved loading in neon and fallback Jun 30, 2025
Copy link
Collaborator

@ajakubowicz-canva ajakubowicz-canva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you :D

Great PR!

@LaurenzV LaurenzV merged commit 75b838e into main Jun 30, 2025
10 checks passed
@LaurenzV LaurenzV deleted the more_features_2 branch June 30, 2025 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants