Improve tuning for splitk #4486

pfultz2 · 2025-12-05T22:38:31Z

Motivation

Technical Details

This adds the memory_coloring pass to remove any memory allocations from the benchmark module. It also uses a bundle of 10 to get better results, because the overhead of launching multiple kernels otherwise distorts the measurement.
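As a rough illustration of why a larger bundle helps, the sketch below times a group of launches together and divides by the bundle size. It is not the MIGraphX implementation: `time_per_launch`, `FakeClock`, and the cost values are hypothetical names and numbers chosen for this example, and the fake clock only models a fixed cost paid once per timing read.

```python
import time


def time_per_launch(kernel, bundle=10, clock=time.perf_counter):
    """Time `bundle` back-to-back calls and return the average per call."""
    start = clock()
    for _ in range(bundle):
        kernel()
    stop = clock()
    return (stop - start) / bundle


class FakeClock:
    """Deterministic clock (hypothetical): every read costs `read_cost` time units."""

    def __init__(self, read_cost):
        self.now = 0.0
        self.read_cost = read_cost

    def advance(self, dt):
        # Simulates a kernel that takes `dt` time units to run.
        self.now += dt

    def __call__(self):
        # Reading the clock has a fixed cost that gets included in the measurement.
        self.now += self.read_cost
        return self.now


if __name__ == "__main__":
    for bundle in (1, 10):
        clock = FakeClock(read_cost=1.0)
        estimate = time_per_launch(lambda: clock.advance(2.0), bundle=bundle, clock=clock)
        print(bundle, estimate)
```

In this model the estimate is the kernel cost plus the fixed cost divided by the bundle size, so a bundle of 10 leaves one tenth of the fixed cost in each per-launch figure.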

Changelog Category

- Added: New functionality.
- Changed: Changes to existing functionality.
- Removed: Functionality or support that has been removed. (Compared to a previous release)
- Optimized: Component performance that has been optimized or improved.
- Resolved Issues: Known issues from a previous version that have been resolved.
- Not Applicable: This PR is not to be included in the changelog.

Copilot

Pull request overview

This PR optimizes the benchmarking process for GPU kernel tuning (particularly for splitk operations) by improving the compilation and timing of benchmark kernels. The changes add memory optimization passes and adjust timing parameters to get more accurate performance measurements.

Key Changes:

- Added memory_coloring pass to eliminate redundant memory allocations during benchmarking
- Increased benchmark bundle size from 1 to 10 to better amortize kernel launch overhead
- Added eliminate_identity pass and add_return call to properly structure benchmark modules
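The general idea behind eliminating redundant allocations can be sketched as follows: buffers whose lifetimes do not overlap may share the same offset in one scratch region. This is a simplified greedy illustration under assumed inputs, not the algorithm used by the memory_coloring pass; `color_buffers` and the tuple layout are hypothetical.

```python
def color_buffers(buffers):
    """Assign offsets in one shared region.

    buffers: list of (name, size, first_use, last_use), with inclusive use indices.
    Returns (offsets_by_name, total_region_size).
    """
    placed = []  # (offset, size, first_use, last_use) for buffers already assigned
    offsets = {}
    # Place larger buffers first; sorted() is stable, so ties keep input order.
    for name, size, first, last in sorted(buffers, key=lambda b: -b[1]):
        # Address ranges held by buffers that are live at the same time as this one.
        busy = sorted(
            (o, o + s)
            for o, s, f, l in placed
            if not (last < f or l < first)
        )
        # First fit: take the lowest offset that does not collide with a busy range.
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break
            offset = max(offset, hi)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    total = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, total


if __name__ == "__main__":
    example = [("a", 64, 0, 1), ("b", 64, 2, 3), ("c", 32, 1, 2)]
    print(color_buffers(example))
```

Here `a` and `b` are never live at the same time, so they can occupy the same bytes, while `c` overlaps both and needs its own range.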

umangyadav · 2025-12-06T00:48:59Z

Would it be possible for you to try this PR with ROCm/rocMLIR#2156 in CI?

codecov · 2025-12-10T23:49:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #4486   +/-   ##
========================================
  Coverage    92.21%   92.21%           
========================================
  Files          561      561           
  Lines        27228    27228           
========================================
  Hits         25108    25108           
  Misses        2120     2120

see 1 file with indirect coverage changes

pfultz2 added 2 commits December 5, 2025 16:18

Improve tuning for splitk

c79a500

Format

2fa7356

pfultz2 requested a review from causten as a code owner December 5, 2025 22:38

Use 10

02f87ba

causten requested a review from Copilot December 5, 2025 22:53

Copilot started reviewing on behalf of causten December 5, 2025 22:54

Copilot AI reviewed Dec 5, 2025

pfultz2 added 7 commits December 9, 2025 17:01

Fix splitting with aliased inputs

17af409

Format

93bd9f6

Fuse add/mul as well

f1404c1

Remove comments

cdc8431

Format

6f40fa5

Format

11fa6c3

License

dc732f1
