WIP: Improve render pass synchronization #5455
Draft
doitsujin wants to merge 17 commits into master from unsynchronized-renderpass
Only affects hardware that can't use unified layouts. We no longer avoid transitions to TRANSFER_* layouts anyway, so there is no good reason to avoid SHADER_READ_ONLY_OPTIMAL.
Previously, access to images with multiple mips could be oversynchronized in some cases. Fix this by computing the proper ending address of the last subresource accessed.
Fixes feedback loops on unified layouts and allows RDNA2 to hit the happy path in more cases.
The idea here is to allow small full-screen render passes to overlap with unrelated work, such as copies, compute shaders and even other render passes that do not access the same set of resources. In some games and on some hardware, this can improve performance measurably. This commit only implements the synchronization and barrier tracking part for unsynchronized passes; heuristics to enable the feature will be added separately.
Should make RGP captures more readable by default.
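To illustrate the multi-mip fix mentioned in the notes above, here is a minimal sketch of why the end of a tracked range matters. It assumes subresources are linearized as `layer * mipsPerLayer + mip`; that layout, and the names `SubresourceRange`, `AddressRange`, `looseRange` and `tightRange`, are hypothetical and not taken from the DXVK code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subresource range; not DXVK's actual type.
struct SubresourceRange {
  uint32_t baseMip;
  uint32_t mipCount;
  uint32_t baseLayer;
  uint32_t layerCount;
};

// Inclusive range of linear subresource indices used for overlap checks.
struct AddressRange {
  uint64_t first;
  uint64_t last;
};

// Assumed linearization: all mips of layer 0, then all mips of layer 1, ...
inline uint64_t linearIndex(uint32_t mipsPerLayer, uint32_t layer, uint32_t mip) {
  return uint64_t(layer) * mipsPerLayer + mip;
}

// Over-conservative variant: the range always ends at the final mip of the
// last layer, even if the access stops at an earlier mip.
inline AddressRange looseRange(uint32_t mipsPerLayer, const SubresourceRange& r) {
  assert(r.mipCount > 0 && r.layerCount > 0);
  return { linearIndex(mipsPerLayer, r.baseLayer, r.baseMip),
           linearIndex(mipsPerLayer, r.baseLayer + r.layerCount - 1, mipsPerLayer - 1) };
}

// Tighter variant: the range ends at the last subresource actually accessed.
inline AddressRange tightRange(uint32_t mipsPerLayer, const SubresourceRange& r) {
  assert(r.mipCount > 0 && r.layerCount > 0);
  return { linearIndex(mipsPerLayer, r.baseLayer, r.baseMip),
           linearIndex(mipsPerLayer, r.baseLayer + r.layerCount - 1,
                       r.baseMip + r.mipCount - 1) };
}

// Two inclusive ranges conflict if they share at least one index.
inline bool overlaps(const AddressRange& a, const AddressRange& b) {
  return a.first <= b.last && b.first <= a.last;
}
```

With a four-mip, single-layer image, accesses to mip 0 and mip 1 conflict under `looseRange` but not under `tightRange`, so the second access no longer forces a barrier. Ranges that span several layers stay conservative in both variants.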
Based on #5453. I still need to figure out a way to not destroy tilers with potentially suboptimal barriers, hence the draft.
TL;DR: this allows post-processing render passes to overlap with each other and with compute work, as long as there are no data hazards.

Previously, we'd issue full barriers around render passes to avoid having to track all resources used during rendering for hazard detection, but that leads to a lot of over-synchronization (and thus, bad GPU utilization) in some games. I'm seeing roughly a 2% improvement in Dirt Rally 2 and Monster Hunter World on both my 6900XT and RTX 4070.
Downside is that this costs some CPU cycles, but there's a heuristic in place that should filter out expensive render passes so it shouldn't be too bad.
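As a rough illustration of the hazard check described above, the sketch below decides whether two passes may overlap by comparing the resources each one reads and writes. `PassAccess`, `intersects`, `needsBarrier` and the use of plain integer resource IDs are assumptions made for this example; the PR's actual tracking is not shown here and may differ substantially.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical per-pass access record keyed by opaque resource IDs.
struct PassAccess {
  std::unordered_set<uint64_t> reads;
  std::unordered_set<uint64_t> writes;
};

// Returns true if the two sets share at least one resource ID.
// Iterates over the smaller set and probes the larger one.
inline bool intersects(const std::unordered_set<uint64_t>& a,
                       const std::unordered_set<uint64_t>& b) {
  const auto& small = a.size() <= b.size() ? a : b;
  const auto& large = a.size() <= b.size() ? b : a;
  for (uint64_t id : small) {
    if (large.count(id))
      return true;
  }
  return false;
}

// A barrier is needed if the later pass touches anything the earlier pass
// wrote (read-after-write, write-after-write), or writes anything the
// earlier pass read (write-after-read). Read-read sharing is harmless.
inline bool needsBarrier(const PassAccess& prev, const PassAccess& next) {
  return intersects(prev.writes, next.reads)
      || intersects(prev.writes, next.writes)
      || intersects(prev.reads,  next.writes);
}
```

For example, a pass that writes resource 2 followed by a pass that reads resource 2 needs a barrier, while a pass that only touches resources 4 and 5 can overlap with either. Building and comparing these sets for every pass is where the extra CPU cost comes from.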