You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As discussed in #985 (comment), exp is very slow to compile for larger StaticArrays. This tries to reduce the latency a bit by just pulling out code into smaller functions, aiming to reduce the amount of inlined code.
There's little difference in the latency at small matrix sizes - and a slight reduction in performance for 3×3 matrices. For all other sizes, this PR actually produces a clear a performance improvement however. The latency improvements at larger sizes are also pronounced. Still, 3×3 matrices are definitely more important than bigger matrices, so it's not a no-brainer I guess - but I wanted to put the PR up for discussion anyway.
The summary of timings before/after this PR are (the performance timings depend on which branch is taken; also, this is on v1.7.2):
I wonder how much compilation time reduction is just due to not duplicating U = A*U in each if branch? Maybe we don't have to decrease performance for the 3x3 case. Also, we could use the old variant for size 3x3 and the new one for larger matrices.
I tested various permutations of this quite extensively yesterday (pretty sure I also tested the A*U shuffle).
Puzzlingly, I found that almost all the compilation time comes from the presence of the if and for statements that rescale A in the second branch: without those, compilation time is drastically reduced. I don't understand why they make such a difference.
Alternatively, we could also use the new in-function @inline in 1.8 to force inlining in the 3x3 case (to avoid code duplication). Just having the code duplicated would indeed also be a solution.
almost all the compilation time comes from the presence of the two loops that rescale A in the second branch: without those, compilation time is drastically reduced
Interesting, though confusing! Did you try moving just that part out into a separate function?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
As discussed in #985 (comment),
expis very slow to compile for largerStaticArrays. This tries to reduce the latency a bit by just pulling out code into smaller functions, aiming to reduce the amount of inlined code.There's little difference in the latency at small matrix sizes - and a slight reduction in performance for 3×3 matrices. For all other sizes, this PR actually produces a clear a performance improvement however. The latency improvements at larger sizes are also pronounced. Still, 3×3 matrices are definitely more important than bigger matrices, so it's not a no-brainer I guess - but I wanted to put the PR up for discussion anyway.
The summary of timings before/after this PR are (the performance timings depend on which branch is taken; also, this is on v1.7.2):
@time(1st)@btime(branch 1)@btime(branch 2)