Implement ldiv!() for QRSparse#676
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #676 +/- ##
==========================================
- Coverage 84.19% 84.16% -0.03%
==========================================
Files 12 12
Lines 9313 9342 +29
==========================================
+ Hits 7841 7863 +22
- Misses 1472 1479 +7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
The asymmetry between UMFPACK and SPQR here on thread safety is concerning, but otherwise LGTM. A 100 x 10 sparse matrix is incredibly small so I'm not sure the 15% hit here is really representative. |
8a57c3f to
51d90fc
Compare
|
Fair enough, added the lock back in 51d90fc. New benchmark: julia> @benchmark ldiv!($x, $F, $b)
BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
Range (min … max): 1.188 μs … 10.171 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.326 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.453 μs ± 397.615 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▆▆▆█▇▅▅▄▃▄▃▃▂▃▂▂▂▂▂▁▂▁▁ ▁▁ ▂
██████████████████████████████████████▇▇█▇▇▇▇▇▆▇▆▆▇▆▅▆▅▅▅▆▅ █
1.19 μs Histogram: log(frequency) by time 2.95 μs <
Memory estimate: 96 bytes, allocs estimate: 2. |
|
I pushed a micro-optimization in f4bce58 to compute and store julia> @benchmark ldiv!($x, $F, $b)
BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
Range (min … max): 1.275 μs … 4.650 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.285 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.340 μs ± 143.469 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▄ ▅▁ ▁ ▂ ▁▄ ▃▂ ▁
█████▅█▆▅██▆██▇▇██▆▅▇▅▄▃▂▂▄▄▅▅▅▄▃▄▄▄▃▃▅▄▅▅▅▃▄▄▃▅▄▄▅▃▅▄▄▅▅▅▆ █
1.28 μs Histogram: log(frequency) by time 2.02 μs <
Memory estimate: 48 bytes, allocs estimate: 1.In general I feel this a better style anyway, people don't usually expect property access to have a cost. I can do the same for |
|
One more optimization in a03ea62, now we're allocation-free :) julia> @benchmark ldiv!($x, $F, $b)
BenchmarkTools.Trial: 10000 samples with 11 evaluations per sample.
Range (min … max): 991.636 ns … 2.668 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.198 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.175 μs ± 142.443 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▇█▄ ▁
▃████▇▃▂▁▁▁▁▁▁▁▂▁▁▁▁▂▃▄▅▅▅▄▄▃▄▇███▇▆▆▅▄▃▂▂▁▁▁▂▂▁▁▂▂▃▃▃▃▂▂▂▂▂▂ ▃
992 ns Histogram: frequency by time 1.46 μs <
Memory estimate: 0 bytes, allocs estimate: 0.Following the suggestions in JuliaLang/julia#36313. Note: the downside is that |
This reduces allocations for ldiv!() (and in general).
a03ea62 to
c2584c0
Compare
|
BTW, is there any chance of getting this in for 1.13? |
I originally was going to implement a separate workspace like how FastLapackInterface.jl does it, but seeing as there's already a
QRSparsetype that kinda functions as a workspace already I decided it made more sense to re-use that. ReusingQRSparseis also consistent with the way thatUmfpackLUholds a workspace. UnlikeUmfpackLUI did not add a lock because for the example in theworkspace reusetest it added a ~15% overhead. Instead the docstring clearly warns that usingQRSparsewithldiv!()is not threadsafe and tells users to make an explicit copy.Preallocating
Wremoved most of the allocations, and taking a view ofF.Rremoved the other big one. Benchmarks:Fixes #242. Written with help from Claude 🤖