Skip to content

Comments

Implement ldiv!() for QRSparse#676

Open
JamesWrigley wants to merge 2 commits intoJuliaSparse:mainfrom
JamesWrigley:spqr-ldiv
Open

Implement ldiv!() for QRSparse#676
JamesWrigley wants to merge 2 commits intoJuliaSparse:mainfrom
JamesWrigley:spqr-ldiv

Conversation

@JamesWrigley
Copy link
Contributor

I originally was going to implement a separate workspace like how FastLapackInterface.jl does it, but seeing as there's already a QRSparse type that kinda functions as a workspace already I decided it made more sense to re-use that. Reusing QRSparse is also consistent with the way that UmfpackLU holds a workspace. Unlike UmfpackLU I did not add a lock because for the example in the workspace reuse test it added a ~15% overhead. Instead the docstring clearly warns that using QRSparse with ldiv!() is not threadsafe and tells users to make an explicit copy.

Preallocating W removed most of the allocations, and taking a view of F.R removed the other big one. Benchmarks:

using SparseArrays, LinearAlgebra

m, n = 100, 10
nn = 100

A = sparse([1:n; rand(1:m, nn - n)], [1:n; rand(1:n, nn - n)], randn(nn), m, n)
F = qr(A)
b = randn(m)
x = zeros(n);

# Before
julia> @benchmark $F \ $b                                                                                                                                                                                                                      
BenchmarkTools.Trial: 10000 samples with 17 evaluations per sample.                                                                                                                                                                            
 Range (min … max):  995.294 ns … 203.601 μs  ┊ GC (min … max): 0.00% … 98.64%                                                                                                                                                                 
 Time  (median):       1.245 μs               ┊ GC (median):    0.00%                                                                                                                                                                          
 Time  (mean ± σ):     1.324 μs ±   3.433 μs  ┊ GC (mean ± σ):  6.15% ±  2.59%                                                                                                                                                                 
                                                                                                                                                                                                                                               
  ▁▅▅▃▁   ▁▂▃▃▂  ▄▄▆█▇▆▅▄▂▁  ▁▁▁▁▂▁                             ▂                                                                                                                                                                              
  ██████▇██████████████████▇███████▇▆▃▅▄▄▄▂▂▃▄▅▂▅▅▇▇▆▇██▇▇███▇▆ █                                                                                                                                                                              
  995 ns        Histogram: log(frequency) by time       1.81 μs <                                                                                                                                                                              

 Memory estimate: 2.14 KiB, allocs estimate: 14.

# After
julia> @benchmark ldiv!($x, $F, $b)
BenchmarkTools.Trial: 10000 samples with 15 evaluations per sample.
 Range (min … max):  957.467 ns …   3.822 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.154 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.148 μs ± 128.500 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃             █  ▁  ▅                                         
  ▄█▂▂▁▁▁▁▁▁▁▁▁▁▂█▅▃█▃▂█▆▂▆▄▁▂▂▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  957 ns           Histogram: frequency by time         1.61 μs <

 Memory estimate: 96 bytes, allocs estimate: 2.

Fixes #242. Written with help from Claude 🤖

@codecov
Copy link

codecov bot commented Feb 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.16%. Comparing base (4500d86) to head (c2584c0).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #676      +/-   ##
==========================================
- Coverage   84.19%   84.16%   -0.03%     
==========================================
  Files          12       12              
  Lines        9313     9342      +29     
==========================================
+ Hits         7841     7863      +22     
- Misses       1472     1479       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rayegun
Copy link
Member

rayegun commented Feb 16, 2026

The asymmetry between UMFPACK and SPQR here on thread safety is concerning, but otherwise LGTM.

A 100 x 10 sparse matrix is incredibly small so I'm not sure the 15% hit here is really representative.

@JamesWrigley
Copy link
Contributor Author

Fair enough, added the lock back in 51d90fc. New benchmark:

julia> @benchmark ldiv!($x, $F, $b)
BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
 Range (min … max):  1.188 μs …  10.171 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.326 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.453 μs ± 397.615 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▆▆█▇▅▅▄▃▄▃▃▂▃▂▂▂▂▂▁▂▁▁ ▁▁                                 ▂
  ██████████████████████████████████████▇▇█▇▇▇▇▇▆▇▆▆▇▆▅▆▅▅▅▆▅ █
  1.19 μs      Histogram: log(frequency) by time      2.95 μs <

 Memory estimate: 96 bytes, allocs estimate: 2.

@JamesWrigley
Copy link
Contributor Author

I pushed a micro-optimization in f4bce58 to compute and store QRSparse.Q once. That removes another allocation:

julia> @benchmark ldiv!($x, $F, $b)
BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
 Range (min  max):  1.275 μs    4.650 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.285 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.340 μs ± 143.469 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄ ▅▁ ▁  ▂  ▁▄  ▃▂                                          ▁
  █████▅█▆▅██▆██▇▇██▆▅▇▅▄▃▂▂▄▄▅▅▅▄▃▄▄▄▃▃▅▄▅▅▅▃▄▄▃▅▄▄▅▃▅▄▄▅▅▅▆ █
  1.28 μs      Histogram: log(frequency) by time      2.02 μs <

 Memory estimate: 48 bytes, allocs estimate: 1.

In general I feel this a better style anyway, people don't usually expect property access to have a cost. I can do the same for QRSparse.prow if you agree?

@JamesWrigley
Copy link
Contributor Author

One more optimization in a03ea62, now we're allocation-free :)

julia> @benchmark ldiv!($x, $F, $b)
BenchmarkTools.Trial: 10000 samples with 11 evaluations per sample.
 Range (min  max):  991.636 ns    2.668 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):       1.198 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.175 μs ± 142.443 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▂▇█▄                           ▁                              
  ▃████▇▃▂▁▁▁▁▁▁▁▂▁▁▁▁▂▃▄▅▅▅▄▄▃▄▇███▇▆▆▅▄▃▂▂▁▁▁▂▂▁▁▂▂▃▃▃▃▂▂▂▂▂▂ ▃
  992 ns           Histogram: frequency by time         1.46 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Following the suggestions in JuliaLang/julia#36313. Note: the downside is that Base.ReshapedArray is not a public API.

This reduces allocations for ldiv!() (and in general).
@JamesWrigley
Copy link
Contributor Author

BTW, is there any chance of getting this in for 1.13?

@rayegun rayegun added the backport 1.13 Change should be backported to release-1.13 label Feb 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 1.13 Change should be backported to release-1.13

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing ldiv! overload on sparse QR

2 participants