Conversation
Codecov Report
@@ Coverage Diff @@
## master #86 +/- ##
==========================================
- Coverage 98.18% 96.22% -1.97%
==========================================
Files 1 1
Lines 386 424 +38
==========================================
+ Hits 379 408 +29
- Misses 7 16 +9
Continue to review full report at Codecov.
|
|
I still need to add correctness tests. |
|
@nalimilan - doing an "additional sorting" is not correct unfortunately (the second sorting destroys the first), so we need to do this independently unfortunately (still it should not be that bad). |
|
|
||
| minp, maxp = extrema(p) | ||
| _quantilesort!(v, sorted, minp, maxp) | ||
| if length(p) == 2 |
There was a problem hiding this comment.
Special-casing 2 is kind of weird. For example, for [0.25, 0.5, 0.75] this branch would probably also be faster, right? Actually, isn't this approach faster than the other in most cases?
A possible optimization would be to call partialsort! on the full array for the first quantile, then call it on a view from the first quantile to the end of the array for the second, and so on. Not sure that would make a big difference, but at least it shouldn't be really slower than sorting everything between the two extreme quantiles, right?
There was a problem hiding this comment.
Yes, it is faster:
julia> using Statistics
julia> using BenchmarkTools
julia> x = rand(10^4);
julia> @btime quantile($x, [0.25, 0.5, 0.75]);
410.400 μs (4 allocations: 78.42 KiB)
julia> @btime [quantile($x, 0.25), quantile($x, 0.5), quantile($x, 0.75)];
314.000 μs (7 allocations: 234.72 KiB)
The problem is that quantiles do not need to be sorted, so this complicates the code (but of course is doable as I guess sorting requested quantiles should not be problematic for performance).
We do not need views I think, as sort! supports passing start and stop ranges.
However, as it seems to be a bigger rework I will leave it for after DataFrames.jl 1.3 releaes when we work on general Statistics update.
There was a problem hiding this comment.
OK. Yes, sorting quantiles should have a negligible cost (and most of the time they will already be sorted).
There was a problem hiding this comment.
Do you feel like finishing this?
There was a problem hiding this comment.
I have opened https://github.com/JuliaLang/Statistics.jl/pull/91 to perform easy comparison of both.
Fixes #84