Support non-interpolating quantile definitions #187
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##           master     #187      +/-   ##
==========================================
- Coverage   96.66%   96.63%   -0.04%
==========================================
  Files           2        2
  Lines         450      475      +25
==========================================
+ Hits          435      459      +24
- Misses         15       16       +1
```
```julia
if type == 1
    return v[clamp(ceil(Int, n*p), 1, n)]
elseif type == 2
    i = clamp(ceil(Int, n*p), 1, n)
    j = clamp(floor(Int, n*p + 1), 1, n)
    return middle(v[i], v[j])
elseif type == 3
    return v[clamp(round(Int, n*p), 1, n)]
```
I have used simplified formulas specific to each case, as I find the code resulting from the single general formula in the Hyndman & Fan paper very hard to grasp, with no real advantage. I hope I didn't introduce mistakes, especially in corner cases. Please suggest things to test if you can find cases that are not covered.
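For reference, those simplified branches can be exercised standalone. This is a sketch with a hypothetical `quantile_sketch` wrapper mirroring the diff above (it assumes `v` is sorted and `0 <= p <= 1`):

```julia
using Statistics: middle

# Hypothetical standalone wrapper around the per-type branches from the diff.
function quantile_sketch(v::AbstractVector, p::Real; type::Int=1)
    n = length(v)
    if type == 1
        # inverse of the empirical CDF
        return v[clamp(ceil(Int, n*p), 1, n)]
    elseif type == 2
        # like type 1, but averaging at discontinuities
        i = clamp(ceil(Int, n*p), 1, n)
        j = clamp(floor(Int, n*p + 1), 1, n)
        return middle(v[i], v[j])
    elseif type == 3
        # nearest even order statistic (SAS definition)
        return v[clamp(round(Int, n*p), 1, n)]
    end
end

v = [1, 2, 3, 4]
quantile_sketch(v, 0.5, type=1)  # 2 — ceil(4*0.5) = 2
quantile_sketch(v, 0.5, type=2)  # 2.5 — average of v[2] and v[3]
quantile_sketch(v, 0.5, type=3)  # 2 — round(4*0.5) = 2
```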
```julia
@test quantile(v, 1.0, alpha=1.0, beta=1.0) ≈ 21.0

# tests against R's quantile with type=1
@test quantile(v, 0.0, type=1) === 2
```
As you can see the type of the result can be different for types 1 and 3 as we keep the original type instead of using whatever type arithmetic operations produce. This makes sense and is even necessary if we want to work for types that don't support arithmetic.
But this means the inferred return type when passing type is a Union of two types. Maybe OK as it's small enough to be optimized out? If not there are probably ways to ensure inference works via combined use of Val(type) and @inline in a wrapper function.
EDIT: The situation is slightly worse for quantile([1, 2], [0.1, 0.2]) as the inferred type is Union{Vector{Float64}, Vector{Int64}, Vector{Real}}.
(Note that when omitting type, the inferred type is concrete as before so at least there's no regression.)
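The small-Union behavior can be illustrated standalone with a hypothetical function `f`: one branch keeps the element type (`Int`), the other interpolates and returns `Float64`, so inference yields a two-element `Union` that union-splitting can usually optimize:

```julia
# Hypothetical toy function: mimics quantile's two result paths.
# `interpolate ? ... : ...` returns Float64 on one branch, Int on the other.
f(v::Vector{Int}, interpolate::Bool) = interpolate ? v[1] / 1 : v[1]

Base.return_types(f, (Vector{Int}, Bool))  # 1-element Vector: Union{Float64, Int64}
```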
I would add tests for p=0 and p=1 (i.e. integer p), or even p=false and p=true (I understand we allow it).
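For instance (hypothetical additions, assuming this PR's `type` keyword and the existing test vector `v`):

```julia
@test quantile(v, 0, type=1) === quantile(v, 0.0, type=1)      # integer p
@test quantile(v, 1, type=1) === quantile(v, 1.0, type=1)
@test quantile(v, false, type=1) === quantile(v, 0.0, type=1)  # Bool p
@test quantile(v, true, type=1) === quantile(v, 1.0, type=1)
```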
Shouldn't constant propagation be able to handle this? I.e. make `quantile_type1(v, p) = quantile(v, p, type=1)` inferred?
Unfortunately it doesn't infer in the current state of the PR. I need to add @inline for it to work. But that's not great as _quantile has to be @inline (as currently) too, which means we would inline a significant amount of code.
I've pushed a commit which inlines one-line methods but doesn't inline _quantile, which takes a Val instead. This infers correctly but the code is more complex.
I also tried Base.@constprop :aggressive. It improves inference for most methods, but not for those taking p::Vector (probably due to map).
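The `Val`-based pattern being described can be sketched as follows (hypothetical names, only types 1 and 7 shown): a thin `@inline` wrapper turns the runtime `type` into a compile-time `Val`, so dispatch selects a concretely inferred method without inlining the whole kernel.

```julia
# Hypothetical kernel methods, one per Val-wrapped type number.
_quantile_impl(v, p, ::Val{1}) = v[clamp(ceil(Int, length(v)*p), 1, length(v))]
function _quantile_impl(v, p, ::Val{7})
    n = length(v)
    h = (n - 1)*p + 1                       # type-7 (alpha=beta=1) position
    j = clamp(floor(Int, h), 1, n - 1)
    γ = h - j
    return (1 - γ)*v[j] + γ*v[j + 1]
end

# Thin @inline entry point: with constant propagation of `type`,
# Val(type) becomes a compile-time constant and the call infers concretely.
@inline my_quantile(v, p; type::Int=7) = _quantile_impl(v, p, Val(type))

my_quantile([1, 2, 3, 4], 0.5, type=1)  # 2 (an actual data value)
my_quantile([1, 2, 3, 4], 0.5, type=7)  # 2.5 (interpolated)
```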
```julia
m = alpha + p * (one(alpha) - alpha - beta)
# Using fma here avoids some rounding errors when aleph is an integer
# The use of oftype suppresses the promotion caused by alpha and beta
aleph = fma(n, p, oftype(p, m))
```
This will error if p=0 or p=1 and m is a fraction.
Good catch. Though this PR doesn't touch this code, let's handle it separately?
Reproducer:

```julia
quantile(1:3, 0, alpha=0.2, beta=0.0)
```

```julia
# Using fma here avoids some rounding errors when aleph is an integer
# The use of oftype suppresses the promotion caused by alpha and beta
aleph = fma(n, p, oftype(p, m))
j = clamp(trunc(Int, aleph), 1, n - 1)
```
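The failure mode reduces to the `oftype` call alone (a standalone sketch with hypothetical variable names): with integer `p`, `oftype(p, m)` must convert a fractional `Float64` back to `Int`, which throws.

```julia
# m = alpha + p*(1 - alpha - beta) with alpha=0.2, beta=0.0, p=0 gives m = 0.2
m = 0.2

oftype(0.0, m)  # 0.2 — p is a float, conversion is a no-op
oftype(0, m)    # throws InexactError: Int64(0.2) — p is an Int
```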
Is this clamp correct if p=1? It seems to me that then j should be n, and later we would just need to make sure not to use j+1.
I think so, because γ is 1 in that case and we return (1-γ)*v[j] + γ*v[j + 1] (so we don't care about v[j]).
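A quick numeric check of this argument (standalone sketch, plugging in `alpha = beta = 1` so the formulas match the quoted code):

```julia
v = [10.0, 20.0, 30.0]
n = length(v)
p = 1.0

m = 1 + p * (1 - 1 - 1)                 # alpha = beta = 1 ⇒ m = 1 - p = 0.0
aleph = fma(n, p, oftype(p, m))         # 3.0
j = clamp(trunc(Int, aleph), 1, n - 1)  # 2 — clamped down from 3
γ = aleph - j                           # 1.0, so the v[j] term vanishes
(1 - γ) * v[j] + γ * v[j + 1]           # 30.0 == v[n], as claimed
```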
Add a `type` argument to `quantile` to support the three remaining (non-interpolating) types that we didn't support. Some of these are useful in particular because they correspond to actual values from the data and work for types that do not support arithmetic.
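Usage would look something like this (a sketch assuming the API described in this PR):

```julia
v = ["a", "b", "c", "d"]    # no arithmetic defined on String
quantile(v, 0.5, type=1)    # type 1 picks an actual element of v
quantile(v, 0.5, type=3)    # type 3 too
# quantile(v, 0.5)          # the default interpolating definition would error here
```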
Force-pushed 02e40c2 to 4b09ad9 (Compare)
Gentle bump. :-)
andreasnoack
left a comment
Looks good to me. Just some minor comments/questions
src/Statistics.jl (Outdated)

```julia
alpha = (0.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
beta = (1.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
```
I know it is no different from the current implementation, but are we certain that these values should be Float64? E.g. if all arguments were Float32, you'd want to avoid intermediate promotion. For elements with higher precision, I suppose this would also mean a loss of precision, because 1/3 is not representable as a binary float. Since it only affects type=8, and therefore a fairly negligible corner case, we can maybe just open an issue to have the loss on record instead of dealing with it here.
Good point. I've turned these into Ints and Rationals instead, and convert them to the type of p (which determines the return type anyway due to promotion in the calculations). In theory this could be slightly breaking for the default quantile type if you passed non-Float64 p and values, but in practice I don't think it's used.
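A sketch of that conversion (hypothetical variable names): storing the constants exactly as `Int`s and `Rational`s and converting via `oftype(p, ...)` keeps `Float32` inputs at `Float32` precision, rounding `1//3` only once at the target precision.

```julia
# Exact constants for types 4–9 (no premature Float64 rounding).
alphas = (0, 1//2, 0, 1, 1//3, 3//8)
betas  = (1, 1//2, 0, 1, 1//3, 3//8)

p = 0.5f0                          # Float32 probability
alpha = oftype(p, alphas[8 - 3])   # type 8: Float32(1//3), a single rounding
beta  = oftype(p, betas[8 - 3])

typeof(alpha)                      # Float32 — no promotion to Float64
```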
```
- Def. 7: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and `PERCENTILE.INC`, Python `'inclusive'`)
- Def. 8: `alpha=1/3`, `beta=1/3`
- Def. 9: `alpha=3/8`, `beta=3/8`

The keyword argument `type` can be used to choose among the 9 definitions
```
We could consider calling the keyword `definition` to follow the naming of the paper. I guess `type` is what the R functions use.
Yeah that's the R terminology. NumPy says "method" (but it takes names rather than numbers).
nalimilan
left a comment
Woops, I completely forgot about this!
Add a `type` argument to `quantile` to support the three remaining types that we didn't support. Some of these are useful in particular because they correspond to actual values from the data and work for types that do not support arithmetic.

Fixes #185.