-
Notifications
You must be signed in to change notification settings - Fork 1k
modularize optimization internals #7401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7401 +/- ##
==========================================
- Coverage 98.96% 98.94% -0.02%
==========================================
Files 87 87
Lines 16734 16819 +85
==========================================
+ Hits 16560 16642 +82
- Misses 174 177 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Generated via commit 8129198 Download link for the artifact containing the test results: ↓ atime-results.zip
|
|
I'm also not sure about moving the tests to |
inst/tests/tests.Rraw
Outdated
| test(2358.33,optimize=0:2, dt[, sum(a) / .N, b, verbose=TRUE], output=out) | ||
| test(2358.34,optimize=0:2, dt[, mean(a) * 2L + sum(a), b, verbose=TRUE], output=out) | ||
| test(2358.35,optimize=0:2, dt[, list(range=max(a)-min(a), avg=mean(a)), by=b, verbose=TRUE], output=out) | ||
| test(2358.36,optimize=0:2, dt[, .(max(a)-sqrt(min(a))), by=b, verbose=TRUE], output="GForce FALSE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reading this inspires some stretch goals for future work:
- GForce for simple regression coefficients (
(mean(x*y) - mean(x)*mean(y))/(mean(x^2) - mean(x)^2)) sqrt(min(a))could do GForce aggregationlist(mean(x)/sd(x), mean(x), sd(x))could only computemean(),sd()oncemax(x) - min(x)could computerange()with GForce and then derive min, max (i.e., single pass instead of two)
Co-authored-by: Michael Chirico <[email protected]>
Co-authored-by: Michael Chirico <[email protected]>
Co-authored-by: Michael Chirico <[email protected]>
Co-authored-by: Michael Chirico <[email protected]>
Co-authored-by: Michael Chirico <[email protected]>
Co-authored-by: Michael Chirico <[email protected]>
|
@MichaelChirico I'm also not 100% convinced about the new |
I see. I still like the idea of a separate script -- the more we peel out of the behemoth tests.Rraw, the better. "eventually" it would be nice to have most tests live in purpose-made test scripts, IMO. |
| test(2357.2, fread(paste0("file://", f)), DT) | ||
| }) | ||
|
|
||
| # gforce should also work with Map in j #5336 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one last idea -- what happens when the grouping column is part of the aggregation in j?
DT[, .(sum(b) - mean(a)), by=b]| } | ||
| } | ||
| # Call extracted GForce optimization function | ||
| if ( getOption("datatable.optimize")>=1L && (is.call(jsub) || (is.name(jsub) && jsub %chin% c(".SD", ".N"))) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move the latter test into .attempt_optimize? Just a quick escape at the top. I think it will look cleaner, and the checks for .SD/.N feel like they belong there, anyway.'
Edit: I see that it's feeding in to the verbose message below... I think that can also be moved into the helpers.
| } | ||
| } | ||
| } | ||
| # Call extracted GForce optimization function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment feels ephemeral -- it explains what's happening in this PR, but will look out of place in 5 years.
Maybe something like
Determine GForce-optimized query
| } | ||
|
|
||
| # attempts to optimize j expressions using lapply, GForce, and mean optimizations | ||
| .attempt_optimize = function(jsub, jvnames, sdvars, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really like how clean this is 👍
| GForce = FALSE | ||
|
|
||
| # FR #971, GForce kicks in on all subsets, no joins yet. Although joins could work with | ||
| # nomatch=NULL even now.. but not switching it on yet, will deal it separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert to a TODO?
|
|
||
| # FR #971, GForce kicks in on all subsets, no joins yet. Although joins could work with | ||
| # nomatch=NULL even now.. but not switching it on yet, will deal it separately. | ||
| if (getOption("datatable.optimize")>=2L && !is.data.table(i) && !byjoin && length(f__)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommend unnesting:
if (getOption("datatable.optimize") < 2L || is.data.table(i) || byjoin || !length(f__))
return(list(GForce=FALSE, jsub=jsub))|
|
||
| # Old mean() optimization fallback when GForce is not used | ||
| .optimize_mean = function(jsub, SDenv, verbose, GForce) { | ||
| if (!GForce && !is.name(jsub)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same recommendation around unnesting:
if (GForce || is.name(jsub)) return(jsub)
MichaelChirico
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About halfway done reading the implementation now. Thanks for your patience with the review! I'm really excited for this to get finished :)
|
|
||
| # Optimize expressions using GForce (C-level optimizations) | ||
| # This function replaces functions like mean() with gmean() for fast C implementations | ||
| .optimize_gforce = function(jsub, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing that comes to mind seeing such a long signature -- using a "struct" instead of passing individual arguments, e.g.
There may be some possibility to make the code easier to understand if some arguments are grouped or combined.
Not a requirement but something to ponder.

Mapinstead oflapplyturns GForce off #5336Adds arithmetic for GForce as demanded in #3815 but does not add support for blocks in j like
d[, j={x<-x; .(min(x))}, by=y].