The Rcpp package multgam implements the empirical Bayes optimization algorithm described in El-Bachir and Davison (2019), which trains multiple generalized additive models (GAMs) and automatically tunes their L2 regularization hyper-parameters. Moreover, multgam provides automatic ridge penalty for multiple parametric non-linear regression models, where the linear or non-linear functions of inputs are not necessarily smooth but their regression weights are constrained by the L2 penalty with possibly different hyper-parameters.
The package multgam uses R as an interface for the optimization code implemented in C++, and uses the R package mgcv to set up the matrix of inputs, to visualize the learned functions, and to perform predictions.
1. Installation
2. Usage
2.1. Main training function
2.2. Supported probability distributions and examples
2.2.1. Classical exponential family distributions
2.2.2. Extreme value distribution families
2.2.3. Examples
2.3. Extension to new distributions
3. General comments
4. Bugs, help and suggestions
5. Citation
The Rcpp package multgam must be installed from source as follows.
- Download and extract the directory
multgam, which contains the package. - In the R file
install.R, update the character variablepath2multgamwith the path tomultgam. For example, ifmultgamhas been extracted in your desktop and you are using linux, you could use
path2multgam <- "~/Desktop/multgam"- Run the file
install.R.
The output/response variable can be a vector or a matrix from a univariate or a multivariate probability distribution, but the log-likelihood for the full dataset must be expressed as the sum of the log-likelihoods for an individual observation. A particular case is independent random observations.
In practice, multgam interprets a GAM as a multiple linear regression model whose weights are subject to the L2 penalty, and computes the corresponding regularization matrices. When the functions of inputs are smooth, splines for example, the regularization matrices are dense and represent the smoothing matrices. When the functions of inputs are weighted sums of predictors, the regularization matrices are the identity matrices, to which the user can assign different regularization hyper-parameters; see the argument groupReg in the function mtgam in Section 2.1.
Train a multiple generalized additive model using the function mtgam as follows
fit <- mtgam(dat, L.formula, fmName="gauss", lambInit=NULL, betaInit=NULL, groupReg=NULL,
iterMax=200, progressPen=FALSE, PenTol=.Machine$double.eps^.5, progressML=FALSE, MLTol=1e-07, ...)with arguments:
dat: a list or a data frame whose columns contain the input and the output variables used inL.formula; specific distribution family considerations can be found in Section 2.2,L.formula: a list of as many formulas as there are output variables with additive structures of input variables. For additional information ondatandL.formulasee the examples in Section 2.2, or the documentation of the R packagemgcvon CRAN. The packagemultgamhas been tested with the basis functionsbs="tp"(thin plate regression splines),bs="cr"(cubic regression splines),bs="cc"(cyclic cubic regression splines),fmName: a character variable for the name of the probability distribution of the output variables:"gauss"for the Gaussian distribution,"poisson"for the Poisson distribution,"binom"for the binomial distribution,"expon"for the exponential distribution,"gamma"for the gamma distribution,"gev"for the generalized extreme value distribution,"gpd"for the generalized Pareto distribution,"pp"for the point process approach in extreme value analysis,"rgev"for the r-largest extreme value distribution. Details on their parametrization and specific considerations can be found in Section 2.2,lambInit: a vector of starting values for the L2 regularization hyper-parameters. This should contain as many values as non-zero elements supplied to the argumentgroupReg, in addition to the number of smooth functions, if any. Default values are provided,betaInit: a vector of starting values for the regression weights. Default values are provided,groupReg: a list of lengthL.formuladescribing how the L2 regularization hyper-parameters in the multiple parametric regression models, i.e., non-smooth functions of inputs, should be grouped. Each element ofgroupRegis a vector referring to a formula inL.formulaand contains the numbers of successive input variables in that formula whose regression weights share the same hyper-parameter. If the only term in a formula is an offset, then the corresponding element ofgroupRegshould take the value0, so the corresponding regression weight will not be penalized. In the defaultgroupReg=NULL, the regression weights of a smooth function of inputs share the same hyper-parameter, but different smooth functions are penalized by different hyper-parameters, and all the remaining non-smooth functions of inputs share the same hyper-parameter. For example, if we haveL.formula <- list(y ~ x1 + x2 + x3 + s(x1) + s(x2), ~ 1), thengroupReg=NULLwould correspond to one hyper-parameter penalizing the three regression weights of the triple(x1, x2, x3), one hyper-parameter for the regression weights of the smooth functions(x1), one hyper-parameter fors(x2)and no hyper-parameter on the offset of the second output variable. However, if the regression weight of the input variablex1is constrained by an L2 penalty, and the regression weights ofx2andx3share the same hyper-parameter, then thegroupRegcorresponding to thatL.formulashould begroupReg <- list(c(1, 2), 0), where1corresponds to having one hyper-parameter on the regression weight ofx1,2to having one hyper-parameter on the pair(x2, x3), and0for the offset of the second output variable,iterMax: an integer for the number of maximal iterations in the optimization of the log-marginal likelihood and the penalized log-likelihood,progressPen: ifprogressPen=TRUE, information about the progress of the penalized log-likelihood maximization will be printed,PenTol: the tolerance in the maximization of the penalized log-likelihood,progressML: ifprogressML=TRUE, information about the progress of the log-marginal likelihood maximization will be printed,MLTol: the tolerance in the maximization of the log-marginal likelihood,....: additional arguments supplied to the functiongam()inmgcvfor setting up the input matrix and the smoothing matrices.
In the call to mtgam, the variable fit contains useful outputs for plots, predictions, etc..., as if fit resulted from the function gam() in mgcv. The only exception is the vector sp, which corresponds in gam() to the hyper-parameters for the smooth functions only, whereas in mtgam, sp contains the values of all the hyper-parameters including those described by the non-zero values in groupReg. Following the example given in the description of groupReg above, if we have L.formula <- list(y ~ x1 + x2 + x3 + s(x1) + s(x2), ~ 1) and groupReg=NULL, then fit$sp would be (lamb1, lamb2, lamb3), where lamb1 would be the hyper-parameter corresponding to the regression weights for (x1, x2, x3), then lamb2 would be associated to the regression weights of s(x1), and lamb3 to s(x2). If groupReg <- list(c(1, 2), 0) then fit$sp would be (lamb1, lamb2, lamb3, lamb4), where lamb1 would be the hyper-parameter corresponding to the regression weight of x1, lamb2 to the pair (x2, x3), lamb3 to s(x1) and lamb4 to s(x2). Further details can be found in Section 2.2.3.
The function mtgam supports any probability distribution whose log-likelihood is differentiable up to the third order and whose parametrization does not constrain the range values of the functional parameters. We describe the supported parametrizations in the following sections.
- Gaussian distribution:
fmName="gauss"implementsN(mu, tau), wheremuis the mean andtauis2 log(sigma)withsigmathe standard deviation, - Poisson distribution:
fmName="poisson"implementsPoiss(mu), wheremuis the log-rate, - Exponential distribution:
fmName="expon"implementsExpon(mu), wheremuis the log-rate, - Gamma distribution:
fmName="gamma"implementsGamma(mu, tau), wheremuis the log-shape,tauis-log(sigma), andsigmais the scale, - Binomial distribution:
fmName="binom"implementsBinom(mu), wheremuis the logit, i.e.,log(p/(1-p))withpthe probability of success.
- Generalized extreme value distribution:
fmName="gev"implementsGEV(mu, tau, xi), wheremuis the location,tauis the log-scale andxiis the shape, - Generalized Pareto distribution:
fmName="gpd"implementsGPD(mu, tau), wheretauis the log-scale andxiis the shape, - Point process approach in extreme value analysis:
fmName="pp"implementsPP(mu, tau, xi), wheremuis the location,tauis the log-scale andxiis the shape. The output variabley(say) in the argumentdatof the functionmtgammust be an nx(N+2)-matrix, wherenis the number of blocks of data that are above the thresholds andNis the largest number of block exceedances. Each of thenrows of the data matrixdat$ycorresponds to a block of data and must be filled accordingly. In particular, the first column ofdat$ymust contain the vector of thenblock sizes, i.e., numbers of exceedances above the thresholds, the second column must be the vector of thenthresholds, and the remaining columns must be filled with the threshold exceedances andNAvalues when the sizen_iof thei-th block contains fewer exceedances thanN, i.e., whenn_i<N. Theppmodel is still experimental, - r-Largest extreme value distribution:
fmName="rgev"implementsrGEV(mu, tau, xi), wheremuis the location,tauis the log-scale andxiis the shape. As the analogue of the point process approach, the output variabley(say) in the argumentdatshould be an nxr-matrix, wherenis the number of blocks of data andris the pre-specified number of largest extremal data per block. In particular, the values in the rows ofdat$ymust be sorted in ascending order.
Data from the families gev, gpd and rgev can be simulated using the function
simExtrem(mu=NULL, sigma=NULL, xi=NULL, r=NULL, family="gev")with arguments:
mu: a vector of location parameters for the full dataset,sigma: a vector of scale parameters for the full dataset,xi: a vector of shape parameters for the full dataset,r: an integer for the number of r largest extremal data per block in the r-largest model,family: a character variable which takes either"gev","gpd"or"rgev",
and output:
- if
family="gev"orfamily="gpd": a vector of generated data, - if
family="rgev": a matrix of sizenxr, wherenis the length ofmuandris the number of r largest extremal data per block. The values in each of the rows are sorted in ascending order.
Return levels (quantiles) from the families gev and gpd can be computed by the function
returnLevel(prob=NULL, mu=NULL, sigma=NULL, xi=NULL, family="gev")with arguments:
prob: a scalar for the probability for which the return level is computed,mu: a vector of location parameters for the full dataset,sigma: a vector of scale parameters for the full dataset,xi: a vector of shape parameters for the full dataset,family: a character variable which takes either"gev"or"gpd",
and output:
- a vector of return levels corresponding to the probability
proband the functional parametersmu,sigmaandxi.
The R file examples.R illustrates three key calls to mtgam:
- the usage of
groupRegon the Gaussian model for example, - the training of a multiple generalized additive models on the supported distributions,
- the definition of
datfor theppmodel (in pseudo-code).
New families of distributions can be implemented by the user and added to multgam, but for a numerically stable implementation, it is preferable to contact the author at [email protected] who can do this for you.
- The package is under development and is currently being re-implemented with the object-oriented perspective in mind.
- The package has been tested with the following basis functions in the argument
L.formulainmtgam:bs="tp"(thin plate regression splines),bs="cr"(cubic regression splines),bs="cc"(cyclic cubic regression splines). - The point process approach in extreme value analysis, i.e.
fmName="pp", is still experimental. - The convergence criteria are conservative. If the training does not converge, increase
MLTolto1e-06or1e5. If this still does not converge, please report the error to the author following Section 4.
Bugs can be reported to [email protected] by sending an email with:
- subject: multgam: bugs,
- content: a reproducible example and a simple description of the problem.
Further details on the usage of the package or suggestions for additional extensions can be requested to the author.
Acknowledge the use of multgam by referring to this web-page and citing the paper El-Bachir and Davison (2019) using (bibtex)
@article{JMLR:v20:18-659,
author = {El-Bachir, Y. and Davison, A. C.},
title = {Fast {A}utomatic {S}moothing for {G}eneralized {A}dditive {M}odels},
journal = {Journal of Machine Learning Research},
year = {2019},
volume = {20},
number = {173},
pages = {1--27},
url = {http://jmlr.org/papers/v20/18-659.html}
}
Yousra El-Bachir and Anthony C. Davison. Fast automatic smoothing for generalized additive models. Journal of Machine Learning Research, 20(173):1-27, 2019. Available at http://jmlr.org/beta/papers/v20/18-659.html.