Before running XGBoost, we must set three types of parameters: general parameters, booster parameters and task parameters.
Note
Parameters in R package
In the R package, you can use . (dot) in place of _ (underscore) in parameter names; for example, max.depth is equivalent to max_depth. The underscore names are also valid in R.
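The naming rule in the note above can be sketched in Python. normalize_param_names is a hypothetical helper, not part of the R or Python packages. It assumes, as the note says, that a dot is only an alternative spelling of an underscore, and it rejects input that gives the two spellings different values.

```python
def normalize_param_names(params):
    """Return a copy of params with every '.' in a key replaced by '_'.

    Hypothetical helper (not part of any XGBoost package) mirroring the
    R-package rule that max.depth and max_depth name the same parameter.
    """
    normalized = {}
    for key, value in params.items():
        canonical = key.replace(".", "_")
        if canonical in normalized and normalized[canonical] != value:
            # Both spellings were supplied with different values; refuse to guess.
            raise ValueError(f"conflicting values for {canonical!r}")
        normalized[canonical] = value
    return normalized


print(normalize_param_names({"max.depth": 3, "eta": 0.1}))
# {'max_depth': 3, 'eta': 0.1}
```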
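General Parameters

booster [default= gbtree]
Which booster to use. Can be gbtree, gblinear or dart; gbtree and dart use tree-based models, while gblinear uses linear functions.

silent [default=0]
0 means printing running messages; 1 means silent mode.

nthread [default to maximum number of threads available if not set]
Number of parallel threads used to run XGBoost.

disable_default_eval_metric [default=0]
Flag to disable the default metric. Set to >0 to disable.

num_pbuffer [set automatically by XGBoost, no need to be set by user]
Size of the prediction buffer, normally set to the number of training instances. The buffers are used to save the prediction results of the last boosting step.

num_feature [set automatically by XGBoost, no need to be set by user]
Feature dimension used in boosting, set to the maximum dimension of the feature.

Parameters for Tree Booster

eta [default=0.3, alias: learning_rate]
Step size shrinkage used in updates to prevent overfitting. After each boosting step, we can directly get the weights of new features, and eta shrinks the feature weights to make the boosting process more conservative. Range: [0,1].

gamma [default=0, alias: min_split_loss]
Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be.

max_depth [default=6]
Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. 0 indicates no limit; a limit is required when grow_policy is set to depthwise.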
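To make the effect of eta concrete, here is a toy sketch rather than XGBoost's implementation. It assumes squared-error loss on a single constant target and a hypothetical one-leaf "tree" that would fit the whole residual, so the only thing that varies is how much of each new tree eta lets through.

```python
def boost_constant(target, eta, rounds, start=0.0):
    """Toy boosting run on one constant target with squared-error loss.

    Assumption for illustration: each new "tree" is a single leaf whose raw
    value equals the current residual; eta scales it before it is added.
    """
    prediction = start
    history = []
    for _ in range(rounds):
        residual = target - prediction   # what an unshrunk tree would fit
        prediction += eta * residual     # shrinkage: only a fraction is added
        history.append(prediction)
    return history


for eta in (1.0, 0.3):
    print(eta, [round(p, 3) for p in boost_constant(1.0, eta, 3)])
```

With eta=1.0 the first round already reaches the target; with eta=0.3 the prediction approaches it as 1 - 0.7^t (0.3, 0.51, 0.657 after three rounds), which is the more conservative behaviour described for eta.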
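min_child_weight [default=1]
Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In a linear regression task, this simply corresponds to the minimum number of instances needed in each node. The larger min_child_weight is, the more conservative the algorithm will be.

max_delta_step [default=0]
Maximum delta step allowed for each leaf output. If the value is set to 0, there is no constraint; a positive value can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced.

subsample [default=1]
Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the training data prior to growing trees, which helps prevent overfitting. Subsampling occurs once in every boosting iteration.

colsample_bytree [default=1]
Subsample ratio of columns when constructing each tree. Subsampling occurs once in every boosting iteration.

colsample_bylevel [default=1]
Subsample ratio of columns for each split, in each level.

lambda [default=1, alias: reg_lambda]
L2 regularization term on weights. Increasing this value makes the model more conservative.

alpha [default=0, alias: reg_alpha]
L1 regularization term on weights. Increasing this value makes the model more conservative.

tree_method string [default= auto]
The tree construction algorithm used in XGBoost. Distributed and external memory versions only support tree_method=approx.
Choices: auto, exact, approx, hist, gpu_exact, gpu_hist
- auto: Use a heuristic to choose the fastest method. For small to medium datasets, exact greedy (exact) will be used; for very large datasets, the approximate algorithm (approx) will be chosen.
- exact: Exact greedy algorithm.
- approx: Approximate greedy algorithm using quantile sketch and gradient histogram.
- hist: Fast histogram-optimized approximate greedy algorithm. It uses some performance improvements such as bins caching.
- gpu_exact: GPU implementation of the exact algorithm.
- gpu_hist: GPU implementation of the hist algorithm.

sketch_eps [default=0.03]
Only used for tree_method=approx. This roughly translates into O(1 / sketch_eps) number of bins. Compared to directly selecting the number of bins, this comes with a theoretical guarantee on sketch accuracy.

scale_pos_weight [default=1]
Controls the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: sum(negative instances) / sum(positive instances). See Parameters Tuning for more discussion. Also, see the Higgs Kaggle competition demo for examples: R, py1, py2, py3.

updater [default= grow_colmaker,prune]
A comma-separated string defining the sequence of tree updaters to run, providing a modular way to construct and modify the trees. This is an advanced parameter that is usually set automatically, depending on other parameters, but it can also be set explicitly. The following updater plugins exist:
- grow_colmaker: non-distributed column-based construction of trees.
- distcol: distributed tree construction with column-based data splitting mode.
- grow_histmaker: distributed tree construction with row-based data splitting based on global proposal of histogram counting.
- grow_local_histmaker: based on local histogram counting.
- grow_skmaker: uses the approximate sketching algorithm.
- sync: synchronizes trees in all distributed nodes.
- refresh: refreshes tree’s statistics and/or leaf values based on the current data. Note that no random subsampling of data rows is performed.
- prune: prunes the splits where loss < min_split_loss (or gamma).
In a distributed setting, the implicit updater sequence value is adjusted to grow_histmaker,prune.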
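The roles of lambda, gamma (min_split_loss) and min_child_weight, and the loss threshold used by the prune updater, can be illustrated with the split-gain formula from the XGBoost paper. This is a simplified sketch, not XGBoost's C++ code: it ignores alpha, max_delta_step and all sampling parameters, and the function names are made up for illustration.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0):
    """Optimal leaf value -G / (H + lambda); ignores alpha and max_delta_step."""
    return -grad_sum / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a candidate split, minus gamma (min_split_loss)."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def keep_split(g_left, h_left, g_right, h_right,
               reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Keep a split only if both children carry enough hessian and gain is positive."""
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0


# Squared error: g_i = prediction - label and h_i = 1, so H counts instances.
# Left child: 4 instances with total gradient -4; right child: 4 with +4.
print(split_gain(-4.0, 4.0, 4.0, 4.0))                        # gain with gamma=0
print(keep_split(-4.0, 4.0, 4.0, 4.0, gamma=5.0))              # pruned by gamma
print(keep_split(-4.0, 4.0, 4.0, 4.0, min_child_weight=5.0))   # too little hessian
```

The first call prints 3.2; a gamma of 5.0 or a min_child_weight of 5.0 both reject the same split, and a larger lambda shrinks both the gain and the leaf weights.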
refresh_leaf [default=1]
This is a parameter of the refresh updater plugin. When this flag is 1, tree leaves as well as tree nodes’ stats are updated. When it is 0, only node stats are updated.

process_type [default= default]
A type of boosting process to run. Choices: default, update
- default: The normal boosting process which creates new trees.
- update: Starts from an existing model and only updates its trees. In each boosting iteration, a tree from the initial model is taken, a specified sequence of updater plugins is run for that tree, and the modified tree is added to the new model. The new model has either the same or a smaller number of trees, depending on the number of boosting iterations performed. Currently, the following built-in updater plugins can be meaningfully used with this process type: refresh, prune. With process_type=update, one cannot use updater plugins that create new trees.

grow_policy [default= depthwise]
Controls the way new nodes are added to the tree. Currently supported only if tree_method is set to hist. Choices: depthwise, lossguide
- depthwise: split at nodes closest to the root.
- lossguide: split at nodes with the highest loss change.

max_leaves [default=0]
Maximum number of nodes to be added. Only relevant when grow_policy=lossguide is set.

max_bin [default=256]
Only used if tree_method is set to hist. Maximum number of discrete bins to bucket continuous features; increasing it improves the optimality of splits at the cost of higher computation time.
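As a rough picture of what max_bin controls under tree_method=hist, the sketch below buckets one continuous feature into at most max_bin bins using quantile cut points. It is a simplification under stated assumptions: XGBoost derives its cut points from a weighted quantile sketch rather than the exact, unweighted quantiles computed here, and quantile_cuts and bin_index are hypothetical names.

```python
import bisect


def quantile_cuts(values, max_bin=256):
    """Return sorted cut points that bucket values into at most max_bin bins.

    Simplification: exact, unweighted quantiles of the data; XGBoost's own
    cut points come from a weighted quantile sketch.
    """
    if max_bin < 2:
        raise ValueError("max_bin must be at least 2")
    data = sorted(values)
    n = len(data)
    if n == 0:
        return []
    # The value at each i/max_bin quantile becomes a cut; duplicates collapse.
    cuts = {data[min(n - 1, i * n // max_bin)] for i in range(1, max_bin)}
    cuts.discard(data[0])  # a cut at the minimum would only create an empty bin
    return sorted(cuts)


def bin_index(value, cuts):
    """Bin j holds values in [cuts[j-1], cuts[j]); there are len(cuts) + 1 bins."""
    return bisect.bisect_right(cuts, value)


feature = [float(x) for x in range(100)]
cuts = quantile_cuts(feature, max_bin=4)
print(cuts)
print([bin_index(v, cuts) for v in (3.0, 25.0, 99.0)])
```

For 100 evenly spaced values and max_bin=4 the cuts are [25.0, 50.0, 75.0], giving four bins (indices 0, 1 and 3 for the sample values); a constant feature gets no cuts and falls into a single bin.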
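predictor [default= cpu_predictor]
The type of predictor algorithm to use. Both options provide the same results but allow the use of GPU or CPU.
- cpu_predictor: Multicore CPU prediction algorithm.
- gpu_predictor: Prediction using GPU. Default when tree_method is gpu_exact or gpu_hist.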
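Several tree-booster parameters above only take effect in combination with others. The sketch below computes the suggested scale_pos_weight from labels and flags combinations that, according to this page, are ignored. Both helpers are hypothetical; the resulting dict uses the parameter names documented here and could be passed as params to xgboost.train in the Python package, but the sketch itself does not import XGBoost.

```python
def suggest_scale_pos_weight(labels, weights=None):
    """sum(negative instance weights) / sum(positive instance weights)."""
    if weights is None:
        weights = [1.0] * len(labels)
    negative = sum(w for y, w in zip(labels, weights) if y == 0)
    positive = sum(w for y, w in zip(labels, weights) if y == 1)
    if positive == 0:
        raise ValueError("no positive instances")
    return negative / positive


def check_tree_params(params):
    """Hypothetical check encoding only the constraints stated on this page."""
    method = params.get("tree_method", "auto")
    policy = params.get("grow_policy", "depthwise")
    notes = []
    if "grow_policy" in params and method != "hist":
        notes.append("grow_policy is only supported with tree_method=hist")
    if "max_leaves" in params and policy != "lossguide":
        notes.append("max_leaves is only relevant with grow_policy=lossguide")
    if "max_bin" in params and method != "hist":
        notes.append("max_bin is only used with tree_method=hist")
    if "sketch_eps" in params and method != "approx":
        notes.append("sketch_eps is only used with tree_method=approx")
    return notes


labels = [0] * 90 + [1] * 10          # 9:1 class imbalance
params = {
    "booster": "gbtree",
    "objective": "binary:logistic",
    "eta": 0.1,
    "tree_method": "hist",
    "grow_policy": "lossguide",
    "max_leaves": 31,
    "max_bin": 256,
    "scale_pos_weight": suggest_scale_pos_weight(labels),  # 90 / 10
}
print(params["scale_pos_weight"], check_tree_params(params))
print(check_tree_params({"max_leaves": 31}))
```

Here the suggested weight is 9.0 and the combination raises no notes, while max_leaves on its own is flagged because the default grow_policy is depthwise.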
Additional Parameters for Dart Booster (booster=dart)

Note
Using predict() with DART booster
If the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.
preds = bst.predict(dtest, ntree_limit=num_round)
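sample_type [default= uniform]
Type of sampling algorithm.
- uniform: dropped trees are selected uniformly.
- weighted: dropped trees are selected in proportion to weight.

normalize_type [default= tree]
Type of normalization algorithm; below, k is the number of dropped trees.
- tree: new trees have the same weight as each of the dropped trees. The weight of new trees is 1 / (k + learning_rate), and dropped trees are scaled by a factor of k / (k + learning_rate).
- forest: new trees have the same weight as the sum of the dropped trees (forest). The weight of new trees is 1 / (1 + learning_rate), and dropped trees are scaled by a factor of 1 / (1 + learning_rate).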
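The normalize_type formulas can be traced on a small list of per-tree weights. normalize_after_dropout is a hypothetical helper that applies the scaling factors stated above when one new tree is added; the behaviour when no tree is dropped (new weight 1.0) is an assumption based on the skip_drop description, where a skipped dropout adds trees as in gbtree.

```python
def normalize_after_dropout(weights, dropped, learning_rate, normalize_type="tree"):
    """Return per-tree weights after one DART step that added a single new tree.

    weights: current per-tree weights; dropped: indices of the k dropped trees.
    """
    k = len(dropped)
    updated = list(weights)
    if k == 0:
        # Assumption: with nothing dropped the new tree gets weight 1, as in gbtree.
        updated.append(1.0)
        return updated
    if normalize_type == "tree":
        new_tree, scale = 1.0 / (k + learning_rate), k / (k + learning_rate)
    elif normalize_type == "forest":
        new_tree = scale = 1.0 / (1.0 + learning_rate)
    else:
        raise ValueError(f"unknown normalize_type: {normalize_type!r}")
    for i in dropped:
        updated[i] *= scale      # shrink the trees that were dropped
    updated.append(new_tree)     # weight of the newly added tree
    return updated


print([round(w, 4) for w in normalize_after_dropout([1.0] * 4, [1, 3], 0.3)])
print([round(w, 4) for w in normalize_after_dropout([1.0] * 4, [1, 3], 0.3, "forest")])
```

With learning_rate=0.3 and trees 1 and 3 dropped, tree normalization gives [1.0, 0.8696, 1.0, 0.8696, 0.4348], while forest normalization gives both the dropped trees and the new tree a weight of about 0.7692.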
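rate_drop [default=0.0]
Dropout rate (a fraction of previous trees to drop during the dropout). Range: [0.0, 1.0].

one_drop [default=0]
When this flag is enabled, at least one tree is always dropped during the dropout.

skip_drop [default=0.0]
Probability of skipping the dropout procedure during a boosting iteration. If a dropout is skipped, new trees are added in the same manner as gbtree. Note that a non-zero skip_drop has higher priority than rate_drop or one_drop.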
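How rate_drop, one_drop and skip_drop interact can be sketched for sample_type=uniform. This is a simplified model that assumes each existing tree is dropped independently with probability rate_drop; select_dropped_trees is a hypothetical name, and the weighted sampling mode is not covered.

```python
import random


def select_dropped_trees(num_trees, rate_drop=0.0, one_drop=False,
                         skip_drop=0.0, rng=None):
    """Pick tree indices to drop in one boosting iteration (uniform sampling sketch)."""
    if rng is None:
        rng = random.Random()
    if num_trees == 0:
        return []
    if skip_drop > 0.0 and rng.random() < skip_drop:
        return []                                # dropout skipped this round
    dropped = [i for i in range(num_trees) if rng.random() < rate_drop]
    if one_drop and not dropped:
        dropped = [rng.randrange(num_trees)]     # force at least one drop
    return dropped


rng = random.Random(42)
print(select_dropped_trees(10, rate_drop=0.1, rng=rng))
print(select_dropped_trees(10, rate_drop=0.0, one_drop=True, rng=rng))
print(select_dropped_trees(10, rate_drop=1.0, skip_drop=1.0, rng=rng))
```

Because the skip_drop check runs first, it overrides rate_drop and one_drop, matching the priority stated above; the last call therefore always returns an empty list.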
Parameters for Linear Booster (booster=gblinear)

lambda [default=0, alias: reg_lambda]
L2 regularization term on weights. Increasing this value makes the model more conservative.

alpha [default=0, alias: reg_alpha]
L1 regularization term on weights. Increasing this value makes the model more conservative.

updater [default= shotgun]
Choice of algorithm to fit the linear model.
- shotgun: Parallel coordinate descent algorithm based on the shotgun algorithm. Uses ‘hogwild’ parallelism and therefore produces a nondeterministic solution on each run.
- coord_descent: Ordinary coordinate descent algorithm. Also multithreaded, but still produces a deterministic solution.

feature_selector [default= cyclic]
Feature selection and ordering method.
- cyclic: Deterministic selection by cycling through features one at a time.
- shuffle: Similar to cyclic but with random feature shuffling prior to each update.
- random: A random (with replacement) coordinate selector.
- greedy: Select the coordinate with the greatest gradient magnitude. It has O(num_feature^2) complexity and is fully deterministic. It allows restricting the selection to top_k features per group with the largest magnitude of univariate weight change, by setting the top_k parameter. Doing so reduces the complexity to O(num_feature*top_k).
- thrifty: Thrifty, approximately-greedy feature selector. Prior to cyclic updates, it reorders features in descending magnitude of their univariate weight changes. This operation is multithreaded and is a linear complexity approximation of the quadratic greedy selection. It allows restricting the selection to top_k features per group with the largest magnitude of univariate weight change, by setting the top_k parameter.

top_k [default=0]
The number of top features to select in the greedy and thrifty feature selectors. The value of 0 means using all the features.

Parameters for Tweedie Regression (objective=reg:tweedie)

tweedie_variance_power [default=1.5]
Parameter that controls the variance of the Tweedie distribution: var(y) ~ E(y)^tweedie_variance_power. Range: (1,2). Set it closer to 2 to shift towards a gamma distribution, or closer to 1 to shift towards a Poisson distribution.
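The variance relation for tweedie_variance_power can be checked numerically. The sketch assumes the usual Tweedie form Var(y) = phi * mu^p with a dispersion phi set to 1 (an assumption; the page only states proportionality), and uses the log link that the objective list gives for reg:tweedie to turn a margin into the mean mu. The helper names are hypothetical.

```python
import math


def tweedie_variance(mean, power, dispersion=1.0):
    """Var(y) = dispersion * mean ** power (Tweedie variance function)."""
    return dispersion * mean ** power


def valid_tweedie_power(power):
    """Check against the documented range (1, 2) for tweedie_variance_power."""
    return 1.0 < power < 2.0


margin = math.log(2.0)   # assumed log link: mean = exp(margin)
mean = math.exp(margin)
for p in (1.0, 1.5, 2.0):
    print(p, valid_tweedie_power(p), round(tweedie_variance(mean, p), 3))
```

For a mean of 2, p = 1 gives a variance of 2 (Poisson-like), p = 2 gives 4 (gamma-like), and the default 1.5 sits in between at about 2.828; only the last value is inside the documented range.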
Learning Task Parameters

Specify the learning task and the corresponding learning objective. The objective options are below:
objective [default=reg:linear]
- reg:linear: linear regression
- reg:logistic: logistic regression
- binary:logistic: logistic regression for binary classification, output probability
- binary:logitraw: logistic regression for binary classification, output score before logistic transformation
- binary:hinge: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
- gpu:reg:linear, gpu:reg:logistic, gpu:binary:logistic, gpu:binary:logitraw: versions of the corresponding objective functions evaluated on the GPU; note that, like the GPU histogram algorithm, they can only be used when the entire training session uses the same dataset
- count:poisson: Poisson regression for count data, output mean of Poisson distribution. max_delta_step is set to 0.7 by default in Poisson regression (used to safeguard optimization).
- survival:cox: Cox regression for right censored survival time data (negative values are considered right censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function h(t) = h0(t) * HR).
- multi:softmax: set XGBoost to do multiclass classification using the softmax objective; you also need to set num_class (number of classes)
- multi:softprob: same as softmax, but outputs a vector of ndata * nclass values, which can be further reshaped to an ndata * nclass matrix. The result contains the predicted probability of each data point belonging to each class.
- rank:pairwise: Use LambdaMART to perform pairwise ranking where the pairwise loss is minimized
- rank:ndcg: Use LambdaMART to perform list-wise ranking where Normalized Discounted Cumulative Gain (NDCG) is maximized
- rank:map: Use LambdaMART to perform list-wise ranking where Mean Average Precision (MAP) is maximized
- reg:gamma: gamma regression with log-link. Output is a mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.
- reg:tweedie: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be Tweedie-distributed.

base_score [default=0.5]
The initial prediction score of all instances, global bias. For a sufficient number of iterations, changing this value will not have much effect.

eval_metric [default according to objective]
Evaluation metrics for validation data. A default metric is assigned according to the objective (rmse for regression, error for classification, mean average precision for ranking). Users can add multiple evaluation metrics; Python users should pass the metrics as a list of parameter pairs instead of a map, so that a later eval_metric won’t override a previous one. The choices are listed below:
- rmse: root mean square error
- mae: mean absolute error
- logloss: negative log-likelihood
- error: Binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
- error@t: a binary classification threshold other than 0.5 can be specified by providing a numerical value through ‘t’.
- merror: Multiclass classification error rate. It is calculated as #(wrong cases)/#(all cases).
- mlogloss: Multiclass logloss.
- auc: Area under the curve
- aucpr: Area under the PR curve
- ndcg: Normalized Discounted Cumulative Gain
- map: Mean Average Precision
- ndcg@n, map@n: ‘n’ can be assigned as an integer to cut off the top positions in the lists for evaluation.
- ndcg-, map-, ndcg@n-, map@n-: In XGBoost, NDCG and MAP will evaluate the score of a list without any positive samples as 1. By adding “-” in the evaluation metric, XGBoost will evaluate these scores as 0 to be consistent under some conditions.
- poisson-nloglik: negative log-likelihood for Poisson regression
- gamma-nloglik: negative log-likelihood for gamma regression
- cox-nloglik: negative partial log-likelihood for Cox proportional hazards regression
- gamma-deviance: residual deviance for gamma regression
- tweedie-nloglik: negative log-likelihood for Tweedie regression (at a specified value of the tweedie_variance_power parameter)

seed [default=0]
Random number seed.

Command Line Parameters

The following parameters are only used in the console version of XGBoost.
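num_round
The number of rounds for boosting.

data
The path of training data.

test:data
The path of test data to do prediction.

save_period [default=0]
The period to save the model. Setting save_period=10 means that XGBoost will save the model every 10 rounds. Setting it to 0 means not saving any model during training.

task [default= train] options: train, pred, eval, dump
- train: training using data
- pred: making prediction for test:data
- eval: for evaluating statistics specified by eval[name]=filename
- dump: for dumping the learned model into text format

model_in [default=NULL]
Path to the input model, needed for test, eval, dump tasks. If it is specified in training, XGBoost will continue training from the input model.

model_out [default=NULL]
Path to the output model after training finishes. If not specified, XGBoost will output files with names such as 0003.model, where 0003 is the number of boosting rounds.

model_dir [default= models/]
The output directory of the saved models during training.

fmap
Feature map, used for dumping the model.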
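The interplay of save_period, model_dir and model_out can be sketched as a list of expected file names. Several details are assumptions not spelled out on this page: checkpoints are assumed to be written every save_period rounds into model_dir, file names are assumed to use the four-digit zero padding of the 0003.model example, and None stands in for the NULL default of model_out. The helper names are hypothetical.

```python
import posixpath


def model_file_name(round_number):
    """Four-digit, zero-padded name, following the 0003.model example."""
    return f"{round_number:04d}.model"


def planned_model_files(num_round, save_period=0, model_dir="models/", model_out=None):
    """List the model files a console training run is expected to write (sketch)."""
    paths = []
    if save_period > 0:
        # Assumed: a checkpoint after every save_period rounds.
        for r in range(save_period, num_round + 1, save_period):
            paths.append(posixpath.join(model_dir, model_file_name(r)))
    # Assumed: with model_out=NULL the final model is named after num_round.
    final = model_out or posixpath.join(model_dir, model_file_name(num_round))
    if final not in paths:
        paths.append(final)
    return paths


print(planned_model_files(25, save_period=10))
# ['models/0010.model', 'models/0020.model', 'models/0025.model']
print(planned_model_files(3))
# ['models/0003.model']
```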
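dump_format [default= text] options: text, json
Format of the model dump file.

name_dump [default= dump.txt]
Name of the model dump file.

name_pred [default= pred.txt]
Name of the prediction file, used in pred mode.

pred_margin [default=0]
Predict the margin instead of the transformed probability.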