Python API
Data Structure API
class lightgbm.Dataset(data, label=None, reference=None, weight=None, group=None, init_score=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)
Bases: object
Dataset in LightGBM.
Initialize the Dataset.
Parameters: - data (string, numpy array, pandas DataFrame, scipy.sparse or list of numpy arrays) – Data source of Dataset. If string, it represents the path to a text file.
- label (list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)) – Label of the data.
- reference (Dataset or None, optional (default=None)) – If this is Dataset for validation, training data should be used as reference.
- weight (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Weight for each instance.
- group (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Group/query size for Dataset.
- init_score (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Init score for Dataset.
- silent (bool, optional (default=False)) – Whether to print messages during construction.
- feature_name (list of strings or 'auto', optional (default="auto")) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
- categorical_feature (list of strings or int, or 'auto', optional (default="auto")) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify feature_name as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used. All values in categorical features should be less than the int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values.
- params (dict or None, optional (default=None)) – Other parameters for Dataset.
- free_raw_data (bool, optional (default=True)) – If True, the raw data is freed after constructing the inner Dataset.
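Example
A minimal sketch of constructing a Dataset from in-memory data (array shapes, values and feature names below are purely illustrative):

```python
import numpy as np
import lightgbm as lgb

# 100 rows, 5 features, binary labels -- illustrative values only.
data = np.random.rand(100, 5)
label = np.random.randint(0, 2, size=100)

train_data = lgb.Dataset(data, label=label,
                         feature_name=['f0', 'f1', 'f2', 'f3', 'f4'])
```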
create_valid(data, label=None, weight=None, group=None, init_score=None, silent=False, params=None)
Create validation data aligned with the current Dataset.
Parameters: - data (string, numpy array, pandas DataFrame, scipy.sparse or list of numpy arrays) – Data source of Dataset. If string, it represents the path to a text file.
- label (list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)) – Label of the data.
- weight (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Weight for each instance.
- group (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Group/query size for Dataset.
- init_score (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Init score for Dataset.
- silent (bool, optional (default=False)) – Whether to print messages during construction.
- params (dict or None, optional (default=None)) – Other parameters for validation Dataset.
Returns: valid – Validation Dataset with reference to self.
Return type: Dataset
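Example
A short sketch, assuming train_data from the example above; valid_X and valid_y are hypothetical validation arrays:

```python
# The validation Dataset reuses train_data's bin mappers via the reference.
valid_data = train_data.create_valid(valid_X, label=valid_y)
```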
get_field(field_name)
Get property from the Dataset.
Parameters: field_name (string) – The field name of the information. Returns: info – A numpy array with information from the Dataset. Return type: numpy array
get_group()
Get the group of the Dataset.
Returns: group – Group size of each group. Return type: numpy array or None
get_init_score()
Get the initial score of the Dataset.
Returns: init_score – Init score of Booster. Return type: numpy array or None
get_label()
Get the label of the Dataset.
Returns: label – The label information from the Dataset. Return type: numpy array or None
get_ref_chain(ref_limit=100)
Get a chain of Dataset objects.
Starts with r, then goes to r.reference (if it exists), then to r.reference.reference, etc. until we hit ref_limit or a reference loop.
Parameters: ref_limit (int, optional (default=100)) – The limit number of references. Returns: ref_chain – Chain of references of the Datasets. Return type: set of Dataset
get_weight()
Get the weight of the Dataset.
Returns: weight – Weight for each data point from the Dataset. Return type: numpy array or None
num_data()
Get the number of rows in the Dataset.
Returns: number_of_rows – The number of rows in the Dataset. Return type: int
num_feature()
Get the number of columns (features) in the Dataset.
Returns: number_of_columns – The number of columns (features) in the Dataset. Return type: int
save_binary(filename)
Save Dataset to a binary file.
Parameters: filename (string) – Name of the output file. Returns: self – Returns self. Return type: Dataset
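Example
A sketch of the save/reload round trip (the file name is illustrative):

```python
# Save once, then reload the binary file directly as a Dataset.
train_data.save_binary('train.bin')
train_data_reloaded = lgb.Dataset('train.bin')
```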
set_categorical_feature(categorical_feature)
Set categorical features.
Parameters: categorical_feature (list of int or strings) – Names or indices of categorical features. Returns: self – Dataset with set categorical features. Return type: Dataset
set_feature_name(feature_name)
Set feature name.
Parameters: feature_name (list of strings) – Feature names. Returns: self – Dataset with set feature name. Return type: Dataset
set_field(field_name, data)
Set property into the Dataset.
Parameters: - field_name (string) – The field name of the information.
- data (list, numpy 1-D array, pandas Series or None) – The array of data to be set.
Returns: self – Dataset with set property.
Return type: Dataset
set_group(group)
Set group size of Dataset (used for ranking).
Parameters: group (list, numpy 1-D array, pandas Series or None) – Group size of each group. Returns: self – Dataset with set group. Return type: Dataset
set_init_score(init_score)
Set init score of Booster to start from.
Parameters: init_score (list, numpy 1-D array, pandas Series or None) – Init score for Booster. Returns: self – Dataset with set init score. Return type: Dataset
set_label(label)
Set label of Dataset.
Parameters: label (list, numpy 1-D array, pandas Series / one-column DataFrame or None) – The label information to be set into Dataset. Returns: self – Dataset with set label. Return type: Dataset
set_reference(reference)
Set reference Dataset.
Parameters: reference (Dataset) – Reference that is used as a template to construct the current Dataset. Returns: self – Dataset with set reference. Return type: Dataset
set_weight(weight)
Set weight of each instance.
Parameters: weight (list, numpy 1-D array, pandas Series or None) – Weight to be set for each data point. Returns: self – Dataset with set weight. Return type: Dataset
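Example
Since the setters above return self, calls can be chained; a sketch assuming train_data and label from the earlier example:

```python
import numpy as np

# Replace the label and give every instance unit weight in one statement.
train_data.set_label(label).set_weight(np.ones(len(label)))
```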
subset(used_indices, params=None)
Get a subset of the current Dataset.
Parameters: - used_indices (list of int) – Indices used to create the subset.
- params (dict or None, optional (default=None)) – These parameters will be passed to Dataset constructor.
Returns: subset – Subset of the current Dataset.
Return type: Dataset
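Example
For instance (the indices are illustrative):

```python
# Build a new Dataset containing only the first 50 rows.
train_head = train_data.subset(list(range(50)))
```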
class lightgbm.Booster(params=None, train_set=None, model_file=None, silent=False)
Bases: object
Booster in LightGBM.
Initialize the Booster.
Parameters: - params (dict or None, optional (default=None)) – Parameters for Booster.
- train_set (Dataset or None, optional (default=None)) – Training dataset.
- model_file (string or None, optional (default=None)) – Path to the model file.
- silent (bool, optional (default=False)) – Whether to print messages during construction.
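Example
A Booster is usually obtained from lightgbm.train (see the Training API below); loading a saved model file directly is also possible ('model.txt' is a hypothetical path):

```python
import lightgbm as lgb

booster = lgb.Booster(model_file='model.txt')
```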
add_valid(data, name)
Add validation data.
Parameters: - data (Dataset) – Validation data.
- name (string) – Name of validation data.
Returns: self – Booster with set validation data.
Return type: Booster
attr(key)
Get attribute string from the Booster.
Parameters: key (string) – The name of the attribute. Returns: value – The attribute value. Returns None if attribute does not exist. Return type: string or None
current_iteration()
Get the index of the current iteration.
Returns: cur_iter – The index of the current iteration. Return type: int
dump_model(num_iteration=None, start_iteration=0)
Dump Booster to JSON format.
Parameters: - num_iteration (int or None, optional (default=None)) – Index of the iteration that should be dumped. If None, if the best iteration exists, it is dumped; otherwise, all iterations are dumped. If <= 0, all iterations are dumped.
- start_iteration (int, optional (default=0)) – Start index of the iteration that should be dumped.
Returns: json_repr – JSON format of Booster.
Return type: dict
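Example
A sketch of dumping a trained Booster and writing it out as JSON:

```python
import json

model_json = booster.dump_model()  # plain dict, JSON-serializable
with open('model.json', 'w') as f:
    json.dump(model_json, f)
```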
eval(data, name, feval=None)
Evaluate for data.
Parameters: - data (Dataset) – Data to be evaluated.
- name (string) – Name of the data.
- feval (callable or None, optional (default=None)) – Customized evaluation function. Should accept two parameters: preds, train_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples. For multi-class task, the preds are grouped by class_id first, then by row_id. If you want to get the i-th row preds in the j-th class, access preds[j * num_data + i].
Returns: result – List with evaluation results.
Return type: list
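Example
A sketch of a custom evaluation function with this signature (the metric itself is illustrative); the same callable also works for eval_train and eval_valid below:

```python
import numpy as np

def mean_absolute_error(preds, eval_data):
    labels = eval_data.get_label()
    return 'mae', float(np.mean(np.abs(labels - preds))), False  # lower is better

result = booster.eval_train(feval=mean_absolute_error)
```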
eval_train(feval=None)
Evaluate for training data.
Parameters: feval (callable or None, optional (default=None)) – Customized evaluation function. Should accept two parameters: preds, train_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples. For multi-class task, the preds are grouped by class_id first, then by row_id. If you want to get the i-th row preds in the j-th class, access preds[j * num_data + i]. Returns: result – List with evaluation results. Return type: list
eval_valid(feval=None)
Evaluate for validation data.
Parameters: feval (callable or None, optional (default=None)) – Customized evaluation function. Should accept two parameters: preds, train_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples. For multi-class task, the preds are grouped by class_id first, then by row_id. If you want to get the i-th row preds in the j-th class, access preds[j * num_data + i]. Returns: result – List with evaluation results. Return type: list
feature_importance(importance_type='split', iteration=None)
Get feature importances.
Parameters: - importance_type (string, optional (default="split")) – How the importance is calculated. If “split”, result contains numbers of times the feature is used in a model. If “gain”, result contains total gains of splits which use the feature.
- iteration (int or None, optional (default=None)) – Limit number of iterations in the feature importance calculation. If None, if the best iteration exists, it is used; otherwise, all trees are used. If <= 0, all trees are used (no limits).
Returns: result – Array with feature importances.
Return type: numpy array
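Example
A sketch pairing importances with feature names for a trained booster:

```python
# 'gain' sums the split gains contributed by each feature.
importance = booster.feature_importance(importance_type='gain')
for name, score in zip(booster.feature_name(), importance):
    print(name, score)
```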
feature_name()
Get names of features.
Returns: result – List with names of features. Return type: list
free_dataset()
Free Booster's Datasets.
Returns: self – Booster without Datasets. Return type: Booster
free_network()
Free Booster's network.
Returns: self – Booster with freed network. Return type: Booster
get_leaf_output(tree_id, leaf_id)
Get the output of a leaf.
Parameters: - tree_id (int) – The index of the tree.
- leaf_id (int) – The index of the leaf in the tree.
Returns: result – The output of the leaf.
Return type: float
model_from_string(model_str, verbose=True)
Load Booster from a string.
Parameters: - model_str (string) – Model will be loaded from this string.
- verbose (bool, optional (default=True)) – Whether to print messages while loading model.
Returns: self – Loaded Booster object.
Return type: Booster
model_to_string(num_iteration=None, start_iteration=0)
Save Booster to string.
Parameters: - num_iteration (int or None, optional (default=None)) – Index of the iteration that should be saved. If None, if the best iteration exists, it is saved; otherwise, all iterations are saved. If <= 0, all iterations are saved.
- start_iteration (int, optional (default=0)) – Start index of the iteration that should be saved.
Returns: str_repr – String representation of Booster.
Return type: string
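Example
A sketch of round-tripping a model through a string; note that model_from_string loads into the calling Booster and returns self:

```python
model_str = booster.model_to_string()
booster = booster.model_from_string(model_str)
```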
num_feature()
Get number of features.
Returns: num_feature – The number of features. Return type: int
num_model_per_iteration()
Get number of models per iteration.
Returns: model_per_iter – The number of models per iteration. Return type: int
num_trees()
Get number of weak sub-models.
Returns: num_trees – The number of weak sub-models. Return type: int
predict(data, num_iteration=None, raw_score=False, pred_leaf=False, pred_contrib=False, data_has_header=False, is_reshape=True, **kwargs)
Make a prediction.
Parameters: - data (string, numpy array, pandas DataFrame or scipy.sparse) – Data source for prediction. If string, it represents the path to a text file.
- num_iteration (int or None, optional (default=None)) – Limit number of iterations in the prediction. If None, if the best iteration exists, it is used; otherwise, all iterations are used. If <= 0, all iterations are used (no limits).
- raw_score (bool, optional (default=False)) – Whether to predict raw scores.
- pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.
- pred_contrib (bool, optional (default=False)) –
Whether to predict feature contributions.
Note
If you want more explanation of your model's predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap).
- data_has_header (bool, optional (default=False)) – Whether the data has a header. Used only if data is string.
- is_reshape (bool, optional (default=True)) – If True, result is reshaped to [nrow, ncol].
- **kwargs – Other parameters for the prediction.
Returns: result – Prediction result.
Return type: numpy array
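Example
A sketch of prediction, including SHAP-style contributions; new_X is a hypothetical feature matrix matching the training schema:

```python
preds = booster.predict(new_X)  # best iteration is used automatically if it exists

# One contribution column per feature plus a final bias column.
contribs = booster.predict(new_X, pred_contrib=True)
```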
refit(data, label, decay_rate=0.9, **kwargs)
Refit the existing Booster with new data.
Parameters: - data (string, numpy array, pandas DataFrame or scipy.sparse) – Data source for refit. If string, it represents the path to a text file.
- label (list, numpy 1-D array or pandas Series / one-column DataFrame) – Label for refit.
- decay_rate (float, optional (default=0.9)) – Decay rate of refit; trees are refitted using leaf_output = decay_rate * old_leaf_output + (1.0 - decay_rate) * new_leaf_output.
- **kwargs – Other parameters for refit. These parameters will be passed to the predict method.
Returns: result – Refitted Booster.
Return type: Booster
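Example
For instance (new_X and new_y are hypothetical fresh data):

```python
# Keeps the learned tree structure but blends old and new leaf outputs.
refitted = booster.refit(new_X, new_y, decay_rate=0.9)
```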
reset_parameter(params)
Reset parameters of Booster.
Parameters: params (dict) – New parameters for Booster. Returns: self – Booster with new parameters. Return type: Booster
rollback_one_iter()
Rollback one iteration.
Returns: self – Booster with one iteration rolled back. Return type: Booster
save_model(filename, num_iteration=None, start_iteration=0)
Save Booster to file.
Parameters: - filename (string) – Filename to save Booster.
- num_iteration (int or None, optional (default=None)) – Index of the iteration that should be saved. If None, if the best iteration exists, it is saved; otherwise, all iterations are saved. If <= 0, all iterations are saved.
- start_iteration (int, optional (default=0)) – Start index of the iteration that should be saved.
Returns: self – Returns self.
Return type: Booster
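Example
A sketch that saves only up to the best iteration, assuming the best_iteration field was set by early stopping during training:

```python
booster.save_model('model.txt', num_iteration=booster.best_iteration)
```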
set_attr(**kwargs)
Set attributes to the Booster.
Parameters: **kwargs – The attributes to set. Setting a value to None deletes an attribute. Returns: self – Booster with set attributes. Return type: Booster
set_network(machines, local_listen_port=12400, listen_time_out=120, num_machines=1)
Set the network configuration.
Parameters: - machines (list, set or string) – Names of machines.
- local_listen_port (int, optional (default=12400)) – TCP listen port for local machines.
- listen_time_out (int, optional (default=120)) – Socket time-out in minutes.
- num_machines (int, optional (default=1)) – The number of machines for parallel learning application.
Returns: self – Booster with set network.
Return type: Booster
set_train_data_name(name)
Set the name of the training Dataset.
Parameters: name (string) – Name for the training Dataset. Returns: self – Booster with set training Dataset name. Return type: Booster
shuffle_models(start_iteration=0, end_iteration=-1)
Shuffle models.
Parameters: - start_iteration (int, optional (default=0)) – The first iteration that will be shuffled.
- end_iteration (int, optional (default=-1)) – The last iteration that will be shuffled. If <= 0, the last available iteration is used.
Returns: self – Booster with shuffled models.
Return type: Booster
update(train_set=None, fobj=None)
Update Booster for one iteration.
Parameters: - train_set (Dataset or None, optional (default=None)) – Training data. If None, last training data is used.
- fobj (callable or None, optional (default=None)) – Customized objective function. For multi-class task, the score is grouped by class_id first, then by row_id. If you want to get the i-th row score in the j-th class, access score[j * num_data + i], and you should group grad and hess in this way as well.
Returns: is_finished – Whether the update was successfully finished.
Return type: bool
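Example
A sketch of a customized objective for update, using the standard binary log-loss gradient and hessian (the preds passed to fobj are raw scores):

```python
import numpy as np

def logistic_obj(preds, train_data):
    labels = train_data.get_label()
    probs = 1.0 / (1.0 + np.exp(-preds))  # sigmoid of the raw scores
    grad = probs - labels                 # first derivative of log loss
    hess = probs * (1.0 - probs)          # second derivative of log loss
    return grad, hess

finished = booster.update(fobj=logistic_obj)
```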
Training API
lightgbm.train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, evals_result=None, verbose_eval=True, learning_rates=None, keep_training_booster=False, callbacks=None)
Perform the training with given parameters.
Parameters: - params (dict) – Parameters for training.
- train_set (Dataset) – Data to be trained on.
- num_boost_round (int, optional (default=100)) – Number of boosting iterations.
- valid_sets (list of Datasets or None, optional (default=None)) – List of data to be evaluated on during training.
- valid_names (list of strings or None, optional (default=None)) – Names of valid_sets.
- fobj (callable or None, optional (default=None)) – Customized objective function.
- feval (callable or None, optional (default=None)) – Customized evaluation function. Should accept two parameters: preds, train_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples. For multi-class task, the preds are grouped by class_id first, then by row_id. If you want to get the i-th row preds in the j-th class, access preds[j * num_data + i]. To ignore the default metric corresponding to the used objective, set the metric parameter to the string "None" in params.
- init_model (string, Booster or None, optional (default=None)) – Filename of LightGBM model or Booster instance used for continued training.
- feature_name (list of strings or 'auto', optional (default="auto")) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
- categorical_feature (list of strings or int, or 'auto', optional (default="auto")) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify feature_name as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used. All values in categorical features should be less than the int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values.
- early_stopping_rounds (int or None, optional (default=None)) – Activates early stopping. The model will train until the validation score stops improving. Validation score needs to improve at least every early_stopping_rounds round(s) to continue training. Requires at least one validation set and one metric. If there's more than one, will check all of them; the training data is ignored either way. The index of the iteration with the best performance will be saved in the best_iteration field if early stopping is enabled by setting early_stopping_rounds.
- evals_result (dict or None, optional (default=None)) – This dictionary is used to store all evaluation results of all the items in valid_sets.
Example
With valid_sets = [valid_set, train_set], valid_names = ['eval', 'train'] and params = {'metric': 'logloss'}, this returns {'train': {'logloss': ['0.48253', '0.35953', ...]}, 'eval': {'logloss': ['0.480385', '0.357756', ...]}}.
- verbose_eval (bool or int, optional (default=True)) –
Requires at least one validation set. If True, the eval metric on the valid set is printed at each boosting stage. If int, the eval metric on the valid set is printed at every verbose_eval boosting stage. The last boosting stage, or the boosting stage found by using early_stopping_rounds, is also printed.
Example
With verbose_eval = 4 and at least one item in valid_sets, an evaluation metric is printed every 4 (instead of 1) boosting stages.
- learning_rates (list, callable or None, optional (default=None)) – List of learning rates for each boosting round, or a customized function that calculates learning_rate in terms of the current number of rounds (e.g. yields learning rate decay).
- keep_training_booster (bool, optional (default=False)) – Whether the returned Booster will be used to keep training. If False, the returned value will be converted into an _InnerPredictor before returning. You can still use _InnerPredictor as init_model for future continued training.
- callbacks (list of callables or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
Returns: booster – The trained Booster model.
Return type: Booster
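Example
A minimal end-to-end sketch of train with a validation set and early stopping (data and parameter values are illustrative):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(500, 5)
y = np.random.randint(0, 2, size=500)
train_data = lgb.Dataset(X[:400], label=y[:400])
valid_data = lgb.Dataset(X[400:], label=y[400:], reference=train_data)

params = {'objective': 'binary', 'metric': 'binary_logloss', 'learning_rate': 0.1}
evals_result = {}
booster = lgb.train(params, train_data,
                    num_boost_round=100,
                    valid_sets=[valid_data],
                    valid_names=['eval'],
                    early_stopping_rounds=10,
                    evals_result=evals_result)
```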
lightgbm.cv(params, train_set, num_boost_round=100, folds=None, nfold=5, stratified=True, shuffle=True, metrics=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None)
Perform the cross-validation with given parameters.
Parameters: - params (dict) – Parameters for Booster.
- train_set (Dataset) – Data to be trained on.
- num_boost_round (int, optional (default=100)) – Number of boosting iterations.
- folds (generator or iterator of (train_idx, test_idx) tuples, scikit-learn splitter object or None, optional (default=None)) – If generator or iterator, it should yield the train and test indices for each fold. If object, it should be one of the scikit-learn splitter classes (http://scikit-learn.org/stable/modules/classes.html#splitter-classes) and have a split method. This argument has the highest priority over other data split arguments.
- nfold (int, optional (default=5)) – Number of folds in CV.
- stratified (bool, optional (default=True)) – Whether to perform stratified sampling.
- shuffle (bool, optional (default=True)) – Whether to shuffle before splitting data.
- metrics (string, list of strings or None, optional (default=None)) – Evaluation metrics to be monitored during CV. If not None, the metric in params will be overridden.
- fobj (callable or None, optional (default=None)) – Custom objective function.
- feval (callable or None, optional (default=None)) – Customized evaluation function. Should accept two parameters: preds, train_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples. For multi-class task, the preds are grouped by class_id first, then by row_id. If you want to get the i-th row preds in the j-th class, access preds[j * num_data + i]. To ignore the default metric corresponding to the used objective, set metrics to the string "None".
- init_model (string, Booster or None, optional (default=None)) – Filename of LightGBM model or Booster instance used for continued training.
- feature_name (list of strings or 'auto', optional (default="auto")) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
- categorical_feature (list of strings or int, or 'auto', optional (default="auto")) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify feature_name as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used. All values in categorical features should be less than the int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values.
- early_stopping_rounds (int or None, optional (default=None)) – Activates early stopping. CV score needs to improve at least every early_stopping_rounds round(s) to continue. Requires at least one metric. If there's more than one, will check all of them. The last entry in the evaluation history is the one from the best iteration.
- fpreproc (callable or None, optional (default=None)) – Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions of those.
- verbose_eval (bool, int, or None, optional (default=None)) – Whether to display the progress. If None, progress will be displayed when np.ndarray is returned. If True, progress will be displayed at every boosting stage. If int, progress will be displayed at every given verbose_eval boosting stage.
- show_stdv (bool, optional (default=True)) – Whether to display the standard deviation in progress. Results are not affected by this parameter, and always contain std.
- seed (int, optional (default=0)) – Seed used to generate the folds (passed to numpy.random.seed).
- callbacks (list of callables or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
Returns: eval_hist – Evaluation history. The dictionary has the following format: {‘metric1-mean’: [values], ‘metric1-stdv’: [values], ‘metric2-mean’: [values], ‘metric2-stdv’: [values], …}.
Return type: dict
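Example
A sketch of cross-validation, reusing params and train_data from the train example above; the keys of the returned dict follow the metric name:

```python
cv_results = lgb.cv(params, train_data, num_boost_round=100, nfold=5,
                    early_stopping_rounds=10, seed=0)
# e.g. cv_results['binary_logloss-mean'] and cv_results['binary_logloss-stdv']
print(len(cv_results['binary_logloss-mean']))  # number of rounds actually kept
```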
Scikit-learn API
class lightgbm.LGBMModel(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, importance_type='split', **kwargs)
Bases: object
Implementation of the scikit-learn API for LightGBM.
Construct a gradient boosting model.
Parameters: - boosting_type (string, optional (default='gbdt')) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.
- num_leaves (int, optional (default=31)) – Maximum tree leaves for base learners.
- max_depth (int, optional (default=-1)) – Maximum tree depth for base learners, -1 means no limit.
- learning_rate (float, optional (default=0.1)) – Boosting learning rate. You can use the callbacks parameter of the fit method to shrink/adapt the learning rate during training with the reset_parameter callback. Note that this will ignore the learning_rate argument in training.
- n_estimators (int, optional (default=100)) – Number of boosted trees to fit.
- subsample_for_bin (int, optional (default=200000)) – Number of samples for constructing bins.
- objective (string, callable or None, optional (default=None)) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
- class_weight (dict, 'balanced' or None, optional (default=None)) – Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
- min_split_gain (float, optional (default=0.)) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
- min_child_weight (float, optional (default=1e-3)) – Minimum sum of instance weight (hessian) needed in a child (leaf).
- min_child_samples (int, optional (default=20)) – Minimum number of data needed in a child (leaf).
- subsample (float, optional (default=1.)) – Subsample ratio of the training instance.
- subsample_freq (int, optional (default=0)) – Frequency of subsampling; <=0 means no subsampling.
- colsample_bytree (float, optional (default=1.)) – Subsample ratio of columns when constructing each tree.
- reg_alpha (float, optional (default=0.)) – L1 regularization term on weights.
- reg_lambda (float, optional (default=0.)) – L2 regularization term on weights.
- random_state (int or None, optional (default=None)) – Random number seed. If None, default seeds in C++ code will be used.
- n_jobs (int, optional (default=-1)) – Number of parallel threads.
- silent (bool, optional (default=True)) – Whether to print messages while running boosting.
- importance_type (string, optional (default='split')) – The type of feature importance to be filled into feature_importances_. If 'split', result contains numbers of times the feature is used in a model. If 'gain', result contains total gains of splits which use the feature.
- **kwargs – Other parameters for the model. Check http://lightgbm.readthedocs.io/en/latest/Parameters.html for more parameters.
Note
**kwargs is not supported in sklearn, so it may cause unexpected issues.
n_features_
The number of features of the fitted model.
Type: int

classes_
The class label array (only for classification problem).
Type: array of shape = [n_classes]

n_classes_
The number of classes (only for classification problem).
Type: int

best_score_
The best score of the fitted model.
Type: dict or None

best_iteration_
The best iteration of the fitted model if early_stopping_rounds has been specified.
Type: int or None

objective_
The concrete objective used while fitting this model.
Type: string or callable

evals_result_
The evaluation results if early_stopping_rounds has been specified.
Type: dict or None

feature_importances_
The feature importances (the higher, the more important the feature).
Type: array of shape = [n_features]
Note
A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
- y_true : array-like of shape = [n_samples] – The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) – The predicted values.
- group : array-like – Group/query data, used for ranking task.
- grad : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) – The value of the gradient for each sample point.
- hess : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) – The value of the second derivative for each sample point.
For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access y_pred[j * num_data + i], and you should group grad and hess in this way as well.
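Example
A sketch of a custom objective matching this signature (plain squared error, with its textbook gradient and hessian):

```python
import numpy as np
import lightgbm as lgb

def squared_error(y_true, y_pred):
    grad = y_pred - y_true        # derivative of 0.5 * (y_pred - y_true)**2
    hess = np.ones(len(y_true))   # second derivative is constant
    return grad, hess

model = lgb.LGBMRegressor(objective=squared_error)
```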
best_iteration_
Get the best iteration of the fitted model.

best_score_
Get the best score of the fitted model.

booster_
Get the underlying lightgbm Booster of this model.

evals_result_
Get the evaluation results.

feature_importances_
Get feature importances.
Note
Feature importance in the sklearn interface used to be normalized to 1; this was deprecated after 2.0.4, and the values are now the same as those from Booster.feature_importance(). The importance_type attribute is passed to the function to configure the type of importance values to be extracted.
fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_class_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)
Build a gradient boosting model from the training set (X, y).
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input feature matrix.
- y (array-like of shape = [n_samples]) – The target values (class labels in classification, real numbers in regression).
- sample_weight (array-like of shape = [n_samples] or None, optional (default=None)) – Weights of training data.
- init_score (array-like of shape = [n_samples] or None, optional (default=None)) – Init score of training data.
- group (array-like or None, optional (default=None)) – Group data of training data.
- eval_set (list or None, optional (default=None)) – A list of (X, y) tuple pairs to use as validation sets.
- eval_names (list of strings or None, optional (default=None)) – Names of eval_set.
- eval_sample_weight (list of arrays or None, optional (default=None)) – Weights of eval data.
- eval_class_weight (list or None, optional (default=None)) – Class weights of eval data.
- eval_init_score (list of arrays or None, optional (default=None)) – Init score of eval data.
- eval_group (list of arrays or None, optional (default=None)) – Group data of eval data.
- eval_metric (string, list of strings, callable or None, optional (default=None)) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric; see the note below for more details. In either case, the metric from the model parameters will be evaluated and used as well. Default: 'l2' for LGBMRegressor, 'logloss' for LGBMClassifier, 'ndcg' for LGBMRanker.
- early_stopping_rounds (int or None, optional (default=None)) – Activates early stopping. The model will train until the validation score stops improving. Validation score needs to improve at least every early_stopping_rounds round(s) to continue training. Requires at least one validation set and one metric. If there's more than one, will check all of them; the training data is ignored either way.
- verbose (bool or int, optional (default=True)) – Requires at least one evaluation set. If True, the eval metric on the eval set is printed at each boosting stage. If int, the eval metric on the eval set is printed at every verbose boosting stage. The last boosting stage, or the boosting stage found by using early_stopping_rounds, is also printed.
Example
With verbose = 4 and at least one item in eval_set, an evaluation metric is printed every 4 (instead of 1) boosting stages.
- feature_name (list of strings or 'auto', optional (default='auto')) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
- categorical_feature (list of strings or int, or 'auto', optional (default='auto')) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify feature_name as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used. All values in categorical features should be less than the int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values.
- callbacks (list of callback functions or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
Returns: self – Returns self.
Return type: object
Note
A custom eval function expects a callable with one of the following signatures: func(y_true, y_pred), func(y_true, y_pred, weight) or func(y_true, y_pred, weight, group), returning (eval_name, eval_result, is_bigger_better) or a list of such tuples:
- y_true : array-like of shape = [n_samples] – The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) – The predicted values.
- weight : array-like of shape = [n_samples] – The weight of samples.
- group : array-like – Group/query data, used for ranking task.
- eval_name : string – The name of the evaluation.
- eval_result : float – The eval result.
- is_bigger_better : bool – Whether a bigger eval result is better, e.g. AUC is bigger_better.
For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access y_pred[j * num_data + i].
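Example
A sketch of a custom eval metric with the two-argument signature, reusing the model from the sketch above (X_train, y_train, X_valid and y_valid are hypothetical arrays):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Lower RMSE is better, hence is_bigger_better=False.
    return 'rmse', float(np.sqrt(np.mean((y_true - y_pred) ** 2))), False

model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],
          eval_metric=rmse)
```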
get_params(deep=True)
Get parameters for this estimator.
Parameters: deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params – Parameter names mapped to their values. Return type: dict
n_features_
Get the number of features of the fitted model.

objective_
Get the concrete objective used while fitting this model.
predict(X, raw_score=False, num_iteration=None, pred_leaf=False, pred_contrib=False, **kwargs)
Return the predicted value for each sample.
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input features matrix.
- raw_score (bool, optional (default=False)) – Whether to predict raw scores.
- num_iteration (int or None, optional (default=None)) – Limit number of iterations in the prediction. If None, if the best iteration exists, it is used; otherwise, all trees are used. If <= 0, all trees are used (no limits).
- pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.
- pred_contrib (bool, optional (default=False)) –
Whether to predict feature contributions.
Note
If you want more explanation of your model's predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap).
- **kwargs – Other parameters for the prediction.
Returns:
- predicted_result (array-like of shape = [n_samples] or shape = [n_samples, n_classes]) – The predicted values.
- X_leaves (array-like of shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]) – If pred_leaf=True, the predicted leaf of every tree for each sample.
- X_SHAP_values (array-like of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes]) – If pred_contrib=True, the feature contributions for each sample.
class lightgbm.LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, importance_type='split', **kwargs)
Bases: lightgbm.sklearn.LGBMModel, object
LightGBM classifier.
Construct a gradient boosting model.
Parameters: - boosting_type (string, optional (default='gbdt')) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.
- num_leaves (int, optional (default=31)) – Maximum tree leaves for base learners.
- max_depth (int, optional (default=-1)) – Maximum tree depth for base learners, -1 means no limit.
- learning_rate (float, optional (default=0.1)) – Boosting learning rate. You can use the callbacks parameter of the fit method to shrink/adapt the learning rate during training with the reset_parameter callback. Note that this will ignore the learning_rate argument in training.
- n_estimators (int, optional (default=100)) – Number of boosted trees to fit.
- subsample_for_bin (int, optional (default=200000)) – Number of samples for constructing bins.
- objective (string, callable or None, optional (default=None)) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
- class_weight (dict, 'balanced' or None, optional (default=None)) – Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
- min_split_gain (float, optional (default=0.)) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
- min_child_weight (float, optional (default=1e-3)) – Minimum sum of instance weight (hessian) needed in a child (leaf).
- min_child_samples (int, optional (default=20)) – Minimum number of data needed in a child (leaf).
- subsample (float, optional (default=1.)) – Subsample ratio of the training instance.
- subsample_freq (int, optional (default=0)) – Frequency of subsampling; <=0 means no subsampling.
- colsample_bytree (float, optional (default=1.)) – Subsample ratio of columns when constructing each tree.
- reg_alpha (float, optional (default=0.)) – L1 regularization term on weights.
- reg_lambda (float, optional (default=0.)) – L2 regularization term on weights.
- random_state (int or None, optional (default=None)) – Random number seed. If None, default seeds in C++ code will be used.
- n_jobs (int, optional (default=-1)) – Number of parallel threads.
- silent (bool, optional (default=True)) – Whether to print messages while running boosting.
- importance_type (string, optional (default='split')) – The type of feature importance to be filled into feature_importances_. If 'split', result contains numbers of times the feature is used in a model. If 'gain', result contains total gains of splits which use the feature.
- **kwargs – Other parameters for the model. Check http://lightgbm.readthedocs.io/en/latest/Parameters.html for more parameters.
Note
**kwargs is not supported in sklearn, so it may cause unexpected issues.
n_features_
The number of features of the fitted model.
Type: int

classes_
The class label array (only for classification problem).
Type: array of shape = [n_classes]

n_classes_
The number of classes (only for classification problem).
Type: int

best_score_
The best score of the fitted model.
Type: dict or None

best_iteration_
The best iteration of the fitted model if early_stopping_rounds has been specified.
Type: int or None

objective_
The concrete objective used while fitting this model.
Type: string or callable

evals_result_
The evaluation results if early_stopping_rounds has been specified.
Type: dict or None

feature_importances_
The feature importances (the higher, the more important the feature).
Type: array of shape = [n_features]
Note
A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
- y_true : array-like of shape = [n_samples] – The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) – The predicted values.
- group : array-like – Group/query data, used for ranking task.
- grad : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) – The value of the gradient for each sample point.
- hess : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) – The value of the second derivative for each sample point.
For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access y_pred[j * num_data + i], and you should group grad and hess in this way as well.
best_iteration_
Get the best iteration of the fitted model.

best_score_
Get the best score of the fitted model.

booster_
Get the underlying lightgbm Booster of this model.

classes_
Get the class label array.

evals_result_
Get the evaluation results.

feature_importances_
Get feature importances.
Note
Feature importance in the sklearn interface used to be normalized to 1; this was deprecated after 2.0.4, and the values are now the same as those from Booster.feature_importance(). The importance_type attribute is passed to the function to configure the type of importance values to be extracted.
fit(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_class_weight=None, eval_init_score=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)
Build a gradient boosting model from the training set (X, y).
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input feature matrix.
- y (array-like of shape = [n_samples]) – The target values (class labels in classification, real numbers in regression).
- sample_weight (array-like of shape = [n_samples] or None, optional (default=None)) – Weights of training data.
- init_score (array-like of shape = [n_samples] or None, optional (default=None)) – Init score of training data.
- group (array-like or None, optional (default=None)) – Group data of training data.
- eval_set (list or None, optional (default=None)) – A list of (X, y) tuple pairs to use as validation sets.
- eval_names (list of strings or None, optional (default=None)) – Names of eval_set.
- eval_sample_weight (list of arrays or None, optional (default=None)) – Weights of eval data.
- eval_class_weight (list or None, optional (default=None)) – Class weights of eval data.
- eval_init_score (list of arrays or None, optional (default=None)) – Init score of eval data.
- eval_group (list of arrays or None, optional (default=None)) – Group data of eval data.
- eval_metric (string, list of strings, callable or None, optional (default=None)) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric; see the note below for more details. In either case, the metric from the model parameters will be evaluated and used as well. Default: 'l2' for LGBMRegressor, 'logloss' for LGBMClassifier, 'ndcg' for LGBMRanker.
- early_stopping_rounds (int or None, optional (default=None)) – Activates early stopping. The model will train until the validation score stops improving. Validation score needs to improve at least every early_stopping_rounds round(s) to continue training. Requires at least one validation set and one metric. If there's more than one, will check all of them; the training data is ignored either way.
- verbose (bool or int, optional (default=True)) – Requires at least one evaluation set. If True, the eval metric on the eval set is printed at each boosting stage. If int, the eval metric on the eval set is printed at every verbose boosting stage. The last boosting stage, or the boosting stage found by using early_stopping_rounds, is also printed.
Example
With verbose = 4 and at least one item in eval_set, an evaluation metric is printed every 4 (instead of 1) boosting stages.
- feature_name (list of strings or 'auto', optional (default='auto')) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
- categorical_feature (list of strings or int, or 'auto', optional (default='auto')) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify feature_name as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used. All values in categorical features should be less than the int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values.
- callbacks (list of callback functions or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
Returns: self – Returns self.
Return type: object
Note
A custom eval function expects a callable with one of the following signatures: func(y_true, y_pred), func(y_true, y_pred, weight) or func(y_true, y_pred, weight, group), returning (eval_name, eval_result, is_bigger_better) or a list of such tuples:
- y_true : array-like of shape = [n_samples] – The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) – The predicted values.
- weight : array-like of shape = [n_samples] – The weight of samples.
- group : array-like – Group/query data, used for ranking task.
- eval_name : string – The name of the evaluation.
- eval_result : float – The eval result.
- is_bigger_better : bool – Whether a bigger eval result is better, e.g. AUC is bigger_better.
For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access y_pred[j * num_data + i].
get_params(deep=True)
Get parameters for this estimator.
Parameters: deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params – Parameter names mapped to their values. Return type: dict
n_classes_
Get the number of classes.

n_features_
Get the number of features of the fitted model.

objective_
Get the concrete objective used while fitting this model.
predict(X, raw_score=False, num_iteration=None, pred_leaf=False, pred_contrib=False, **kwargs)
Return the predicted value for each sample.
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input features matrix.
- raw_score (bool, optional (default=False)) – Whether to predict raw scores.
- num_iteration (int or None, optional (default=None)) – Limit number of iterations in the prediction. If None, if the best iteration exists, it is used; otherwise, all trees are used. If <= 0, all trees are used (no limits).
- pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.
- pred_contrib (bool, optional (default=False)) –
Whether to predict feature contributions.
Note
If you want more explanation of your model's predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap).
- **kwargs – Other parameters for the prediction.
Returns:
- predicted_result (array-like of shape = [n_samples] or shape = [n_samples, n_classes]) – The predicted values.
- X_leaves (array-like of shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]) – If pred_leaf=True, the predicted leaf of every tree for each sample.
- X_SHAP_values (array-like of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes]) – If pred_contrib=True, the feature contributions for each sample.
predict_proba(X, raw_score=False, num_iteration=None, pred_leaf=False, pred_contrib=False, **kwargs)
Return the predicted probability for each class for each sample.
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input features matrix.
- raw_score (bool, optional (default=False)) – Whether to predict raw scores.
- num_iteration (int or None, optional (default=None)) – Limit number of iterations in the prediction. If None, if the best iteration exists, it is used; otherwise, all trees are used. If <= 0, all trees are used (no limits).
- pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.
- pred_contrib (bool, optional (default=False)) –
Whether to predict feature contributions.
Note
If you want more explanation of your model's predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap).
- **kwargs – Other parameters for the prediction.
Returns:
- predicted_probability (array-like of shape = [n_samples, n_classes]) – The predicted probability for each class for each sample.
- X_leaves (array-like of shape = [n_samples, n_trees * n_classes]) – If pred_leaf=True, the predicted leaf of every tree for each sample.
- X_SHAP_values (array-like of shape = [n_samples, (n_features + 1) * n_classes]) – If pred_contrib=True, the feature contributions for each sample.
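Example
A minimal sketch of the classifier workflow (data values are illustrative):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)

clf = lgb.LGBMClassifier(n_estimators=50)
clf.fit(X, y)

labels = clf.predict(X[:5])       # hard class labels
probs = clf.predict_proba(X[:5])  # shape [5, n_classes]
```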
set_params(**params)
Set the parameters of this estimator.
Parameters: **params – Parameter names with their new values. Returns: self – Returns self. Return type: object
class lightgbm.LGBMRegressor(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, importance_type='split', **kwargs)
Bases: lightgbm.sklearn.LGBMModel, object
LightGBM regressor.
Construct a gradient boosting model.
Parameters: - boosting_type (string, optional (default='gbdt')) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.
- num_leaves (int, optional (default=31)) – Maximum tree leaves for base learners.
- max_depth (int, optional (default=-1)) – Maximum tree depth for base learners, -1 means no limit.
- learning_rate (float, optional (default=0.1)) – Boosting learning rate. You can use the callbacks parameter of the fit method to shrink/adapt the learning rate during training with the reset_parameter callback. Note that this will ignore the learning_rate argument in training.
- n_estimators (int, optional (default=100)) – Number of boosted trees to fit.
- subsample_for_bin (int, optional (default=200000)) – Number of samples for constructing bins.
- objective (string, callable or None, optional (default=None)) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
- class_weight (dict, 'balanced' or None, optional (default=None)) – Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
- min_split_gain (float, optional (default=0.)) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
- min_child_weight (float, optional (default=1e-3)) – Minimum sum of instance weight (hessian) needed in a child (leaf).
- min_child_samples (int, optional (default=20)) – Minimum number of data needed in a child (leaf).
- subsample (float, optional (default=1.)) – Subsample ratio of the training instance.
- subsample_freq (int, optional (default=0)) – Frequency of subsampling; <=0 means no subsampling.
- colsample_bytree (float, optional (default=1.)) – Subsample ratio of columns when constructing each tree.
- reg_alpha (float, optional (default=0.)) – L1 regularization term on weights.
- reg_lambda (float, optional (default=0.)) – L2 regularization term on weights.
- random_state (int or None, optional (default=None)) – Random number seed. If None, default seeds in C++ code will be used.
- n_jobs (int, optional (default=-1)) – Number of parallel threads.
- silent (bool, optional (default=True)) – Whether to print messages while running boosting.
- importance_type (string, optional (default='split')) – The type of feature importance to be filled into
feature_importances_
. If ‘split’, result contains numbers of times the feature is used in a model. If ‘gain’, result contains total gains of splits which use the feature. - **kwargs –
Other parameters for the model. Check http://lightgbm.readthedocs.io/en/latest/Parameters.html for more parameters.
Note
**kwargs is not supported in sklearn, so it may cause unexpected issues.
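Example
A minimal sketch of constructing the estimator and decaying the learning rate through the reset_parameter callback mentioned above (all data below is synthetic and the variable names are illustrative, not part of the API):

    import numpy as np
    import lightgbm as lgb

    X_train = np.random.rand(500, 10)  # synthetic feature matrix
    y_train = np.random.rand(500)      # synthetic regression targets

    model = lgb.LGBMRegressor(boosting_type='gbdt', num_leaves=31,
                              learning_rate=0.1, n_estimators=100)
    # Shrink the learning rate each round; this overrides the learning_rate
    # argument passed to the constructor.
    model.fit(X_train, y_train,
              callbacks=[lgb.reset_parameter(learning_rate=lambda i: 0.1 * (0.99 ** i))])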
-
n_features_
¶ The number of features of the fitted model.
Type: int
-
classes_
¶ The class label array (only for classification problem).
Type: array of shape = [n_classes]
-
n_classes_
¶ The number of classes (only for classification problem).
Type: int
-
best_score_
¶ The best score of the fitted model.
Type: dict or None
-
best_iteration_
¶ The best iteration of the fitted model if early_stopping_rounds has been specified.
Type: int or None
-
objective_
¶ The concrete objective used while fitting this model.
Type: string or callable
-
evals_result_
¶ The evaluation results if early_stopping_rounds has been specified.
Type: dict or None
-
feature_importances_
¶ The feature importances (the higher, the more important the feature).
Type: array of shape = [n_features]
Note
A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
- y_true : array-like of shape = [n_samples]
- The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
- The predicted values.
- group : array-like
- Group/query data, used for ranking task.
- grad : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
- The value of the gradient for each sample point.
- hess : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
- The value of the second derivative for each sample point.
For multi-class tasks, y_pred is grouped by class_id first, then by row_id. To access the prediction for the i-th row in the j-th class, use y_pred[j * num_data + i]; grad and hess should be grouped in the same way.
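Example
A minimal sketch of a custom objective consistent with the signature above, here the usual L2 loss for regression (for 0.5 * (y_pred - y_true)**2 the gradient is y_pred - y_true and the hessian is 1 for every sample):

    import numpy as np
    import lightgbm as lgb

    def l2_objective(y_true, y_pred):
        # One gradient and one hessian value per sample.
        grad = y_pred - y_true
        hess = np.ones_like(y_true)
        return grad, hess

    # The callable is passed through the objective parameter.
    model = lgb.LGBMRegressor(objective=l2_objective)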
-
best_iteration_
Get the best iteration of the fitted model.
-
best_score_
Get the best score of the fitted model.
-
booster_
Get the underlying lightgbm Booster of this model.
-
evals_result_
Get the evaluation results.
-
feature_importances_
Get feature importances.
Note
In the sklearn interface, feature importance used to be normalized to sum to 1; this behavior was deprecated after version 2.0.4, and the values are now the same as those from Booster.feature_importance(). The importance_type attribute is passed to the function to configure the type of importance values to be extracted.
-
fit
(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)[source]¶ Build a gradient boosting model from the training set (X, y).
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input feature matrix.
- y (array-like of shape = [n_samples]) – The target values (class labels in classification, real numbers in regression).
- sample_weight (array-like of shape = [n_samples] or None, optional (default=None)) – Weights of training data.
- init_score (array-like of shape = [n_samples] or None, optional (default=None)) – Init score of training data.
- group (array-like or None, optional (default=None)) – Group data of training data.
- eval_set (list or None, optional (default=None)) – A list of (X, y) tuple pairs to use as validation sets.
- eval_names (list of strings or None, optional (default=None)) – Names of eval_set.
- eval_sample_weight (list of arrays or None, optional (default=None)) – Weights of eval data.
- eval_init_score (list of arrays or None, optional (default=None)) – Init score of eval data.
- eval_group (list of arrays or None, optional (default=None)) – Group data of eval data.
- eval_metric (string, list of strings, callable or None, optional (default=None)) – If string, it should be a built-in evaluation metric to use.
If callable, it should be a custom evaluation metric, see note below for more details.
In either case, the metric from the model parameters will be evaluated and used as well. Default: 'l2' for LGBMRegressor, 'logloss' for LGBMClassifier, 'ndcg' for LGBMRanker.
- early_stopping_rounds (int or None, optional (default=None)) – Activates early stopping. The model will train until the validation score stops improving.
Validation score needs to improve at least every early_stopping_rounds round(s) to continue training. Requires at least one validation dataset and one metric; if there is more than one of either, all of them will be checked, but the training data is ignored.
- verbose (bool or int, optional (default=True)) –
Requires at least one evaluation dataset. If True, the eval metric on the eval set is printed at each boosting stage. If int, the eval metric on the eval set is printed at every verbose boosting stage. The last boosting stage or the boosting stage found by using early_stopping_rounds is also printed.
Example
With verbose = 4 and at least one item in eval_set, an evaluation metric is printed every 4 (instead of 1) boosting stages.
- feature_name (list of strings or 'auto', optional (default='auto')) – Feature names. If 'auto' and data is pandas DataFrame, data columns names are used.
- categorical_feature (list of strings or int, or 'auto', optional (default='auto')) – Categorical features.
If list of int, interpreted as indices.
If list of strings, interpreted as feature names (need to specify feature_name as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used. All values in categorical features should be less than int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values.
- callbacks (list of callback functions or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
Returns: self – Returns self.
Return type: object
Note
Custom eval function expects a callable with one of the following signatures: func(y_true, y_pred), func(y_true, y_pred, weight) or func(y_true, y_pred, weight, group), and returns (eval_name, eval_result, is_bigger_better) or a list of (eval_name, eval_result, is_bigger_better):
- y_true : array-like of shape = [n_samples]
- The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
- The predicted values.
- weight : array-like of shape = [n_samples]
- The weight of samples.
- group : array-like
- Group/query data, used for ranking task.
- eval_name : string
- The name of evaluation.
- eval_result : float
- The eval result.
- is_bigger_better : bool
- Whether a higher eval result is better; e.g. for AUC, is_bigger_better is True.
For multi-class tasks, y_pred is grouped by class_id first, then by row_id. To access the prediction for the i-th row in the j-th class, use y_pred[j * num_data + i].
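Example
A minimal sketch of a custom eval function combined with early stopping (the metric name 'mae_custom' and the synthetic data are illustrative, not part of the API):

    import numpy as np
    import lightgbm as lgb

    def mae_custom(y_true, y_pred):
        # An error metric, so a smaller value is better: is_bigger_better=False.
        return 'mae_custom', np.mean(np.abs(y_true - y_pred)), False

    X_train, y_train = np.random.rand(400, 5), np.random.rand(400)
    X_val, y_val = np.random.rand(100, 5), np.random.rand(100)

    model = lgb.LGBMRegressor(n_estimators=200)
    model.fit(X_train, y_train,
              eval_set=[(X_val, y_val)],
              eval_metric=mae_custom,
              early_stopping_rounds=10,
              verbose=False)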
-
get_params
(deep=True)¶ Get parameters for this estimator.
Parameters: deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params – Parameter names mapped to their values. Return type: dict
-
n_features_
Get the number of features of the fitted model.
-
objective_
Get the concrete objective used while fitting this model.
-
predict
(X, raw_score=False, num_iteration=None, pred_leaf=False, pred_contrib=False, **kwargs)¶ Return the predicted value for each sample.
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input features matrix.
- raw_score (bool, optional (default=False)) – Whether to predict raw scores.
- num_iteration (int or None, optional (default=None)) – Limit number of iterations in the prediction. If None, if the best iteration exists, it is used; otherwise, all trees are used. If <= 0, all trees are used (no limits).
- pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.
- pred_contrib (bool, optional (default=False)) –
Whether to predict feature contributions.
Note
If you want more explanations for your model's predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap).
- **kwargs – Other parameters for the prediction.
Returns: - predicted_result (array-like of shape = [n_samples] or shape = [n_samples, n_classes]) – The predicted values.
- X_leaves (array-like of shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]) – If pred_leaf=True, the predicted leaf of every tree for each sample.
- X_SHAP_values (array-like of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes]) – If pred_contrib=True, the feature contributions for each sample.
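Example
A short sketch of the three prediction modes and the shapes documented above, assuming model is a fitted LGBMRegressor and X_val a feature matrix from the previous sketch:

    preds = model.predict(X_val)                        # shape = [n_samples]
    leaves = model.predict(X_val, pred_leaf=True)       # shape = [n_samples, n_trees]
    contribs = model.predict(X_val, pred_contrib=True)  # shape = [n_samples, n_features + 1]
    # The last column of contribs holds the expected value (bias) term, and
    # each row sums to the raw prediction for that sample.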
-
set_params
(**params)¶ Set the parameters of this estimator.
Parameters: **params – Parameter names with their new values. Returns: self – Returns self. Return type: object
-
class
lightgbm.
LGBMRanker
(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, importance_type='split', **kwargs)[source]¶ Bases:
lightgbm.sklearn.LGBMModel
LightGBM ranker.
Construct a gradient boosting model.
Parameters: - boosting_type (string, optional (default='gbdt')) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.
- num_leaves (int, optional (default=31)) – Maximum tree leaves for base learners.
- max_depth (int, optional (default=-1)) – Maximum tree depth for base learners, -1 means no limit.
- learning_rate (float, optional (default=0.1)) – Boosting learning rate.
You can use the callbacks parameter of the fit method to shrink/adapt the learning rate during training using the reset_parameter callback. Note that this will ignore the learning_rate argument in training.
- n_estimators (int, optional (default=100)) – Number of boosted trees to fit.
- subsample_for_bin (int, optional (default=200000)) – Number of samples for constructing bins.
- objective (string, callable or None, optional (default=None)) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
- class_weight (dict, 'balanced' or None, optional (default=None)) – Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification tasks; for binary classification you may use the is_unbalance or scale_pos_weight parameters. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
- min_split_gain (float, optional (default=0.)) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
- min_child_weight (float, optional (default=1e-3)) – Minimum sum of instance weight (hessian) needed in a child (leaf).
- min_child_samples (int, optional (default=20)) – Minimum number of data points needed in a child (leaf).
- subsample (float, optional (default=1.)) – Subsample ratio of the training instances.
- subsample_freq (int, optional (default=0)) – Frequency of subsampling; <=0 means disabled.
- colsample_bytree (float, optional (default=1.)) – Subsample ratio of columns when constructing each tree.
- reg_alpha (float, optional (default=0.)) – L1 regularization term on weights.
- reg_lambda (float, optional (default=0.)) – L2 regularization term on weights.
- random_state (int or None, optional (default=None)) – Random number seed. If None, default seeds in C++ code will be used.
- n_jobs (int, optional (default=-1)) – Number of parallel threads.
- silent (bool, optional (default=True)) – Whether to print messages while running boosting.
- importance_type (string, optional (default='split')) – The type of feature importance to be filled into feature_importances_. If 'split', result contains numbers of times the feature is used in a model. If 'gain', result contains total gains of splits which use the feature.
- **kwargs –
Other parameters for the model. Check http://lightgbm.readthedocs.io/en/latest/Parameters.html for more parameters.
Note
**kwargs is not supported in sklearn, so it may cause unexpected issues.
-
n_features_
¶ The number of features of the fitted model.
Type: int
-
classes_
¶ The class label array (only for classification problem).
Type: array of shape = [n_classes]
-
n_classes_
¶ The number of classes (only for classification problem).
Type: int
-
best_score_
¶ The best score of the fitted model.
Type: dict or None
-
best_iteration_
¶ The best iteration of the fitted model if early_stopping_rounds has been specified.
Type: int or None
-
objective_
¶ The concrete objective used while fitting this model.
Type: string or callable
-
evals_result_
¶ The evaluation results if early_stopping_rounds has been specified.
Type: dict or None
-
feature_importances_
¶ The feature importances (the higher, the more important the feature).
Type: array of shape = [n_features]
Note
A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
- y_true : array-like of shape = [n_samples]
- The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
- The predicted values.
- group : array-like
- Group/query data, used for ranking task.
- grad : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
- The value of the gradient for each sample point.
- hess : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
- The value of the second derivative for each sample point.
For multi-class tasks, y_pred is grouped by class_id first, then by row_id. To access the prediction for the i-th row in the j-th class, use y_pred[j * num_data + i]; grad and hess should be grouped in the same way.
-
best_iteration_
Get the best iteration of the fitted model.
-
best_score_
Get the best score of the fitted model.
-
booster_
Get the underlying lightgbm Booster of this model.
-
evals_result_
Get the evaluation results.
-
feature_importances_
Get feature importances.
Note
In the sklearn interface, feature importance used to be normalized to sum to 1; this behavior was deprecated after version 2.0.4, and the values are now the same as those from Booster.feature_importance(). The importance_type attribute is passed to the function to configure the type of importance values to be extracted.
-
fit
(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, eval_at=[1], early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)[source]¶ Build a gradient boosting model from the training set (X, y).
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input feature matrix.
- y (array-like of shape = [n_samples]) – The target values (class labels in classification, real numbers in regression).
- sample_weight (array-like of shape = [n_samples] or None, optional (default=None)) – Weights of training data.
- init_score (array-like of shape = [n_samples] or None, optional (default=None)) – Init score of training data.
- group (array-like or None, optional (default=None)) – Group data of training data.
- eval_set (list or None, optional (default=None)) – A list of (X, y) tuple pairs to use as validation sets.
- eval_names (list of strings or None, optional (default=None)) – Names of eval_set.
- eval_sample_weight (list of arrays or None, optional (default=None)) – Weights of eval data.
- eval_init_score (list of arrays or None, optional (default=None)) – Init score of eval data.
- eval_group (list of arrays or None, optional (default=None)) – Group data of eval data.
- eval_metric (string, list of strings, callable or None, optional (default=None)) – If string, it should be a built-in evaluation metric to use.
If callable, it should be a custom evaluation metric, see note below for more details.
In either case, the metric from the model parameters will be evaluated and used as well. Default: 'l2' for LGBMRegressor, 'logloss' for LGBMClassifier, 'ndcg' for LGBMRanker.
- eval_at (list of int, optional (default=[1])) – The evaluation positions of the specified metric.
- early_stopping_rounds (int or None, optional (default=None)) – Activates early stopping. The model will train until the validation score stops improving.
Validation score needs to improve at least every early_stopping_rounds round(s) to continue training. Requires at least one validation dataset and one metric; if there is more than one of either, all of them will be checked, but the training data is ignored.
- verbose (bool or int, optional (default=True)) –
Requires at least one evaluation dataset. If True, the eval metric on the eval set is printed at each boosting stage. If int, the eval metric on the eval set is printed at every verbose boosting stage. The last boosting stage or the boosting stage found by using early_stopping_rounds is also printed.
Example
With verbose = 4 and at least one item in eval_set, an evaluation metric is printed every 4 (instead of 1) boosting stages.
- feature_name (list of strings or 'auto', optional (default='auto')) – Feature names. If 'auto' and data is pandas DataFrame, data columns names are used.
- categorical_feature (list of strings or int, or 'auto', optional (default='auto')) – Categorical features.
If list of int, interpreted as indices.
If list of strings, interpreted as feature names (need to specify feature_name as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used. All values in categorical features should be less than int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values.
- callbacks (list of callback functions or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
Returns: self – Returns self.
Return type: object
Note
Custom eval function expects a callable with one of the following signatures: func(y_true, y_pred), func(y_true, y_pred, weight) or func(y_true, y_pred, weight, group), and returns (eval_name, eval_result, is_bigger_better) or a list of (eval_name, eval_result, is_bigger_better):
- y_true : array-like of shape = [n_samples]
- The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
- The predicted values.
- weight : array-like of shape = [n_samples]
- The weight of samples.
- group : array-like
- Group/query data, used for ranking task.
- eval_name : string
- The name of evaluation.
- eval_result : float
- The eval result.
- is_bigger_better : bool
- Whether a higher eval result is better; e.g. for AUC, is_bigger_better is True.
For multi-class tasks, y_pred is grouped by class_id first, then by row_id. To access the prediction for the i-th row in the j-th class, use y_pred[j * num_data + i].
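Example
A minimal sketch of fitting a ranker with query groups (two training queries of 10 documents each; labels are integer relevance grades and all data is synthetic):

    import numpy as np
    import lightgbm as lgb

    X_train = np.random.rand(20, 5)
    y_train = np.random.randint(0, 4, size=20)  # relevance grades per document
    X_val = np.random.rand(10, 5)
    y_val = np.random.randint(0, 4, size=10)

    ranker = lgb.LGBMRanker(n_estimators=50)
    ranker.fit(X_train, y_train,
               group=[10, 10],            # two queries of 10 documents in the training set
               eval_set=[(X_val, y_val)],
               eval_group=[[10]],         # one query of 10 documents in the eval set
               eval_at=[1, 3],            # report ndcg@1 and ndcg@3
               verbose=False)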
-
get_params
(deep=True)¶ Get parameters for this estimator.
Parameters: deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns: params – Parameter names mapped to their values. Return type: dict
-
n_features_
Get the number of features of the fitted model.
-
objective_
Get the concrete objective used while fitting this model.
-
predict
(X, raw_score=False, num_iteration=None, pred_leaf=False, pred_contrib=False, **kwargs)¶ Return the predicted value for each sample.
Parameters: - X (array-like or sparse matrix of shape = [n_samples, n_features]) – Input features matrix.
- raw_score (bool, optional (default=False)) – Whether to predict raw scores.
- num_iteration (int or None, optional (default=None)) – Limit number of iterations in the prediction. If None, if the best iteration exists, it is used; otherwise, all trees are used. If <= 0, all trees are used (no limits).
- pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.
- pred_contrib (bool, optional (default=False)) –
Whether to predict feature contributions.
Note
If you want more explanations for your model's predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap).
- **kwargs – Other parameters for the prediction.
Returns: - predicted_result (array-like of shape = [n_samples] or shape = [n_samples, n_classes]) – The predicted values.
- X_leaves (array-like of shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]) – If pred_leaf=True, the predicted leaf of every tree for each sample.
- X_SHAP_values (array-like of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes]) – If pred_contrib=True, the feature contributions for each sample.
-
set_params
(**params)¶ Set the parameters of this estimator.
Parameters: **params – Parameter names with their new values. Returns: self – Returns self. Return type: object
Callbacks¶
-
lightgbm.
early_stopping
(stopping_rounds, verbose=True)[source]¶ Create a callback that activates early stopping.
Note
Activates early stopping. The model will train until the validation score stops improving. Validation score needs to improve at least every early_stopping_rounds round(s) to continue training. Requires at least one validation dataset and one metric; if there is more than one of either, all of them will be checked, but the training data is ignored.
Parameters: - stopping_rounds (int) – The number of rounds without improvement after which training will be stopped.
- verbose (bool, optional (default=True)) – Whether to print message with early stopping information.
Returns: callback – The callback that activates early stopping.
Return type: function
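Example
A minimal sketch of activating the callback through the callbacks argument of lightgbm.train() (synthetic data):

    import numpy as np
    import lightgbm as lgb

    train_data = lgb.Dataset(np.random.rand(400, 5), label=np.random.rand(400))
    valid_data = lgb.Dataset(np.random.rand(100, 5), label=np.random.rand(100),
                             reference=train_data)

    booster = lgb.train({'objective': 'regression'}, train_data,
                        num_boost_round=500,
                        valid_sets=[valid_data],
                        callbacks=[lgb.early_stopping(stopping_rounds=10)])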
-
lightgbm.
print_evaluation
(period=1, show_stdv=True)[source]¶ Create a callback that prints the evaluation results.
Parameters: - period (int, optional (default=1)) – The period to print the evaluation results.
- show_stdv (bool, optional (default=True)) – Whether to show stdv (if provided).
Returns: callback – The callback that prints the evaluation results every period iteration(s). Return type: function
-
lightgbm.
record_evaluation
(eval_result)[source]¶ Create a callback that records the evaluation history into eval_result. Parameters: eval_result (dict) – A dictionary to store the evaluation results. Returns: callback – The callback that records the evaluation history into the passed dictionary. Return type: function
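Example
A minimal sketch of recording the per-iteration history into a dictionary, reusing the train_data and valid_data Datasets from the early_stopping sketch above:

    eval_hist = {}
    booster = lgb.train({'objective': 'regression', 'metric': 'l2'}, train_data,
                        num_boost_round=100,
                        valid_sets=[valid_data], valid_names=['valid'],
                        callbacks=[lgb.record_evaluation(eval_hist)])
    # eval_hist now maps dataset name to metric history, e.g.
    # {'valid': {'l2': [..., one value per iteration]}}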
-
lightgbm.
reset_parameter
(**kwargs)[source]¶ Create a callback that resets the parameter after the first iteration.
Note
The initial parameter still takes effect on the first iteration.
Parameters: **kwargs (value should be list or function) – List of parameters for each boosting round or a customized function that calculates the parameter in terms of current number of round (e.g. yields learning rate decay). If list lst, parameter = lst[current_round]. If function func, parameter = func(current_round). Returns: callback – The callback that resets the parameter after the first iteration. Return type: function
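Example
A minimal sketch of both accepted forms, a list with one value per boosting round and a function of the current round number:

    import lightgbm as lgb

    # List form: an explicit value for each of 100 boosting rounds.
    cb_list = lgb.reset_parameter(learning_rate=[0.1] * 50 + [0.05] * 50)
    # Function form: exponential learning rate decay.
    cb_func = lgb.reset_parameter(learning_rate=lambda current_round: 0.1 * (0.99 ** current_round))
    # Either callback is then passed via the callbacks argument of train() or fit().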
Plotting¶
-
lightgbm.
plot_importance
(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='Feature importance', ylabel='Features', importance_type='split', max_num_features=None, ignore_zero=True, figsize=None, grid=True, precision=None, **kwargs)[source]¶ Plot model’s feature importances.
Parameters: - booster (Booster or LGBMModel) – Booster or LGBMModel instance which feature importance should be plotted.
- ax (matplotlib.axes.Axes or None, optional (default=None)) – Target axes instance. If None, new figure and axes will be created.
- height (float, optional (default=0.2)) – Bar height, passed to ax.barh().
- xlim (tuple of 2 elements or None, optional (default=None)) – Tuple passed to ax.xlim().
- ylim (tuple of 2 elements or None, optional (default=None)) – Tuple passed to ax.ylim().
- title (string or None, optional (default="Feature importance")) – Axes title. If None, title is disabled.
- xlabel (string or None, optional (default="Feature importance")) – X-axis title label. If None, title is disabled.
- ylabel (string or None, optional (default="Features")) – Y-axis title label. If None, title is disabled.
- importance_type (string, optional (default="split")) – How the importance is calculated. If “split”, result contains numbers of times the feature is used in a model. If “gain”, result contains total gains of splits which use the feature.
- max_num_features (int or None, optional (default=None)) – Max number of top features displayed on plot. If None or <1, all features will be displayed.
- ignore_zero (bool, optional (default=True)) – Whether to ignore features with zero importance.
- figsize (tuple of 2 elements or None, optional (default=None)) – Figure size.
- grid (bool, optional (default=True)) – Whether to add a grid for axes.
- precision (int or None, optional (default=None)) – Used to restrict the display of floating point values to a certain precision.
- **kwargs – Other parameters passed to ax.barh().
Returns: ax – The plot with model’s feature importances.
Return type: matplotlib.axes.Axes
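Example
A minimal sketch of plotting the top features by gain, assuming booster is a fitted Booster or LGBMModel:

    import matplotlib.pyplot as plt
    import lightgbm as lgb

    ax = lgb.plot_importance(booster, max_num_features=10,
                             importance_type='gain', precision=2)
    plt.show()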
-
lightgbm.
plot_metric
(booster, metric=None, dataset_names=None, ax=None, xlim=None, ylim=None, title='Metric during training', xlabel='Iterations', ylabel='auto', figsize=None, grid=True)[source]¶ Plot one metric during training.
Parameters: - booster (dict or LGBMModel) – Dictionary returned from lightgbm.train() or LGBMModel instance.
- metric (string or None, optional (default=None)) – The metric name to plot. Only one metric is supported because different metrics have various scales. If None, the first metric picked from the dictionary (according to hashcode) is used.
- dataset_names (list of strings or None, optional (default=None)) – List of the dataset names which are used to calculate metric to plot. If None, all datasets are used.
- ax (matplotlib.axes.Axes or None, optional (default=None)) – Target axes instance. If None, new figure and axes will be created.
- xlim (tuple of 2 elements or None, optional (default=None)) – Tuple passed to ax.xlim().
- ylim (tuple of 2 elements or None, optional (default=None)) – Tuple passed to ax.ylim().
- title (string or None, optional (default="Metric during training")) – Axes title. If None, title is disabled.
- xlabel (string or None, optional (default="Iterations")) – X-axis title label. If None, title is disabled.
- ylabel (string or None, optional (default="auto")) – Y-axis title label. If ‘auto’, metric name is used. If None, title is disabled.
- figsize (tuple of 2 elements or None, optional (default=None)) – Figure size.
- grid (bool, optional (default=True)) – Whether to add a grid for axes.
Returns: ax – The plot with metric’s history over the training.
Return type: matplotlib.axes.Axes
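Example
A minimal sketch reusing the eval_hist dictionary recorded by record_evaluation() above:

    import matplotlib.pyplot as plt
    import lightgbm as lgb

    ax = lgb.plot_metric(eval_hist, metric='l2', dataset_names=['valid'])
    plt.show()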
-
lightgbm.
plot_tree
(booster, ax=None, tree_index=0, figsize=None, old_graph_attr=None, old_node_attr=None, old_edge_attr=None, show_info=None, precision=None, **kwargs)[source]¶ Plot specified tree.
Note
It is preferable to use create_tree_digraph() because of its lossless quality, and its returned objects can also be rendered and displayed directly inside a Jupyter notebook.
Parameters: - booster (Booster or LGBMModel) – Booster or LGBMModel instance to be plotted.
- ax (matplotlib.axes.Axes or None, optional (default=None)) – Target axes instance. If None, new figure and axes will be created.
- tree_index (int, optional (default=0)) – The index of a target tree to plot.
- figsize (tuple of 2 elements or None, optional (default=None)) – Figure size.
- show_info (list of strings or None, optional (default=None)) – What information should be shown in nodes. Possible values of list items: ‘split_gain’, ‘internal_value’, ‘internal_count’, ‘leaf_count’.
- precision (int or None, optional (default=None)) – Used to restrict the display of floating point values to a certain precision.
- **kwargs – Other parameters passed to Digraph constructor. Check https://graphviz.readthedocs.io/en/stable/api.html#digraph for the full list of supported parameters.
Returns: ax – The plot with single tree.
Return type: matplotlib.axes.Axes
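Example
A minimal sketch of rendering the first tree of a fitted model (requires the graphviz package; booster is a fitted Booster or LGBMModel):

    import matplotlib.pyplot as plt
    import lightgbm as lgb

    ax = lgb.plot_tree(booster, tree_index=0, figsize=(15, 8),
                       show_info=['split_gain', 'leaf_count'])
    plt.show()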
-
lightgbm.
create_tree_digraph
(booster, tree_index=0, show_info=None, precision=None, old_name=None, old_comment=None, old_filename=None, old_directory=None, old_format=None, old_engine=None, old_encoding=None, old_graph_attr=None, old_node_attr=None, old_edge_attr=None, old_body=None, old_strict=False, **kwargs)[source]¶ Create a digraph representation of specified tree.
Note
For more information please visit https://graphviz.readthedocs.io/en/stable/api.html#digraph.
Parameters: - booster (Booster or LGBMModel) – Booster or LGBMModel instance to be converted.
- tree_index (int, optional (default=0)) – The index of a target tree to convert.
- show_info (list of strings or None, optional (default=None)) – What information should be shown in nodes. Possible values of list items: ‘split_gain’, ‘internal_value’, ‘internal_count’, ‘leaf_count’.
- precision (int or None, optional (default=None)) – Used to restrict the display of floating point values to a certain precision.
- **kwargs – Other parameters passed to Digraph constructor. Check https://graphviz.readthedocs.io/en/stable/api.html#digraph for the full list of supported parameters.
Returns: graph – The digraph representation of specified tree.
Return type: graphviz.Digraph
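Example
A minimal sketch of the digraph route, whose result can be saved to disk or displayed inline in a Jupyter notebook:

    import lightgbm as lgb

    graph = lgb.create_tree_digraph(booster, tree_index=0,
                                    show_info=['internal_value', 'leaf_count'])
    graph.render('tree0')  # writes tree0.pdf via graphviz
    graph                  # in a notebook cell, displays the tree inline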