Parameters Tuning
=================

This page contains parameter tuning guides for different scenarios.

**List of other helpful links**

-  `Parameters <./Parameters.rst>`__

-  `Python API <./Python-API.rst>`__

Tune Parameters for the Leaf-wise (Best-first) Tree
---------------------------------------------------

LightGBM uses the `leaf-wise <./Features.rst#leaf-wise-best-first-tree-growth>`__ tree growth algorithm, while many other popular tools use depth-wise tree growth.
Compared with depth-wise growth, the leaf-wise algorithm can converge much faster.
However, leaf-wise growth may over-fit if the parameters are not set appropriately.

To get good results with a leaf-wise tree, these are some important parameters:

1. ``num_leaves``. This is the main parameter to control the complexity of the tree model.
   Theoretically, we can set ``num_leaves = 2^(max_depth)`` to obtain the same number of leaves as a depth-wise tree.
   However, this simple conversion is not good in practice.
   The reason is that a leaf-wise tree is typically much deeper than a depth-wise tree for a fixed number of leaves, and unconstrained depth can induce over-fitting.
   Thus, when tuning ``num_leaves``, we should keep it smaller than ``2^(max_depth)``.
   For example, when ``max_depth=7`` gives a depth-wise tree good accuracy, setting ``num_leaves`` to ``127`` may cause over-fitting, while setting it to ``70`` or ``80`` may give better accuracy than the depth-wise tree.

2. ``min_data_in_leaf``. This is a very important parameter to prevent over-fitting in a leaf-wise tree.
   Its optimal value depends on the number of training samples and ``num_leaves``.
   Setting it to a large value can avoid growing too deep a tree, but may cause under-fitting.
   In practice, setting it to hundreds or thousands is enough for a large dataset.

3. ``max_depth``. You can also use ``max_depth`` to limit the tree depth explicitly.

Minimal example sketches for this section and for the scenarios below are collected at the end of this page.

For Faster Speed
----------------

-  Use bagging by setting ``bagging_fraction`` and ``bagging_freq``

-  Use feature sub-sampling by setting ``feature_fraction``

-  Use small ``max_bin``

-  Use ``save_binary`` to speed up data loading in future learning

-  Use parallel learning, refer to `Parallel Learning Guide <./Parallel-Learning-Guide.rst>`__

For Better Accuracy
-------------------

-  Use large ``max_bin`` (may be slower)

-  Use small ``learning_rate`` with large ``num_iterations``

-  Use large ``num_leaves`` (may cause over-fitting)

-  Use bigger training data

-  Try ``dart``

Deal with Over-fitting
----------------------

-  Use small ``max_bin``

-  Use small ``num_leaves``

-  Use ``min_data_in_leaf`` and ``min_sum_hessian_in_leaf``

-  Use bagging by setting ``bagging_fraction`` and ``bagging_freq``

-  Use feature sub-sampling by setting ``feature_fraction``

-  Use bigger training data

-  Try ``lambda_l1``, ``lambda_l2`` and ``min_gain_to_split`` for regularization

-  Try ``max_depth`` to avoid growing a deep tree
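
Example Parameter Sketches
--------------------------

The sketches below use the `Python API <./Python-API.rst>`__.
The synthetic data, the variable names, and all concrete parameter values are illustrative assumptions meant to show where each parameter goes, not recommendations for any particular dataset.

First, a minimal sketch of the leaf-wise tuning advice above: keep ``num_leaves`` below ``2^(max_depth)`` and constrain leaf size with ``min_data_in_leaf``.

.. code:: python

    import lightgbm as lgb
    import numpy as np

    # Synthetic regression data, assumed only for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 20))
    y = rng.normal(size=10_000)
    train_data = lgb.Dataset(X, label=y)

    params = {
        "objective": "regression",
        "max_depth": 7,           # explicit depth limit
        "num_leaves": 70,         # kept below 2^7 = 128, per the guidance above
        "min_data_in_leaf": 100,  # hundreds are often enough on large data
    }

    booster = lgb.train(params, train_data, num_boost_round=100)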
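
For faster speed, a sketch (assumed values again) that sub-samples rows and columns, uses fewer bins, and saves the binned dataset for faster loading in future runs:

.. code:: python

    import lightgbm as lgb
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 20))
    y = rng.normal(size=10_000)

    params = {
        "objective": "regression",
        "bagging_fraction": 0.8,  # train each iteration on 80% of rows...
        "bagging_freq": 5,        # ...re-sampled every 5 iterations
        "feature_fraction": 0.8,  # consider 80% of features per tree
        "max_bin": 63,            # fewer bins -> faster histogram construction
    }

    # max_bin affects how the Dataset is binned, so pass params at construction.
    train_data = lgb.Dataset(X, label=y, params=params)
    # Persist the binned data; a later run can load it via lgb.Dataset("train.bin").
    train_data.save_binary("train.bin")

    booster = lgb.train(params, train_data, num_boost_round=100)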
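
For better accuracy, a sketch that trades speed for accuracy: more bins, a small ``learning_rate`` paired with many iterations, larger ``num_leaves``, and ``dart`` boosting.

.. code:: python

    import lightgbm as lgb
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 20))
    y = rng.normal(size=10_000)
    train_data = lgb.Dataset(X, label=y)

    params = {
        "objective": "regression",
        "boosting": "dart",     # try DART instead of the default gbdt
        "max_bin": 511,         # more bins (slower, potentially more accurate)
        "learning_rate": 0.02,  # small steps...
        "num_leaves": 255,      # larger capacity; may cause over-fitting
    }

    # ...paired with many more boosting rounds.
    booster = lgb.train(params, train_data, num_boost_round=1000)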
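
Finally, a sketch combining the over-fitting countermeasures: a smaller tree, leaf constraints, row and feature sub-sampling, and explicit regularization.

.. code:: python

    import lightgbm as lgb
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 20))
    y = rng.normal(size=10_000)
    train_data = lgb.Dataset(X, label=y)

    params = {
        "objective": "regression",
        "num_leaves": 31,                 # small tree, well below 2^max_depth
        "max_depth": 6,                   # explicit depth cap
        "min_data_in_leaf": 200,          # minimum samples per leaf
        "min_sum_hessian_in_leaf": 10.0,  # minimum total hessian per leaf
        "lambda_l1": 0.1,                 # L1 regularization
        "lambda_l2": 0.1,                 # L2 regularization
        "min_gain_to_split": 0.1,         # require a minimum gain to split
        "bagging_fraction": 0.8,
        "bagging_freq": 5,
        "feature_fraction": 0.8,
    }

    booster = lgb.train(params, train_data, num_boost_round=100)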