Contributing

Rules of thumb

git clone https://github.com/scikit-learn/scikit-learn.git

git clone git@github.com:scikit-learn/scikit-learn.git

python setup.py build_ext --inplace

python setup.py develop

python setup.py build_ext --inplace

$ git clone git@github.com:YourLogin/scikit-learn.git

$ git checkout -b my-feature

$ git add modified_files
$ git commit

$ git push -u origin my-feature

$ git remote add upstream https://github.com/scikit-learn/scikit-learn.git

$ make

$ pip install nose coverage
$ nosetests --with-coverage path/to/tests_for_package

$ pip install pyflakes
$ pyflakes path/to/module.py

$ pip install pep8
$ pep8 path/to/module.py

$ pip install autopep8
$ autopep8 path/to/module.py

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)

See also
--------
SelectKBest : Select features based on the k highest scores.
SelectFpr : Select features based on a false positive rate test.

from sklearn.utils import check_array, check_random_state

def choose_random_sample(X, random_state=0):
    """
    Choose a random point from X

    Parameters
    ----------
    X : array-like, shape = (n_samples, n_features)
        array representing the data
    random_state : RandomState or an int seed (0 by default)
        A random number generator instance to define the state of the
        random permutations generator.

    Returns
    -------
    x : numpy array, shape = (n_features,)
        A random point selected from X
    """
    X = check_array(X)
    random_state = check_random_state(random_state)
    i = random_state.randint(X.shape[0])
    return X[i]

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_random_state

class GaussianNoise(BaseEstimator, TransformerMixin):
    """This estimator ignores its input and returns random Gaussian noise.

    It also does not adhere to all scikit-learn conventions,
    but showcases how to handle randomness.
    """

    def __init__(self, n_components=100, random_state=None):
        self.n_components = n_components
        self.random_state = random_state

    # the arguments are ignored anyway, so we make them optional
    def fit(self, X=None, y=None):
        # create the generator at fit time, not in __init__
        self.random_state_ = check_random_state(self.random_state)
        return self

    def transform(self, X):
        n_samples = X.shape[0]
        return self.random_state_.randn(n_samples, self.n_components)

from ..utils import deprecated

def zero_one_loss(y_true, y_pred, normalize=True):
    # actual implementation
    pass

@deprecated("Function 'zero_one' was renamed to 'zero_one_loss' "
            "in version 0.13 and will be removed in release 0.15. "
            "Default behavior is changed from 'normalize=False' to "
            "'normalize=True'")
def zero_one(y_true, y_pred, normalize=False):
    return zero_one_loss(y_true, y_pred, normalize)

@property
@deprecated("Attribute labels_ was deprecated in version 0.13 and "
            "will be removed in 0.15. Use 'classes_' instead")
def labels_(self):
    return self.classes_

import warnings

def example_function(n_clusters=8, k=None):
    if k is not None:
        warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                      "will be removed in 0.15.", DeprecationWarning)
        n_clusters = k
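A quick check that the renamed keyword still works and warns. The `return n_clusters` line and the `stacklevel=2` argument are additions for this sketch; the rest of the function body matches the snippet above.

```python
import warnings


def example_function(n_clusters=8, k=None):
    if k is not None:
        # stacklevel=2 points the warning at the caller, who must update their code
        warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                      "will be removed in 0.15.", DeprecationWarning,
                      stacklevel=2)
        n_clusters = k
    # Returned only so the effect of the old keyword is observable.
    return n_clusters


# Default warning filters may hide DeprecationWarning, so force and record it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = example_function(k=3)

print(result, [str(w.message) for w in caught])
```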

estimator = obj.fit(data, targets)

estimator = obj.fit(data)

prediction = obj.predict(data)

probability = obj.predict_proba(data)

new_data = obj.transform(data)

new_data = obj.fit_transform(data)

score = obj.score(data)

estimator.fit(X, y)

clf2 = SVC(C=2.3)
clf3 = SVC([[1, 2], [2, 3]], [-1, 1])  # WRONG!

def __init__(self, param1=1, param2=2):
    self.param1 = param1
    self.param2 = param2

def __init__(self, param1=1, param2=2, param3=3):
    # WRONG: parameters should not be modified
    if param1 > 1:
        param2 += 1
    self.param1 = param1
    # WRONG: the object's attributes should have exactly the name of
    # the argument in the constructor
    self.param3 = param2
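These rules exist because scikit-learn's `BaseEstimator` finds parameters by the argument names in the `__init__` signature and reads them back with `getattr`, and cloning rebuilds an estimator from those values. The sketch below imitates that mechanism with `inspect.signature`; `ParamsFromInit`, `GoodEstimator`, `BadEstimator` and `clone_sketch` are hypothetical names, and the real implementation handles more cases, such as nested estimators.

```python
import inspect


class ParamsFromInit:
    """Hypothetical mixin: parameters are exactly the __init__ arguments."""

    def get_params(self):
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        # Raises AttributeError if __init__ did not store an argument under its own name.
        return {name: getattr(self, name) for name in names}


class GoodEstimator(ParamsFromInit):
    def __init__(self, param1=1, param2=2):
        self.param1 = param1
        self.param2 = param2


class BadEstimator(ParamsFromInit):
    def __init__(self, param1=1, param2=2, param3=3):
        if param1 > 1:
            param2 += 1
        self.param1 = param1
        self.param3 = param2  # stored under the wrong name; param2 is lost


def clone_sketch(estimator):
    """Build a fresh, unfitted copy from the constructor parameters."""
    return type(estimator)(**estimator.get_params())
```

`clone_sketch(GoodEstimator(param1=5))` reproduces the same parameters, while `BadEstimator().get_params()` fails because no `param2` attribute exists.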

y_predicted = SVC(C=100).fit(X_train, y_train).predict(X_test)

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from sklearn.svm import LinearSVC
>>> check_estimator(LinearSVC)  # passes

>>> import numpy as np
>>> from sklearn.base import BaseEstimator, ClassifierMixin
>>> from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
>>> from sklearn.utils.multiclass import unique_labels
>>> from sklearn.metrics import euclidean_distances
>>> class TemplateClassifier(BaseEstimator, ClassifierMixin):
...
...     def __init__(self, demo_param='demo'):
...         self.demo_param = demo_param
...
...     def fit(self, X, y):
...
...         # Check that X and y have correct shape
...         X, y = check_X_y(X, y)
...         # Store the classes seen during fit
...         self.classes_ = unique_labels(y)
...
...         self.X_ = X
...         self.y_ = y
...         # Return the classifier
...         return self
...
...     def predict(self, X):
...
...         # Check if fit has been called
...         check_is_fitted(self, ['X_', 'y_'])
...
...         # Input validation
...         X = check_array(X)
...
...         closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
...         return self.y_[closest]

def get_params(self, deep=True):
    # suppose this estimator has parameters "alpha" and "recursive"
    return {"alpha": self.alpha, "recursive": self.recursive}

def set_params(self, **parameters):
    for parameter, value in parameters.items():
        setattr(self, parameter, value)
    return self
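A self-contained version of the `get_params`/`set_params` pair, for a hypothetical estimator with the parameters `alpha` and `recursive`. Attributes are set with the built-in `setattr(self, name, value)`. Rejecting unknown names with `ValueError` is an assumption of this sketch, modelled loosely on how `BaseEstimator.set_params` behaves.

```python
class AlphaEstimator:
    """Hypothetical estimator with the parameters 'alpha' and 'recursive'."""

    def __init__(self, alpha=1.0, recursive=False):
        self.alpha = alpha
        self.recursive = recursive

    def get_params(self, deep=True):
        # No nested estimators here, so `deep` has no effect.
        return {"alpha": self.alpha, "recursive": self.recursive}

    def set_params(self, **parameters):
        valid = self.get_params()
        for parameter, value in parameters.items():
            if parameter not in valid:
                raise ValueError(
                    f"Invalid parameter {parameter!r} for {type(self).__name__}")
            setattr(self, parameter, value)
        # Returning self allows chaining, e.g. est.set_params(alpha=0.1).fit(X, y).
        return self


est = AlphaEstimator().set_params(alpha=0.5)
print(est.get_params())
```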

self.classes_, y = np.unique(y, return_inverse=True)

def predict(self, X):
    D = self.decision_function(X)
    return self.classes_[np.argmax(D, axis=1)]
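The `np.unique(..., return_inverse=True)` idiom and the `classes_[np.argmax(D, axis=1)]` lookup undo each other: labels are mapped to indices `0..n_classes-1` at fit time and mapped back at predict time. A small demonstration, where the score matrix `D` is made up to stand in for a `decision_function` output:

```python
import numpy as np

y = np.array(["spam", "ham", "spam", "eggs"])

# fit: classes_ holds the sorted unique labels, y_encoded indexes into it
classes_, y_encoded = np.unique(y, return_inverse=True)
print(classes_)     # ['eggs' 'ham' 'spam']
print(y_encoded)    # [2 1 2 0]

# predict: one column of scores per class, in the order of classes_
D = np.array([[0.1, 0.7, 0.2],
              [0.9, 0.05, 0.05]])
print(classes_[np.argmax(D, axis=1)])  # ['ham' 'eggs']
```

Because `classes_` is sorted, the column order of `D` must follow that sorted order, not the order in which labels first appear in `y`.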

Bug / Crash:	Something is happening that clearly shouldn’t happen. Wrong results as well as unexpected errors from estimators go here.
Cleanup / Enhancement:	Improving performance, usability, consistency.
Documentation:	Missing, incorrect or sub-standard documentation and examples.
New Feature:	Feature requests and pull requests implementing a new feature.

Easy:	This issue can be tackled by anyone; no experience is needed. Ask for help if the formulation is unclear.
Moderate:	Might need some knowledge of machine learning or the package, but is still approachable for someone new to the project.
Needs Contributor:	This tag marks an issue which currently lacks a contributor, or a PR that needs another contributor to take over the work. These issues can range in difficulty and may not be approachable for new contributors. Note that not all issues which need contributors will have this tag.

Estimator:	The base object, implements a `fit` method to learn from data, either `estimator = obj.fit(data, targets)` or `estimator = obj.fit(data)`.
Predictor:	For supervised learning, or some unsupervised problems, implements `prediction = obj.predict(data)`. Classification algorithms usually also offer a way to quantify certainty of a prediction, either using `decision_function` or `predict_proba`: `probability = obj.predict_proba(data)`.
Transformer:	For filtering or modifying the data, in a supervised or unsupervised way, implements `new_data = obj.transform(data)`. When fitting and transforming can be performed much more efficiently together than separately, implements `new_data = obj.fit_transform(data)`.
Model:	A model that can give a goodness of fit measure or a likelihood of unseen data, implements (higher is better) `score = obj.score(data)`.
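A minimal pure-Python object following the estimator and transformer conventions above: `fit` learns state and returns `self`, `transform` applies it, and `fit_transform` chains the two. `MeanCenterer` is a hypothetical example working on lists of rows, not a scikit-learn class.

```python
class MeanCenterer:
    """Hypothetical transformer that subtracts the per-feature mean seen in fit."""

    def fit(self, data, targets=None):
        # targets is accepted but unused, as for an unsupervised transformer
        n_samples = len(data)
        n_features = len(data[0])
        self.means_ = [sum(row[j] for row in data) / n_samples
                       for j in range(n_features)]
        return self  # enables estimator = obj.fit(data)

    def transform(self, data):
        return [[value - mean for value, mean in zip(row, self.means_)]
                for row in data]

    def fit_transform(self, data, targets=None):
        # A specialised class could do this in one pass; here we just chain.
        return self.fit(data, targets).transform(data)


centerer = MeanCenterer()
print(centerer.fit_transform([[1.0, 10.0], [3.0, 20.0]]))  # [[-1.0, -5.0], [1.0, 5.0]]
```

The learned state lives in the trailing-underscore attribute `means_`, so constructor parameters and fitted attributes stay separate.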

Parameters
X	array-like, with shape = [N, D], where N is the number of samples and D is the number of features.
y	array, with shape = [N], where N is the number of samples.
kwargs	optional data-dependent parameters.

Contributing
Submitting a bug report
Retrieving the latest code
Contributing code
How to contribute
Contributing pull requests
Filing Bugs
Issues for New Contributors
Documentation
Testing and improving test coverage
Developers web site
Issue Tracker Tags
Other ways to contribute
Coding guidelines
Input validation
Random Numbers
Deprecation
Python 3.x support
APIs of scikit-learn objects
Different objects
Estimators
Instantiation
Fitting
Estimated Attributes
Optional Arguments
Rolling your own estimator
get_params and set_params
Parameters and init
Cloning
Pipeline compatibility
Estimator types
Working notes
Specific models