Introduction¶
seqlearn extends the scikit-learn machine learning library to deal with sequence classification: sequences of observations that must be individually labeled, but where the order in which they appear matters.
seqlearn mimicks the basic scikit-learn fit
/predict
API
and tries to stay compatible with scikit-learn’s data formats,
but adds an argument to the scikit-learn methods that encodes the structure
of the input. This argument is called lengths
and should be an array of integers denoting the respective lengths
of sequences in (X, y)
.
For example, if X
and y
both have length (shape[0]
) of 10, then
lengths=[6, 4]
encodes the information that (X[:6], y[:6])
and
(X[6:10], y[6:10])
are both coherent sequences.
This encoding of sequence information allows for a fast implementation
using NumPy’s vectorized operations.