Dlib is principally a C++ library; however, you can use a number of its tools from Python applications. This page documents the Python API for working with these dlib tools. If you haven’t done so already, you should probably look at the Python example programs first before consulting this reference. These example programs are little mini-tutorials for using dlib from Python. They are listed on the left of the main dlib web page.
Classes
dlib.array
dlib.cca_outputs
dlib.cnn_face_detection_model_v1
dlib.correlation_tracker
dlib.drectangle
dlib.face_recognition_model_v1
dlib.fhog_object_detector
dlib.full_object_detection
dlib.full_object_detections
dlib.image_window
dlib.matrix
dlib.mmod_rectangle
dlib.mmod_rectangles
dlib.mmod_rectangless
dlib.pair
dlib.point
dlib.points
dlib.range
dlib.ranges
dlib.rangess
dlib.ranking_pair
dlib.ranking_pairs
dlib.rectangle
dlib.rectangles
dlib.rgb_pixel
dlib.segmenter_params
dlib.segmenter_test
dlib.segmenter_type
dlib.shape_predictor
dlib.shape_predictor_training_options
dlib.simple_object_detector
dlib.simple_object_detector_training_options
dlib.simple_test_results
dlib.sparse_ranking_pair
dlib.sparse_ranking_pairs
dlib.sparse_vector
dlib.sparse_vectors
dlib.sparse_vectorss
dlib.svm_c_trainer_histogram_intersection
dlib.svm_c_trainer_linear
dlib.svm_c_trainer_radial_basis
dlib.svm_c_trainer_sparse_histogram_intersection
dlib.svm_c_trainer_sparse_linear
dlib.svm_c_trainer_sparse_radial_basis
dlib.svm_rank_trainer
dlib.svm_rank_trainer_sparse
dlib.vector
dlib.vectors
dlib.vectorss
Functions
dlib.apply_cca_transform()
dlib.assignment_cost()
dlib.cca()
dlib.chinese_whispers_clustering()
dlib.cross_validate_ranking_trainer()
dlib.cross_validate_sequence_segmenter()
dlib.cross_validate_trainer()
dlib.cross_validate_trainer_threaded()
dlib.dot()
dlib.find_candidate_object_locations()
dlib.get_frontal_face_detector()
dlib.hit_enter_to_continue()
dlib.load_libsvm_formatted_data()
dlib.make_sparse_vector()
dlib.max_cost_assignment()
dlib.max_index_plus_one()
dlib.save_face_chip()
dlib.save_face_chips()
dlib.save_libsvm_formatted_data()
dlib.solve_structural_svm_problem()
dlib.test_binary_decision_function()
dlib.test_ranking_function()
dlib.test_regression_function()
dlib.test_sequence_segmenter()
dlib.test_shape_predictor()
dlib.test_simple_object_detector()
dlib.train_sequence_segmenter()
dlib.train_shape_predictor()
dlib.train_simple_object_detector()
Detailed API Listing
dlib.apply_cca_transform((matrix)m, (sparse_vector)v) → vector
requires
    - max_index_plus_one(v) <= m.nr()
ensures
    - returns trans(m)*v (i.e. multiply m by the vector v and return the result)
class dlib.array
This object represents a 1D array of floating point numbers. Moreover, it binds directly to the C++ type std::vector<double>.

    append((array)arg1, (object)arg2) → None
    clear((array)arg1) → None
    extend((array)arg1, (object)arg2) → None
    resize((array)arg1, (int)arg2) → None
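A minimal usage sketch (the values are arbitrary; the type also supports standard Python sequence operations such as len() and indexing):

    import dlib

    a = dlib.array()        # wraps std::vector<double>
    a.append(0.5)
    a.extend([1.5, 2.5])    # extend accepts a Python iterable of numbers
    a.resize(5)             # new elements are zero-initialized
    print("{} {}".format(len(a), a[0]))   # -> 5 0.5
    a.clear()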
dlib.assignment_cost((matrix)cost, (list)assignment) → float
requires
    - cost.nr() == cost.nc() (i.e. the input must be a square matrix)
    - for all valid i: 0 <= assignment[i] < cost.nr()
ensures
    - Interprets cost as a cost assignment matrix. That is, cost[i][j] represents the cost of assigning i to j.
    - Interprets assignment as a particular set of assignments. That is, i is assigned to assignment[i].
    - returns the cost of the given assignment. That is, returns:
          sum over i: cost[i][assignment[i]]
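This function is typically paired with dlib.max_cost_assignment() (listed above). A minimal sketch, following dlib's max_cost_assignment.py example:

    import dlib

    # cost[i][j] is the value of assigning person i to job j.
    cost = dlib.matrix([[1, 2, 6],
                        [5, 3, 6],
                        [4, 5, 0]])

    # Find the assignment that maximizes the total cost.
    assignment = dlib.max_cost_assignment(cost)

    print("Optimal assignments: {}".format(assignment))   # [2, 0, 1]
    print("Optimal cost: {}".format(dlib.assignment_cost(cost, assignment)))  # 16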
dlib.cca((sparse_vectors)L, (sparse_vectors)R, (int)num_correlations[, (int)extra_rank=5[, (int)q=2[, (float)regularization=0]]]) → cca_outputs
requires
    - num_correlations > 0
    - len(L) > 0
    - len(R) > 0
    - len(L) == len(R)
    - regularization >= 0
    - L and R must be properly sorted sparse vectors. This means they must list their elements in ascending index order and not contain duplicate index values. You can use make_sparse_vector() to ensure this is true.
ensures
    - This function performs a canonical correlation analysis between the vectors in L and R. That is, it finds two transformation matrices, Ltrans and Rtrans, such that row vectors in the transformed matrices L*Ltrans and R*Rtrans are as correlated as possible (note that in this notation we interpret L as a matrix with the input vectors in its rows). Note also that this function tries to find transformations which produce num_correlations dimensional output vectors.
    - Note that you can easily apply the transformation to a vector using apply_cca_transform(). For example: apply_cca_transform(Ltrans, some_sparse_vector)
    - returns a structure containing the Ltrans and Rtrans transformation matrices as well as the estimated correlations between elements of the transformed vectors.
    - This function assumes the data vectors in L and R have already been centered (i.e. we assume the vectors have zero means). However, in many cases it is fine to use uncentered data with cca(). But if centering is important for your problem then you should center your data before passing it to cca().
    - This function works with reduced rank approximations of the L and R matrices. This makes it fast when working with large matrices. In particular, we use the dlib::svd_fast() routine to find reduced rank representations of the input matrices by calling it as follows: svd_fast(L, U, D, V, num_correlations+extra_rank, q) and similarly for R. This means that you can use the extra_rank and q arguments to cca() to influence the accuracy of the reduced rank approximation. However, the default values should work fine for most problems.
    - The dimensions of the output vectors produced by L*#Ltrans or R*#Rtrans are ordered such that the dimensions with the highest correlations come first. That is, after applying the transforms produced by cca() to a set of vectors you will find that dimension 0 has the highest correlation, then dimension 1 has the next highest, and so on. This also means that the list of estimated correlations returned from cca() will always be listed in decreasing order.
    - This function performs the ridge regression version of Canonical Correlation Analysis when regularization is set to a value > 0. In particular, larger values indicate the solution should be more heavily regularized. This can be useful when the dimensionality of the data is larger than the number of samples.
    - A good discussion of CCA can be found in the paper "Canonical Correlation Analysis" by David Weenink. In particular, this function is implemented using equations 29 and 30 from his paper. We also use the idea of doing CCA on a reduced rank approximation of L and R as suggested by Paramveer S. Dhillon in his paper "Two Step CCA: A new spectral method for estimating vector models of words".
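A minimal sketch with toy, zero-mean, one-dimensional data. It assumes the returned cca_outputs object exposes Ltrans, Rtrans, and correlations attributes mirroring the C++ cca_outputs struct:

    import dlib

    # Two paired sets of sparse vectors (toy data; real data should be centered).
    L = dlib.sparse_vectors()
    R = dlib.sparse_vectors()
    for i in range(-25, 25):
        lv = dlib.sparse_vector()
        rv = dlib.sparse_vector()
        lv.append(dlib.pair(0, float(i)))        # index 0, value i
        rv.append(dlib.pair(0, 2.0 * float(i)))  # perfectly correlated with L
        L.append(lv)
        R.append(rv)

    res = dlib.cca(L, R, 1)              # ask for 1 correlated dimension
    print(list(res.correlations))        # near 1 for this toy data

    # Project a vector into the learned correlated subspace:
    x = dlib.apply_cca_transform(res.Ltrans, L[0])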
dlib.chinese_whispers_clustering((list)descriptors, (float)threshold) → list
Takes a list of descriptors and returns a list that contains a label for each descriptor. Clustering is done using dlib::chinese_whispers.
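For example, on toy 3D descriptors (in practice the descriptors are usually 128D face descriptors from face_recognition_model_v1, and dlib's face clustering example uses a threshold of 0.5):

    import dlib

    # Two tight groups of vectors; descriptors closer than the threshold
    # (in Euclidean distance) tend to end up with the same label.
    descriptors = [dlib.vector([0.0, 0.0, 0.0]),
                   dlib.vector([0.1, 0.0, 0.0]),
                   dlib.vector([5.0, 5.0, 5.0]),
                   dlib.vector([5.1, 5.0, 5.0])]

    labels = dlib.chinese_whispers_clustering(descriptors, 0.5)
    print(labels)                                 # e.g. [0, 0, 1, 1]
    print("clusters: {}".format(len(set(labels))))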
class dlib.cnn_face_detection_model_v1
This object detects human faces in an image. The constructor loads the face detection model from a file. You can download a pre-trained model from http://dlib.net/files/mmod_human_face_detector.dat.bz2.
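A minimal sketch, assuming the model file above has been downloaded and unpacked (the image path is a placeholder, and dlib.load_rgb_image is assumed available in this dlib build):

    import dlib

    cnn_face_detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
    img = dlib.load_rgb_image("some_image.jpg")

    # The second argument upsamples the image once, which helps find smaller faces.
    dets = cnn_face_detector(img, 1)
    for d in dets:
        print("Face at {} with confidence {}".format(d.rect, d.confidence))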
class dlib.correlation_tracker
This is a tool for tracking moving objects in a video stream. You give it the bounding box of an object in the first frame and it attempts to track the object in the box from frame to frame. This tool is an implementation of the method described in the following paper:
    Danelljan, Martin, et al. "Accurate scale estimation for robust visual tracking." Proceedings of the British Machine Vision Conference (BMVC), 2014.

    get_position((correlation_tracker)arg1) → drectangle
    returns the predicted position of the object under track.
    start_track((correlation_tracker)arg1, (object)image, (drectangle)bounding_box) → None
    start_track((correlation_tracker)arg1, (object)image, (rectangle)bounding_box) → None
    requires
        - image is a numpy ndarray containing either an 8bit grayscale or RGB image.
        - bounding_box.is_empty() == false
    ensures
        - This object will start tracking the thing inside the bounding box in the given image. That is, if you call update() with subsequent video frames then it will try to keep track of the position of the object inside bounding_box.
        - #get_position() == bounding_box
    update((correlation_tracker)arg1, (object)image) → float
    requires
        - image is a numpy ndarray containing either an 8bit grayscale or RGB image.
        - get_position().is_empty() == false (i.e. you must have started tracking by calling start_track())
    ensures
        - performs: return update(image, get_position())

    update((correlation_tracker)arg1, (object)image, (drectangle)guess) → float
    update((correlation_tracker)arg1, (object)image, (rectangle)guess) → float
    requires
        - image is a numpy ndarray containing either an 8bit grayscale or RGB image.
        - get_position().is_empty() == false (i.e. you must have started tracking by calling start_track())
    ensures
        - When searching for the object in the image, we search in the area around the provided guess.
        - #get_position() == the new predicted location of the object in the image. This location will be a copy of guess that has been translated and scaled appropriately based on the content of the image so that it, hopefully, bounds the object.
        - returns the peak to side-lobe ratio. This is a number that measures how confident the tracker is that the object is inside #get_position(). Larger values indicate higher confidence.
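A minimal tracking loop, modeled on dlib's correlation_tracker.py example (the frame paths and the initial bounding box are placeholders):

    import glob
    import dlib

    tracker = dlib.correlation_tracker()

    for k, f in enumerate(sorted(glob.glob("frames/*.jpg"))):
        img = dlib.load_rgb_image(f)
        if k == 0:
            # Tell the tracker where the object is in the first frame.
            tracker.start_track(img, dlib.rectangle(74, 67, 112, 153))
        else:
            confidence = tracker.update(img)   # peak-to-side-lobe ratio
            print("{} {}".format(tracker.get_position(), confidence))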
dlib.cross_validate_ranking_trainer((svm_rank_trainer)trainer, (ranking_pairs)samples, (int)folds) → _ranking_test
dlib.cross_validate_ranking_trainer((svm_rank_trainer_sparse)trainer, (sparse_ranking_pairs)samples, (int)folds) → _ranking_test
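For example, a minimal cross-validation run on toy ranking data, modeled on dlib's svm_rank.py example:

    import dlib

    # In each ranking_pair, relevant vectors should outscore nonrelevant ones.
    queries = dlib.ranking_pairs()
    for i in range(4):
        q = dlib.ranking_pair()
        q.relevant.append(dlib.vector([1, 0]))
        q.nonrelevant.append(dlib.vector([0, 1]))
        queries.append(q)

    trainer = dlib.svm_rank_trainer()
    trainer.c = 10
    print(dlib.cross_validate_ranking_trainer(trainer, queries, 4))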
dlib.cross_validate_sequence_segmenter((vectorss)samples, (rangess)segments, (int)folds[, (segmenter_params)params=<BIO,highFeats,signed,win=5,threads=4,eps=0.1,cache=40,non-verbose,C=100>]) → segmenter_test
dlib.cross_validate_sequence_segmenter((sparse_vectorss)samples, (rangess)segments, (int)folds[, (segmenter_params)params=<BIO,highFeats,signed,win=5,threads=4,eps=0.1,cache=40,non-verbose,C=100>]) → segmenter_test
dlib.cross_validate_trainer((svm_c_trainer_radial_basis)trainer, (vectors)x, (array)y, (int)folds) → _binary_test
dlib.cross_validate_trainer((svm_c_trainer_sparse_radial_basis)trainer, (sparse_vectors)x, (array)y, (int)folds) → _binary_test
dlib.cross_validate_trainer((svm_c_trainer_histogram_intersection)trainer, (vectors)x, (array)y, (int)folds) → _binary_test
dlib.cross_validate_trainer((svm_c_trainer_sparse_histogram_intersection)trainer, (sparse_vectors)x, (array)y, (int)folds) → _binary_test
dlib.cross_validate_trainer((svm_c_trainer_linear)trainer, (vectors)x, (array)y, (int)folds) → _binary_test
dlib.cross_validate_trainer((svm_c_trainer_sparse_linear)trainer, (sparse_vectors)x, (array)y, (int)folds) → _binary_test
dlib.cross_validate_trainer_threaded((svm_c_trainer_radial_basis)trainer, (vectors)x, (array)y, (int)folds, (int)num_threads) → _binary_test
dlib.cross_validate_trainer_threaded((svm_c_trainer_sparse_radial_basis)trainer, (sparse_vectors)x, (array)y, (int)folds, (int)num_threads) → _binary_test
dlib.cross_validate_trainer_threaded((svm_c_trainer_histogram_intersection)trainer, (vectors)x, (array)y, (int)folds, (int)num_threads) → _binary_test
dlib.cross_validate_trainer_threaded((svm_c_trainer_sparse_histogram_intersection)trainer, (sparse_vectors)x, (array)y, (int)folds, (int)num_threads) → _binary_test
dlib.cross_validate_trainer_threaded((svm_c_trainer_linear)trainer, (vectors)x, (array)y, (int)folds, (int)num_threads) → _binary_test
dlib.cross_validate_trainer_threaded((svm_c_trainer_sparse_linear)trainer, (sparse_vectors)x, (array)y, (int)folds, (int)num_threads) → _binary_test
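For example, a minimal sketch that cross-validates an RBF trainer on a toy two-class problem; the threaded variant at the end spreads the same computation over 4 threads:

    import dlib

    x = dlib.vectors()
    y = dlib.array()
    # Toy problem: label +1 near (1,1), label -1 near (-1,-1).
    for i in range(20):
        x.append(dlib.vector([1 + 0.05 * i, 1]))
        y.append(+1)
        x.append(dlib.vector([-1 - 0.05 * i, -1]))
        y.append(-1)

    trainer = dlib.svm_c_trainer_radial_basis()
    trainer.set_c(10)
    print(dlib.cross_validate_trainer(trainer, x, y, 4))
    print(dlib.cross_validate_trainer_threaded(trainer, x, y, 4, 4))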
dlib.dot((vector)arg1, (vector)arg2) → float
Compute the dot product between two dense column vectors.
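For example:

    import dlib

    a = dlib.vector([1, 2, 3])
    b = dlib.vector([4, 5, 6])
    print(dlib.dot(a, b))   # 32.0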
class dlib.drectangle
This object represents a rectangular area of an image with floating point coordinates.

    area((drectangle)arg1) → float
    bottom((drectangle)arg1) → float
    center((drectangle)arg1) → point
    contains((drectangle)arg1, (point)point) → bool
    contains((drectangle)arg1, (int)x, (int)y) → bool
    contains((drectangle)arg1, (drectangle)rectangle) → bool
    dcenter((drectangle)arg1) → point
    height((drectangle)arg1) → float
    intersect((drectangle)arg1, (drectangle)rectangle) → drectangle
    is_empty((drectangle)arg1) → bool
    left((drectangle)arg1) → float
    right((drectangle)arg1) → float
    top((drectangle)arg1) → float
    width((drectangle)arg1) → float
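For example (this assumes the (left, top, right, bottom) constructor exposed by the Python bindings; as with dlib.rectangle, width and height count pixels inclusively):

    import dlib

    r = dlib.drectangle(10.5, 20.5, 110.5, 220.5)
    print("{} {} {}".format(r.width(), r.height(), r.area()))  # 101.0 201.0 20301.0
    print(r.contains(dlib.point(50, 100)))                     # True
    print(r.intersect(dlib.drectangle(0, 0, 50, 50)).is_empty())  # False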
class dlib.face_recognition_model_v1
This object maps human faces into 128D vectors where pictures of the same person are mapped near to each other and pictures of different people are mapped far apart. The constructor loads the face recognition model from a file. The model file is available here: http://dlib.net/files/dlib_face_recognition_resnet_model_v1.dat.bz2

    compute_face_descriptor((face_recognition_model_v1)arg1, (object)img, (full_object_detection)face[, (int)num_jitters=0]) → vector
    Takes an image and a full_object_detection that references a face in that image and converts it into a 128D face descriptor. If num_jitters>1 then each face will be randomly jittered slightly num_jitters times, each run through the 128D projection, and the average used as the face descriptor.

    compute_face_descriptor((face_recognition_model_v1)arg1, (object)img, (full_object_detections)faces[, (int)num_jitters=0]) → vectors
    Takes an image and an array of full_object_detections that reference faces in that image and converts them into 128D face descriptors. If num_jitters>1 then each face will be randomly jittered slightly num_jitters times, each run through the 128D projection, and the average used as the face descriptor.
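A minimal sketch, modeled on dlib's face_recognition.py example. It assumes the recognition model above and a landmark model (here the 68-point file, a placeholder choice) have been downloaded, and that the image path is a placeholder:

    import dlib

    detector = dlib.get_frontal_face_detector()
    sp = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
    facerec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

    img = dlib.load_rgb_image("face.jpg")
    for det in detector(img, 1):
        shape = sp(img, det)                       # a full_object_detection
        descriptor = facerec.compute_face_descriptor(img, shape)
        print(list(descriptor)[:4])                # first few of the 128 components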
class dlib.fhog_object_detector
This object represents a sliding window histogram-of-oriented-gradients based object detector.

    run((fhog_object_detector)arg1, (object)image[, (int)upsample_num_times=0[, (float)adjust_threshold=0.0]]) → tuple
    requires
        - image is a numpy ndarray containing either an 8bit grayscale or RGB image.
        - upsample_num_times >= 0
    ensures
        - This function runs the object detector on the input image and returns a tuple of (list of detections, list of scores, list of weight_indices).
        - Upsamples the image upsample_num_times before running the basic detector.

    static run_multiple((list)detectors, (object)image[, (int)upsample_num_times=0[, (float)adjust_threshold=0.0]]) → tuple
    requires
        - detectors is a list of detectors.
        - image is a numpy ndarray containing either an 8bit grayscale or RGB image.
        - upsample_num_times >= 0
    ensures
        - This function runs the list of object detectors at once on the input image and returns a tuple of (list of detections, list of scores, list of weight_indices).
        - Upsamples the image upsample_num_times before running the basic detectors.

    save((fhog_object_detector)arg1, (str)detector_output_filename) → None
    Saves this object detector to the provided path.
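For example, using the frontal face detector (itself an fhog_object_detector) and asking run() for scores; a negative adjust_threshold also reports weaker detections, and the image path is a placeholder:

    import dlib

    detector = dlib.get_frontal_face_detector()
    img = dlib.load_rgb_image("some_image.jpg")

    # Upsample once, lower the detection threshold by 0.5.
    dets, scores, idx = detector.run(img, 1, -0.5)
    for d, s, i in zip(dets, scores, idx):
        print("Detection {}, score: {}, sub-detector index: {}".format(d, s, i))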
dlib.find_candidate_object_locations((object)image, (list)rects[, (tuple)kvals=(50, 200, 3)[, (int)min_size=20[, (int)max_merging_iterations=50]]]) → None
requires
    - image is an image object which is a numpy ndarray
    - len(kvals) == 3
    - kvals should be a tuple that specifies the range of k values to use. In particular, it should take the form (start, end, num) where num > 0.
ensures
    - This function takes an input image and generates a set of candidate rectangles which are expected to bound any objects in the image. It does this by running a version of the segment_image() routine on the image and then reports rectangles containing each of the segments as well as rectangles containing unions of adjacent segments. The basic idea is described in the paper:
          Segmentation as Selective Search for Object Recognition by Koen E. A. van de Sande, et al.
      Note that this function deviates from what is described in the paper slightly. See the code for details.
    - The basic segmentation is performed kvals[2] times, each time with the k parameter (see segment_image() and the Felzenszwalb paper for details on k) set to a different value from the range of numbers linearly spaced between kvals[0] and kvals[1].
    - When doing the basic segmentations prior to any box merging, we discard all rectangles that have an area < min_size. Therefore, all outputs and subsequent merged rectangles are built out of rectangles that contain at least min_size pixels. Note that setting min_size to a smaller value than you might otherwise be interested in using can be useful since it allows a larger number of possible merged boxes to be created.
    - There are max_merging_iterations rounds of neighboring blob merging. Therefore, this parameter has some effect on the number of output rectangles you get, with larger values of the parameter giving more output rectangles.
    - This function appends the output rectangles into #rects. This means that any rectangles in rects before this function was called will still be in there after it terminates. Note further that #rects will not contain any duplicate rectangles. That is, for all valid i and j where i != j it will be true that #rects[i] != rects[j].
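A minimal sketch, following dlib's find_candidate_object_locations.py example (the image path is a placeholder):

    import dlib

    img = dlib.load_rgb_image("some_image.jpg")
    rects = []
    dlib.find_candidate_object_locations(img, rects, min_size=500)

    print("number of rectangles found: {}".format(len(rects)))
    for d in rects[:3]:
        print("left,top,right,bottom: {},{},{},{}".format(
            d.left(), d.top(), d.right(), d.bottom()))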
class dlib.full_object_detection
This object represents the location of an object in an image along with the positions of each of its constituent parts.

    num_parts
    The number of parts of the object.

    part((full_object_detection)arg1, (int)idx) → point
    A single part of the object as a dlib point.

    parts((full_object_detection)arg1) → points
    A vector of dlib points representing all of the parts.

    rect
    The bounding box from the underlying detector. Parts can be outside the box if appropriate.
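full_object_detection objects are usually produced by a shape_predictor. A minimal sketch, assuming the standard 68-point landmark model has been downloaded (the file and image paths are placeholders):

    import dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    img = dlib.load_rgb_image("face.jpg")
    for det in detector(img, 1):
        shape = predictor(img, det)    # a full_object_detection
        print("box: {}, parts: {}".format(shape.rect, shape.num_parts))
        print("first two landmarks: {} {}".format(shape.part(0), shape.part(1)))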
class dlib.full_object_detections
An array of full_object_detection objects.

    append((full_object_detections)arg1, (object)arg2) → None
    clear((full_object_detections)arg1) → None
    extend((full_object_detections)arg1, (object)arg2) → None
    resize((full_object_detections)arg1, (int)arg2) → None
dlib.get_frontal_face_detector() → fhog_object_detector
Returns the default face detector.
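The most common usage is to call the returned detector directly (the image path is a placeholder):

    import dlib

    detector = dlib.get_frontal_face_detector()
    img = dlib.load_rgb_image("some_image.jpg")
    dets = detector(img, 1)    # upsample once, then detect
    print("Number of faces detected: {}".format(len(dets)))
    for i, d in enumerate(dets):
        print("Detection {}: {}".format(i, d))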
dlib.hit_enter_to_continue() → None
Asks the user to hit enter to continue and pauses until they do so.
class dlib.image_window
This is a GUI window capable of showing images on the screen.

    add_overlay((image_window)arg1, (rectangles)rectangles[, (rgb_pixel)color=rgb_pixel(255,0,0)]) → None
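A typical usage sketch, modeled on dlib's example programs (set_image is another image_window method; the image path is a placeholder):

    import dlib

    detector = dlib.get_frontal_face_detector()
    img = dlib.load_rgb_image("some_image.jpg")

    win = dlib.image_window()
    win.set_image(img)
    win.add_overlay(detector(img, 1))   # draw detections; default color is red
    dlib.hit_enter_to_continue()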