[top]add_image_left_right_flips
This routine takes a set of images and bounding boxes within those
images and doubles the size of the dataset by adding left/right
flipped copies of each image as well as the corresponding bounding
boxes. Therefore, this function is useful if you are training and
object detector and your objects have a left/right symmetry.
C++ Example Programs:
fhog_object_detector_ex.cpp [top]add_image_rotations
This routine takes a set of images and bounding boxes within those images and
grows the dataset by computing many different rotations of each image. It will
also adjust the positions of the bounding boxes so that they still fall on the
same objects in each rotated image.
[top]assign_all_pixels
This global function assigns all the pixels in an image a specific value.
[top]assign_border_pixels
This global function assigns all the pixels in the border of an image to
a specific value.
[top]assign_image
This global function copies one image into another and performs any
necessary color space conversions to make it work right.
[top]assign_image_scaled
This global function copies one image into another and performs any
necessary color space conversions to make it work right. Additionally,
if the dynamic range of the source image is too big to fit into the destination image
then it will attempt to perform the appropriate scaling.
[top]assign_pixel
assign_pixel() is a templated function that can assign any pixel type to another pixel type.
It will perform whatever conversion is necessary to make the assignment work. (E.g. color to
grayscale conversion)
[top]assign_pixel_intensity
assign_pixel_intensity() is a templated function that can change the
intensity of a pixel. So if the pixel in question is a grayscale pixel
then it simply assigns that pixel the given value. However, if the
pixel is not a grayscale pixel then it converts the pixel to the
HSI color space and sets the I channel to the given intensity
and then converts this HSI value back to the original pixel's
color space.
[top]auto_threshold_image
This global function performs a simple binary thresholding on an image.
Instead of taking a user supplied threshold
it computes one from the image using k-means clustering.
[top]bgr_pixel
This is a simple struct that represents a BGR colored graphical pixel.
The difference between this object and the rgb_pixel
is just that this struct lays its pixels down in memory in BGR order rather
than RGB order. You only care about this if you are doing something like
using the cv_image object to map an OpenCV image
into a more object oriented form.
[top]binary_close
This global function performs a morphological closing on an image.
[top]binary_complement
This global function computes the complement of a binary image.
[top]binary_difference
This global function computes the difference of two binary images.
[top]binary_dilation
This global function performs the morphological operation of dilation on an image.
[top]binary_erosion
This global function performs the morphological operation of erosion on an image.
[top]binary_intersection
This global function computes the intersection of two binary images.
[top]binary_open
This global function performs a morphological opening on an image.
[top]binary_union
This global function computes the union of two binary images.
[top]binned_vector_feature_image
This object is a tool for performing image feature extraction. In
particular, it wraps another image feature extractor and converts the
wrapped image feature vectors into a high dimensional sparse vector. For
example, if the lower level feature extractor outputs the vector [3,4,5]
and this vector is hashed into the second bin of four bins then the output
sparse vector is:
[0,0,0,0, 3,4,5,1, 0,0,0,0, 0,0,0,0].
That is, the output vector has a dimensionality that is equal to the number
of hash bins times the dimensionality of the lower level vector plus one.
The value in the extra dimension concatenated onto the end of the vector is
always a constant value of of 1 and serves as a bias value. This means
that, if there are N hash bins, these vectors are capable of representing N
different linear functions, each operating on the vectors that fall into
their corresponding hash bin.
The following feature extractors can be wrapped by the binned_vector_feature_image:
[top]compute_box_dimensions
This function is a tool for computing a rectangle with a particular
width/height ratio and area.
[top]compute_dominant_angle
Computes and returns the dominant angle (i.e. the angle of the dominant gradient)
at a given point and scale in an image. This function is part of the
main processing of the SURF algorithm.
[top]compute_surf_descriptor
Computes the 64 dimensional SURF descriptor vector of a box centered
at a given center point, tilted at a given angle, and sized according to
a given scale.
[top]correlation_tracker
This is a tool for tracking moving objects in a video stream. You give it
the bounding box of an object in the first frame and it attempts to track the
object in the box from frame to frame.
This tool is an implementation of the method described in the following paper:
Danelljan, Martin, et al. "Accurate scale estimation for robust visual
tracking." Proceedings of the British Machine Vision Conference BMVC. 2014.
C++ Example Programs:
video_tracking_ex.cppPython Example Programs:
correlation_tracker.py [top]create_grid_detection_template
This function is a tool for creating a detection template usable by
the
scan_image_pyramid object. This
particular function creates a detection template with a grid of feature
extraction regions.
[top]create_overlapped_2x2_detection_template
This function is a tool for creating a detection template usable by
the
scan_image_pyramid object. This
particular function creates a detection template with four overlapping feature
extraction regions.
[top]create_single_box_detection_template
This function is a tool for creating a detection template usable by
the
scan_image_pyramid object. This
particular function creates a detection template with exactly one feature
extraction region.
[top]create_tiled_pyramid
This function creates an image pyramid and packs the entire pyramid into
one big image. It does this by tiling the different pyramid layers together
and outputting the result. Here is an example:
Also, you can use the
image_to_tiled_pyramid()
and
tiled_pyramid_to_image() routines
to convert between the input image coordinate space and the tiled pyramid
coordinate space.
[top]cv_image
This object is meant to be used as a simple wrapper around the OpenCV
IplImage struct or Mat object. Using this class template you can turn
an OpenCV image into something that looks like a normal dlib style
image object.
So you should be able to use cv_image objects with many of the image
processing functions in dlib as well as the GUI tools for displaying
images on the screen.
Note that you can do the reverse conversion, from dlib to OpenCV,
using the toMat routine.
C++ Example Programs:
webcam_face_pose_ex.cpp [top]determine_object_boxes
The
scan_image_pyramid object represents a sliding
window classifier system. For it to work correctly it needs to be given a set of
object boxes which define the size and shape of each sliding window and these windows
need to be able to match the sizes and shapes of targets the user wishes to detect.
Therefore, the determine_object_boxes() routine is a tool for computing a set of object boxes
which can meet this requirement.
[top]disturb_colors
Applies a random color transform an image. This is done by
creating a
random_color_transform with the given parameters and then
transforming each pixel in the image with the resulting transform.
[top]draw_fhog
This function takes a FHOG feature map which was created by
extract_fhog_features and
converts it into an image suitable for display on the screen. In
particular, we draw all the hog cells into a grayscale image in a
way that shows the magnitude and orientation of the gradient
energy in each cell.
C++ Example Programs:
fhog_ex.cpp,
fhog_object_detector_ex.cpp [top]draw_line
This global function draws a line on an image.
[top]draw_rectangle
This global function draws a rectangle on an image.
[top]draw_solid_circle
This global function draws a solid circle on an image.
[top]edge_orientation
This global function takes horizontal and vertical gradient magnitude
values and returns the orientation of the gradient.
[top]equalize_histogram
This global function performs histogram equalization on an image.
[top]evaluate_detectors
This function allows you to efficiently run a bunch of
scan_fhog_pyramid based
object_detectors
over an image. Importantly, this function is faster than running
each detector individually because it computes the HOG features
only once and then reuses them for each detector.
C++ Example Programs:
fhog_object_detector_ex.cpp [top]extract_fhog_features
This function implements the HOG feature extraction method described in
the paper:
Object Detection with Discriminatively Trained Part Based Models by
P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan
in IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, Sep. 2010
This means that it takes an input image and outputs Felzenszwalb's
31 dimensional version of HOG features.
C++ Example Programs:
fhog_ex.cpp [top]extract_highdim_face_lbp_descriptors
This function extracts the high-dimensional LBP feature described in the
paper:
Blessing of Dimensionality: High-dimensional Feature and Its Efficient
Compression for Face Verification by Dong Chen, Xudong Cao, Fang Wen, and
Jian Sun
[top]extract_histogram_descriptors
This function extracts histograms of pixel values from a set of windows in an
image and returns the histograms.
[top]extract_image_chips
This function extracts "chips" from an image. That is, it takes a list of
rectangular sub-windows (i.e. chips) within an image and extracts those
sub-windows, storing each into its own image. It also allows the user to
specify the scale and rotation for the chip.
C++ Example Programs:
face_landmark_detection_ex.cpp [top]extract_uniform_lbp_descriptors
Extracts histograms of uniform local-binary-patterns from an image. The
histograms are from densely tiled windows that do not overlap and cover all
of the image.
We use the idea of uniform LBPs from the paper:
Face Description with Local Binary Patterns: Application to Face Recognition
by Ahonen, Hadid, and Pietikainen.
[top]fill_rect
This global function draws a solid rectangle on an image.
[top]find_candidate_object_locations
This function takes an input image and generates a set of candidate
rectangles which are expected to bound any objects in the image. It does
this by running a version of the
segment_image routine on the image and
then reports rectangles containing each of the segments as well as rectangles
containing unions of adjacent segments. The basic idea is described in the
paper:
Segmentation as Selective Search for Object Recognition by Koen E. A. van de Sande, et al.
Note that this function deviates from what is described in the paper slightly.
See the code for details.
Python Example Programs:
find_candidate_object_locations.py [top]find_points_above_thresh
This routine finds all points in an image with a pixel value above a
threshold. It also has the ability to produce an efficient random
subsample of such points if the number of them is very large.
[top]fine_hog_image
This object is a version of the
hog_image that
allows you to extract HOG features at a finer resolution.
[top]flip_image_dataset_left_right
This routine takes a set of images and bounding boxes within those images and
mirrors the entire dataset left to right. This means that all images are
flipped left to right and the bounding boxes are adjusted so that they still
sit on top of the same visual objects in the new flipped images.
[top]flip_image_left_right
This is a routine which can flip an image from left to right. (e.g. as
if viewed through a mirror).
[top]flip_image_up_down
This routine flips an image upside down.
[top]float_spatially_filter_image_separable
This global function performs spatial filtering on an image with a user
supplied separable filter. It is optimized to work only on float valued
images with float valued filters.
[top]full_object_detection
This object represents the location of an object in an image along with the
positions of each of its constituent parts.
[top]gaussian_blur
This global function blurs an image by convolving it with a Gaussian filter.
[top]get_histogram
This global function computes an image's histogram and returns it in the
form of a column or row
matrix object.
[top]get_pixel_intensity
get_pixel_intensity() is a templated function that
returns the grayscale intensity of a pixel. If the pixel isn't a grayscale
pixel then it converts the pixel to grayscale and returns that value.
[top]get_surf_points
This function runs the complete SURF algorithm on an input image and
returns the points it found. For a description of what exactly
the SURF algorithm does you should read the following paper:
SURF: Speeded Up Robust Features
By Herbert Bay, Tinne Tuytelaars, and Luc Van Gool
Also note that there are numerous flavors of the SURF algorithm
you can put together using the functions in dlib. The get_surf_points()
function is just an example of one way you might do so.
C++ Example Programs:
surf_ex.cpp [top]haar_x
This is a function that operates on an
integral_image
and allows you to compute the response of a Haar wavelet oriented along
the X axis.
[top]haar_y
This is a function that operates on an
integral_image
and allows you to compute the response of a Haar wavelet oriented along
the Y axis.
[top]hashed_feature_image
This object is a tool for performing image feature extraction. In
particular, it wraps another image feature extractor and converts
the wrapped image feature vectors into sparse indicator vectors. It does
this by hashing each feature vector and then returns a new vector
which is zero everywhere except for the position determined by the
hash.
The following feature extractors can be wrapped by the hashed_feature_image:
C++ Example Programs:
object_detector_ex.cpp,
train_object_detector.cpp [top]heatmap
Converts a grayscale image into a heatmap. This is useful if you want
to display a grayscale image with more than 256 values. In particular,
this function uses the following color mapping:
C++ Example Programs:
image_ex.cpp [top]hessian_pyramid
This object represents an image pyramid where each level in the
pyramid holds determinants of Hessian matrices for the original
input image. This object can be used to find stable interest
points in an image.
This object is an implementation of the fast Hessian pyramid
as described in the paper:
SURF: Speeded Up Robust Features
By Herbert Bay, Tinne Tuytelaars, and Luc Van Gool
This implementation was also influenced by the very well documented
OpenSURF library and its corresponding description of how the fast
Hessian algorithm functions:
Notes on the OpenSURF Library by Christopher Evans
[top]hog_image
This object is a tool for performing the image feature extraction algorithm
described in the following paper:
Histograms of Oriented Gradients for Human Detection
by Navneet Dalal and Bill Triggs
C++ Example Programs:
object_detector_ex.cpp,
train_object_detector.cpp [top]hough_transform
This object is a tool for computing the line finding version of
the Hough transform given some kind of edge detection image as
input. It also allows the edge pixels to be weighted such that
higher weighted edge pixels contribute correspondingly more to
the output of the Hough transform, allowing stronger edges to
create correspondingly stronger line detections in the final
Hough transform.
C++ Example Programs:
hough_transform_ex.cpp [top]hsi_pixel
This is a simple struct that represents a HSI colored graphical pixel.
[top]hysteresis_threshold
This global function performs hysteresis thresholding on an image.
[top]integral_image
This is a specialization of the
integral_image_generic
template for the case where sums of pixel values should be represented with
longs. E.g. if you use 8bit pixels in your original images then this is
the appropriate kind of integral image to use with them.
[top]integral_image_generic
This object is an alternate way of representing image data
that allows for very fast computations of sums of pixels in
rectangular regions. To use this object you load it with a
normal image and then you can use the get_sum_of_area()
member function to compute sums of pixels in a given area in
constant time.
[top]interest_point
This is a simple struct used to represent the interest points returned
by the
get_interest_points function.
[top]interpolate_bilinear
This object is a tool for performing bilinear interpolation
on an image.
[top]interpolate_nearest_neighbor
This object is a tool for performing nearest neighbor interpolation
on an image.
[top]interpolate_quadratic
This object is a tool for performing quadratic interpolation
on an image.
[top]jet
Converts a grayscale image into an image using the jet color
scheme. This is useful if you want to display a grayscale image
with more than 256 values. In particular, this function uses the
following color mapping:
[top]jpeg_loader
This object loads a JPEG image file into
an
array2d of
pixels.
Note that you must define DLIB_JPEG_SUPPORT if you want to use this object. You
must also set your build environment to link to the libjpeg library. However,
if you use CMake and dlib's default CMakeLists.txt file then it will get setup
automatically.
[top]label_connected_blobs
This function labels each of the connected blobs in an image with a unique integer label.
[top]lab_pixel
This is a simple struct that represents a CIELab colored graphical pixel.
[top]load_dng
This global function loads a dlib DNG file (a lossless compressed image format) into
an
array2d of
pixels.
[top]load_image
This global function takes a file name, looks at its extension, and
then loads it into an
array2d of
pixels using the appropriate image
loading routine. The supported types are BMP, PNG, JPEG, GIF, and the dlib DNG file format.
Note that you can only load PNG, JPEG, and GIF files if you link against
libpng, libjpeg, and libgif respectively. You will also need to #define
DLIB_PNG_SUPPORT, DLIB_JPEG_SUPPORT, and DLIB_GIF_SUPPORT. Or use CMake and
it will do all this for you.
C++ Example Programs:
image_ex.cpp [top]load_jpeg
This function loads a JPEG image file into
an
array2d of
pixels.
Note that you must define DLIB_JPEG_SUPPORT if you want to use this object. You
must also set your build environment to link to the libjpeg library. However,
if you use CMake and dlib's default CMakeLists.txt file then it will get setup
automatically.
[top]load_png
This function loads a Portable Network Graphics (PNG) image file into
an
array2d of
pixels.
Note that you must define DLIB_PNG_SUPPORT if you want to use this object. You
must also set your build environment to link to the libpng library. However,
if you use CMake and dlib's default CMakeLists.txt file then it will get setup
automatically.
[top]make_uniform_lbp_image
This function extracts the uniform local-binary-pattern feature at every pixel
of an image and stores the output in a new image object.
We use the idea of uniform LBPs from the paper:
Face Description with Local Binary Patterns: Application to Face Recognition
by Ahonen, Hadid, and Pietikainen.
[top]max_filter
This function slides a rectangle over an input image and outputs a new
image which contains the maximum valued pixel found inside the rectangle at each
position in the input image.
[top]nearest_neighbor_feature_image
This object is a tool for performing image feature extraction. In
particular, it wraps another image feature extractor and converts
the wrapped image feature vectors into sparse indicator vectors. It does
this by finding the nearest neighbor for each feature vector and returning an
indicator vector that is zero everywhere except for the position indicated by
the nearest neighbor.
The following feature extractors can be wrapped by the nearest_neighbor_feature_image:
[top]object_detector
This object is a tool for detecting the positions of objects in an image. In
particular, it is a simple container to aggregate an instance of an image
scanner object (either
scan_fhog_pyramid,
scan_image_pyramid,
scan_image_boxes, or
scan_image_custom), the weight vector
needed by one of these image scanners, and finally an instance of
test_box_overlap. The test_box_overlap object
is used to perform non-max suppression on the output of the image scanner
object.
Note that you can use the
structural_object_detection_trainer
to learn the parameters of an object_detector. See the example programs for an introduction.
Also note that dlib contains more powerful CNN based object detection
tooling, which will usually run slower but produce much
more general and accurate detectors.
C++ Example Programs:
fhog_object_detector_ex.cpp,
face_detection_ex.cpp,
object_detector_ex.cpp,
object_detector_advanced_ex.cpp,
train_object_detector.cppPython Example Programs:
face_detector.py,
train_object_detector.py [top]pixel_traits
As the name implies, this is a traits class for pixel types. It allows you
to determine what sort of pixel type you are dealing with.
[top]png_loader
This object loads a Portable Network Graphics (PNG) image file into
an
array2d of
pixels.
Note that you must define DLIB_PNG_SUPPORT if you want to use this object. You
must also set your build environment to link to the libpng library. However,
if you use CMake and dlib's default CMakeLists.txt file then it will get setup
automatically.
[top]poly_image
This object is a tool for extracting local feature descriptors from an image.
In particular, it fits polynomials to local pixel patches and
allows you to query the coefficients of these polynomials.
[top]pyramid_disable
This object downsamples an image at a ratio of infinity to 1. That
means it always outputs an image of size zero. This is useful because
it can be supplied to routines which take a pyramid_down function object
and it will essentially disable pyramid processing. This way, a pyramid
oriented function can be turned into a regular routine which processes
just the original undownsampled image.
[top]pyramid_down
This is a simple function object to help create image pyramids. It
downsamples an image by a ratio of N to N-1 where N is supplied by the
user as a template argument.
[top]pyramid_up
This routine upsamples an image. In particular, it takes a
pyramid_down object (or an object with a
compatible interface) as an argument and performs an upsampling
which is the inverse of the supplied pyramid_down object.
[top]randomly_color_image
Randomly generates a mapping from gray level pixel values
to the RGB pixel space and then uses this mapping to create
a colored version an image.
This function is useful for displaying the results of some image
segmentation. For example, the output of label_connected_blobs
or segment_image.
[top]randomly_sample_image_features
Given a feature extractor such as the
hog_image,
this routine selects a random subsample of local image feature vectors
from a set of images.
[top]random_color_transform
This object generates a random color balancing and gamma correction
transform. It then allows you to apply that specific transform to as many
rgb_pixel objects as you like.
[top]remove_unobtainable_rectangles
Recall that the
scan_image_pyramid and
scan_image_boxes objects can't produce
all possible rectangles as object detections since they only
consider a limited subset of all possible object positions.
Therefore, when training an object detector that uses these tools
you must make sure the training data does not contain any object
locations that are unobtainable by the image scanning model.
The remove_unobtainable_rectangles() routine is a tool to filter out
these unobtainable rectangles from the training.
[top]resize_image
This is a routine capable of resizing or stretching an image.
[top]rgb_alpha_pixel
This is a simple struct that represents an RGB colored graphical pixel with an
alpha channel.
[top]rgb_pixel
This is a simple struct that represents an RGB colored graphical pixel.
[top]rotate_image
This is a routine for rotating an image.
[top]rotate_image_dataset
This routine takes a set of images and bounding boxes within those
images and rotates the entire dataset by a user specified angle.
This means that all images are rotated and the bounding boxes are adjusted
so that they still sit on top of the same visual objects in the new rotated images.
[top]save_bmp
This global function saves an image as a MS Windows BMP file.
This routine can save images containing any type of pixel. However, it will
convert all color pixels into rgb_pixel and grayscale pixels into
uint8 type before saving to disk.
[top]save_dng
This global function saves an image as a dlib DNG file (a lossless
compressed image format).
This routine can save images containing any type of pixel. However, the DNG format
can natively store only the following pixel types: rgb_pixel, hsi_pixel,
rgb_alpha_pixel, uint8, uint16, float, and double.
All other pixel types will be converted
into one of these types as appropriate before being saved to disk.
[top]save_jpeg
This global function writes an image to disk as a JPEG file.
Note that you must define DLIB_JPEG_SUPPORT if you want to use this function. You
must also set your build environment to link to the libjpeg library. However,
if you use CMake and dlib's default CMakeLists.txt file then it will get setup
automatically.
This routine can save images containing any type of pixel. However, save_jpeg() can
only natively store the following pixel types: rgb_pixel
and uint8. All other pixel types will be converted into
one of these types as appropriate before being saved to disk.
[top]save_png
This global function writes an image to disk as a PNG (Portable Network Graphics) file.
Note that you must define DLIB_PNG_SUPPORT if you want to use this function. You
must also set your build environment to link to the libpng library. However,
if you use CMake and dlib's default CMakeLists.txt file then it will get setup
automatically.
This routine can save images containing any type of pixel. However, save_png() can
only natively store the following pixel types: rgb_pixel,
rgb_alpha_pixel, uint8, and uint16. All other pixel
types will be converted into one of these types as appropriate before being
saved to disk.
[top]scan_fhog_pyramid
This object is a tool for running a fixed sized sliding window classifier
over an image pyramid. In particular, it slides a linear classifier over
a HOG pyramid as discussed in the paper:
Histograms of Oriented Gradients for Human Detection by Navneet Dalal
and Bill Triggs, CVPR 2005
However, we augment the method slightly to use the version of HOG features
from:
Object Detection with Discriminatively Trained Part Based Models by
P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, Sep. 2010
Since these HOG features have been shown to give superior performance.
C++ Example Programs:
fhog_object_detector_ex.cppPython Example Programs:
train_object_detector.py [top]scan_image
This global function is a tool for sliding a set of rectangles
over an image space and finding the locations where the sum of pixels in
the rectangles exceeds a threshold. It is useful for implementing
certain kinds of sliding window classifiers.
[top]scan_image_boxes
This object is a tool for running a classifier over an image with the goal
of localizing each object present. The localization is in the form of the
bounding box around each object of interest.
Unlike the scan_image_pyramid object which scans a
fixed sized window over an image pyramid, the scan_image_boxes tool allows you to
define your own list of "candidate object locations" which should be evaluated.
This is simply a list of rectangle objects which might contain objects of
interest. The scan_image_boxes object will then evaluate the classifier at each
of these locations and return the subset of rectangles which appear to have
objects in them.
This object can also be understood as a general tool for implementing the spatial
pyramid models described in the paper:
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing
Natural Scene Categories by Svetlana Lazebnik, Cordelia Schmid,
and Jean Ponce
The following feature extractors can be used with the scan_image_boxes object:
[top]scan_image_custom
This object is a tool for running a classifier over an image with the goal
of localizing each object present. The localization is in the form of the
bounding box around each object of interest.
Unlike the scan_image_pyramid
and scan_image_boxes objects, this image
scanner delegates all the work of constructing the object feature vector to
a user supplied feature extraction object. That is, scan_image_custom
simply asks the supplied feature extractor what boxes in the image we
should investigate and then asks the feature extractor for the complete
feature vector for each box. That is, scan_image_custom does not apply any
kind of pyramiding or other higher level processing to the features coming
out of the feature extractor. That means that when you use
scan_image_custom it is completely up to you to define the feature vector
used with each image box.
[top]scan_image_movable_parts
This global function is a tool for sliding a set of rectangles
over an image space and finding the locations where the sum of pixels in
the rectangles exceeds a threshold. It is useful for implementing
certain kinds of sliding window classifiers. The behavior of this
routine is similar to
scan_image except that
it can also handle movable parts in addition to rigidly placed parts
within the sliding window.
[top]scan_image_pyramid
This object is a tool for running a sliding window classifier over
an image pyramid. This object can also be understood as a general
tool for implementing the spatial pyramid models described in the paper:
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing
Natural Scene Categories by Svetlana Lazebnik, Cordelia Schmid,
and Jean Ponce
It also includes the ability to represent movable part models.
The following feature extractors can be used with the scan_image_pyramid object:
C++ Example Programs:
object_detector_ex.cpp,
object_detector_advanced_ex.cpp [top]segment_image
Attempts to segment an image into regions which have some visual consistency to them.
In particular, this function implements the algorithm described in the paper:
Efficient Graph-Based Image Segmentation by Felzenszwalb and Huttenlocher.
[top]separable_3x3_filter_block_grayscale
This routine filters part of an image with a user supplied 3x3 separable filter.
The output is a grayscale sub-image.
[top]separable_3x3_filter_block_rgb
This routine filters part of an image with a user supplied 3x3 separable filter.
The output is a RGB sub-image.
[top]shape_predictor
This object is a tool that takes in an image region containing some object
and outputs a set of point locations that define the pose of the
object. The classic example of this is human face pose prediction, where
you take an image of a human face as input and are expected to identify the
locations of important facial landmarks such as the corners of the mouth
and eyes, tip of the nose, and so forth. For example, here is the output
of dlib's
68-face-landmark shape_predictor on an image from the HELEN dataset:
To create useful instantiations of this object you need to use the
shape_predictor_trainer object to train a
shape_predictor using a set of training images, each annotated with shapes you want to predict.
To do this, the shape_predictor_trainer uses the state-of-the-art method from the
paper:
One Millisecond Face Alignment with an Ensemble of Regression Trees
by Vahid Kazemi and Josephine Sullivan, CVPR 2014
C++ Example Programs:
face_landmark_detection_ex.cpp,
train_shape_predictor_ex.cpp,
webcam_face_pose_ex.cppPython Example Programs:
train_shape_predictor.py,
face_landmark_detection.py [top]skeleton
This function computes the skeletonization of an image. That is,
given a binary image, we progressively thin the binary blobs
until only a single pixel wide skeleton of the original blobs
remains.
[top]sobel_edge_detector
This global function performs spatial filtering on an image using the
sobel edge detection filters.
C++ Example Programs:
image_ex.cpp [top]spatially_filter_image
This global function performs spatial filtering on an image with a user
supplied filter.
[top]spatially_filter_image_separable
This global function performs spatial filtering on an image with a user
supplied separable filter.
[top]spatially_filter_image_separable_down
This global function performs spatial filtering on an image with a user
supplied separable filter. Additionally, it produces a downsampled
output.
[top]sub_image
This function returns a lightweight sub-image of another image. In particular,
the returned sub-image simply holds a pointer to the original image, meaning there
is no overhead for using or creating the sub-image.
[top]sum_filter
This function slides a rectangle over an input image and adds the sum
of pixel values in each rectangle location to another image.
[top]sum_filter_assign
This function slides a rectangle over an input image and outputs a new
image which contains the sum of pixels inside the rectangle at each
position in the input image.
[top]suppress_non_maximum_edges
This global function performs non-maximum suppression on a gradient
image.
C++ Example Programs:
image_ex.cpp [top]test_box_overlap
This object is a simple function object for determining if two
rectangles overlap.
[top]threshold_image
This global function performs a simple binary thresholding on an image with a user
supplied threshold.
[top]tile_images
This function takes an array of images and tiles them into a single large
square image and returns this new big tiled image. Therefore, it is a useful
method to visualize many small images at once.
[top]toMat
This routine converts a dlib style image into an instance of OpenCV's cv::Mat object.
This is done by setting up the Mat object to point to the same memory as the dlib image.
Note that you can do the reverse conversion, from OpenCV to dlib,
using the cv_image object.
[top]transform_image
This routine is a tool for transforming images using some kind of point mapping
function (e.g.
point_transform_affine)
and pixel interpolation tool (e.g.
interpolate_quadratic).
An example application of this routine is for image rotation. Indeed, it is how
rotate_image is implemented.
[top]upsample_image_dataset
This routine takes a set of images and bounding boxes within those images and
upsamples the entire dataset. This means that all images are upsampled and the
bounding boxes are adjusted so that they still sit on top of the same visual
objects in the new images.
[top]zero_border_pixels
This global function zeros the pixels on the border of an image.