In this tutorial, we will understand an important concept called “Selective Search” in Object Detection. We will also share OpenCV code in C++ and Python.
Object Detection vs. Object Recognition
An object recognition algorithm identifies which objects are present in an image. It takes the entire image as an input and outputs class labels and class probabilities of objects present in that image. For example, a class label could be “dog” and the associated class probability could be 97%.
On the other hand, an object detection algorithm not only tells you which objects are present in the image, it also outputs bounding boxes (x, y, width, height) to indicate the location of the objects inside the image.
At the heart of all object detection algorithms is an object recognition algorithm. Suppose we trained an object recognition model which identifies dogs in image patches. This model will tell whether an image has a dog in it or not. It does not tell where the object is located.
To localize the object, we have to select sub-regions (patches) of the image and then apply the object recognition algorithm to these image patches. The location of the objects is given by the location of the image patches where the class probability returned by the object recognition algorithm is high.
The most straightforward way to generate smaller sub-regions (patches) is called the Sliding Window approach. However, the sliding window approach has several limitations. These limitations are overcome by a class of algorithms called the “Region Proposal” algorithms. Selective Search is one of the most popular Region Proposal algorithms.
Sliding Window Algorithm
In the sliding window approach, we slide a box or window over an image to select a patch and classify each image patch covered by the window using the object recognition model. It is an exhaustive search for objects over the entire image. Not only do we need to search all possible locations in the image, we have to search at different scales. This is because object recognition models are generally trained at a specific scale (or range of scales). This results in classifying tens of thousands of image patches.
The problem doesn’t end here. The sliding window approach works well for fixed aspect ratio objects such as faces or pedestrians. But images are 2D projections of 3D objects, and object features such as aspect ratio and shape vary significantly with the angle at which the image is taken. The sliding window approach becomes computationally very expensive when we search over multiple aspect ratios.
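To make the cost concrete, here is a minimal sketch (my own illustration, not from the original post) of a multi-scale sliding window generator in Python; the window size, stride, and scale factor are illustrative assumptions.

import cv2

def sliding_windows(image, win_size=(64, 64), step=16, scale=1.5, min_dim=64):
    """Yield (x, y, patch) for every window position at every pyramid scale."""
    while image.shape[0] >= min_dim and image.shape[1] >= min_dim:
        for y in range(0, image.shape[0] - win_size[1] + 1, step):
            for x in range(0, image.shape[1] - win_size[0] + 1, step):
                yield x, y, image[y:y + win_size[1], x:x + win_size[0]]
        # move one level down the image pyramid (downscale by `scale`)
        image = cv2.resize(image, (int(image.shape[1] / scale),
                                   int(image.shape[0] / scale)))

# even at this coarse stride a 640x480 image yields thousands of patches,
# and finer steps, more scales and multiple aspect ratios multiply that quickly:
# n = sum(1 for _ in sliding_windows(cv2.imread("image.jpg")))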
Region Proposal Algorithms
The problems we have discussed so far can be solved using region proposal algorithms. These methods take an image as input and output bounding boxes corresponding to all patches in the image that are most likely to be objects. These region proposals can be noisy and overlapping, and may not contain the object perfectly, but amongst them there will be a proposal very close to the actual object in the image. We can then classify these proposals using the object recognition model. The region proposals with high probability scores are the locations of the object.
Region proposal algorithms identify prospective objects in an image using segmentation. In segmentation, we group adjacent regions which are similar to each other based on criteria such as color, texture etc. Unlike the sliding window approach, where we look for the object at all pixel locations and at all scales, region proposal algorithms work by grouping pixels into a smaller number of segments. So the final number of proposals generated is many times smaller than with the sliding window approach, which reduces the number of image patches we have to classify. These generated region proposals are of different scales and aspect ratios.
An important property of a region proposal method is a very high recall. This is just a fancy way of saying that the regions containing the objects we are looking for have to be in our list of region proposals. To accomplish this, our list of region proposals may end up having a lot of regions that do not contain any object. In other words, it is ok for the region proposal algorithm to produce a lot of false positives as long as it catches all the true positives. Most of these false positives will be rejected by the object recognition algorithm. The time it takes to do the detection goes up when we have more false positives, and the accuracy is affected slightly. But having a high recall is still a good idea because the alternative of missing the regions containing the actual objects severely impacts the detection rate.
Several region proposal methods have been proposed, such as:
- Objectness
- Constrained Parametric Min-Cuts for Automatic Object Segmentation
- Category Independent Object Proposals
- Randomized Prim
- Selective Search
Amongst all these region proposal methods, Selective Search is the most commonly used because it is fast and has a very high recall.
Selective Search for Object Recognition
What is Selective Search?
Selective Search is a region proposal algorithm used in object detection. It is designed to be fast with a very high recall. It is based on computing hierarchical grouping of similar regions based on color, texture, size and shape compatibility.
Selective Search starts by over-segmenting the image based on the intensity of the pixels, using the graph-based segmentation method by Felzenszwalb and Huttenlocher. The output of the algorithm is shown below. The image on the right contains segmented regions represented using solid colors.
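If you want to reproduce this oversegmentation yourself, OpenCV’s contrib module exposes Felzenszwalb and Huttenlocher’s graph-based method directly. Below is a minimal sketch (the parameter values and file names are illustrative, and opencv-contrib-python is required).

import cv2
import numpy as np

im = cv2.imread("input.jpg")  # any input image
# sigma, k and min_size are illustrative; tune them for your images
gs = cv2.ximgproc.segmentation.createGraphSegmentation(sigma=0.8, k=300, min_size=100)
labels = gs.processImage(im)  # int32 label map, one label per segment

# paint each segment with a random solid color for visualization
n = int(labels.max()) + 1
colors = np.random.randint(0, 255, (n, 3), dtype=np.uint8)
cv2.imwrite("segments.jpg", colors[labels])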
Can we use segmented parts in this image as region proposals? The answer is no and there are two reasons why we cannot do that:
- Most of the actual objects in the original image contain 2 or more segmented parts
- Region proposals for occluded objects such as the plate covered by the cup or the cup filled with coffee cannot be generated using this method
If we try to address the first problem by further merging adjacent regions that are similar to each other, we will end up with one segmented region covering two objects.
Perfect segmentation is not our goal here. We just want to predict many region proposals such that some of them have very high overlap with the actual objects.
Selective search uses oversegments from Felzenszwalb and Huttenlocher’s method as an initial seed. An oversegmented image looks like this.
The Selective Search algorithm takes these oversegments as initial input and performs the following steps:
1. Add all bounding boxes corresponding to segmented parts to the list of region proposals
2. Group adjacent segments based on similarity
3. Go to step 1
At each iteration, larger segments are formed and added to the list of region proposals. Hence we create region proposals from smaller segments to larger segments in a bottom-up approach. This is what we mean by computing “hierarchical” segmentations using Felzenszwalb and Huttenlocher’s oversegments.
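To make the bottom-up bookkeeping concrete, here is a toy, runnable sketch (my own simplification, not the paper’s implementation: it uses a single stand-in similarity based on mean color and ignores adjacency) of the grouping loop over a label map such as the one produced by the graph segmentation above.

import numpy as np

def bbox(points):
    """Axis-aligned bounding box (x, y, w, h) of an Nx2 array of (row, col) points."""
    rows, cols = points[:, 0], points[:, 1]
    return (int(cols.min()), int(rows.min()),
            int(cols.max() - cols.min() + 1), int(rows.max() - rows.min() + 1))

def hierarchical_grouping(labels, image):
    """Toy Selective Search loop: record every region's box, merge the most
    similar pair, repeat until one region is left."""
    regions = {l: np.argwhere(labels == l) for l in np.unique(labels)}
    colors = {l: image[labels == l].mean(axis=0) for l in regions}
    proposals = [bbox(p) for p in regions.values()]  # step 1: initial boxes
    while len(regions) > 1:
        # step 2: find the most similar pair (stand-in similarity: mean color)
        keys = list(regions)
        pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
        a, b = min(pairs, key=lambda p: np.linalg.norm(colors[p[0]] - colors[p[1]]))
        # merge the pair, add the merged region's box, and repeat (step 3)
        merged = np.vstack((regions.pop(a), regions.pop(b)))
        regions[a] = merged
        colors[a] = image[merged[:, 0], merged[:, 1]].mean(axis=0)
        proposals.append(bbox(merged))
    return proposals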
This image shows the initial, middle and last step of the hierarchical segmentation process.
Similarity
Let’s dive deeper into how we calculate the similarity between two regions.
Selective Search uses 4 similarity measures based on color, texture, size and shape compatibility.
Color Similarity
A color histogram of 25 bins is calculated for each channel of the image, and the histograms for all channels are concatenated, resulting in a 25×3 = 75-dimensional color descriptor.
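As an illustration (my own sketch, not the paper’s exact binning), the descriptor for a region given its pixels as an Nx3 array could be built like this:

import numpy as np

def color_descriptor(region_pixels, bins=25):
    """Concatenate a 25-bin histogram per channel into a 75-d, L1-normalized descriptor."""
    hists = [np.histogram(region_pixels[:, c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    desc = np.concatenate(hists).astype(np.float64)
    return desc / desc.sum()  # normalize so histogram intersection is comparable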
Color similarity of two regions is based on histogram intersection and can be calculated as:

s_{color}(r_i, r_j) = \sum_{k=1}^{n} \min(c_i^k, c_j^k)

where c_i^k is the histogram value for the k-th bin in the color descriptor of region r_i.
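In NumPy, histogram intersection is a one-liner; the same function is reused for the texture similarity below (a sketch, assuming L1-normalized descriptors like the one above).

import numpy as np

def hist_intersection(h1, h2):
    """Sum of bin-wise minima of two L1-normalized histograms (1.0 = identical)."""
    return float(np.minimum(h1, h2).sum())

# example with two random 75-d color descriptors
c1 = np.random.rand(75); c1 /= c1.sum()
c2 = np.random.rand(75); c2 /= c2.sum()
print(hist_intersection(c1, c2))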
Texture Similarity
Texture features are calculated by extracting Gaussian derivatives at 8 orientations for each channel. For each orientation and for each color channel, a 10-bin histogram is computed, resulting in a 10×8×3 = 240-dimensional feature descriptor.
Texture similarity of two regions is also calculated using histogram intersection:

s_{texture}(r_i, r_j) = \sum_{k=1}^{n} \min(t_i^k, t_j^k)

where t_i^k is the histogram value for the k-th bin in the texture descriptor of region r_i.
Size Similarity
Size similarity encourages smaller regions to merge early. It ensures that region proposals at all scales are formed in all parts of the image. If this similarity measure is not taken into consideration, a single region will keep gobbling up all the smaller adjacent regions one by one, and region proposals at multiple scales will be generated at this location only. Size similarity is defined as:

s_{size}(r_i, r_j) = 1 - \frac{size(r_i) + size(r_j)}{size(im)}

where size(im) is the size of the image in pixels.
Shape Compatibility
Shape compatibility measures how well two regions r_i and r_j fit into each other. If r_i fits into r_j, we would like to merge them in order to fill gaps, and if they are not even touching each other they should not be merged. Shape compatibility is defined as:

s_{fill}(r_i, r_j) = 1 - \frac{size(BB_{ij}) - size(r_i) - size(r_j)}{size(im)}

where size(BB_{ij}) is the size of a tight bounding box around r_i and r_j.
Final Similarity
The final similarity between two regions is defined as a linear combination of the aforementioned 4 similarities:

s(r_i, r_j) = a_1 s_{color}(r_i, r_j) + a_2 s_{texture}(r_i, r_j) + a_3 s_{size}(r_i, r_j) + a_4 s_{fill}(r_i, r_j)

where r_i and r_j are two regions or segments in the image and a_i \in \{0, 1\} denotes whether the similarity measure is used or not.
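Putting the four measures together, here is a hedged sketch (descriptor extraction omitted; r1 and r2 are assumed to be dicts holding each region’s precomputed histograms, size, and bounding box):

import numpy as np

def enclosing_box(b1, b2):
    """Smallest box (x, y, w, h) containing both input boxes."""
    x, y = min(b1[0], b2[0]), min(b1[1], b2[1])
    x2 = max(b1[0] + b1[2], b2[0] + b2[2])
    y2 = max(b1[1] + b1[3], b2[1] + b2[3])
    return (x, y, x2 - x, y2 - y)

def similarity(r1, r2, im_size, weights=(1, 1, 1, 1)):
    """Linear combination of the four Selective Search similarities.

    r1/r2 are dicts with precomputed 'color' (75-d) and 'texture' (240-d)
    L1-normalized histograms, 'size' (pixel count) and 'bbox' (x, y, w, h).
    """
    s_color = np.minimum(r1['color'], r2['color']).sum()
    s_texture = np.minimum(r1['texture'], r2['texture']).sum()
    s_size = 1.0 - (r1['size'] + r2['size']) / im_size
    bb = enclosing_box(r1['bbox'], r2['bbox'])
    s_fill = 1.0 - (bb[2] * bb[3] - r1['size'] - r2['size']) / im_size
    return sum(a * s for a, s in zip(weights, (s_color, s_texture, s_size, s_fill)))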
Results
The Selective Search implementation in OpenCV gives thousands of region proposals arranged in decreasing order of objectness. For clarity, we are sharing results with the top 200-250 boxes drawn over the image. In general, 1000-1200 proposals are good enough to get all the correct region proposals.
Selective Search Code
Let’s take a look at how we can use the Selective Search based segmentation implemented in OpenCV.
Selective Search: C++
The code below is a C++ tutorial for Selective Search using OpenCV. Please read through the comments to understand the code.
#include "opencv2/ximgproc/segmentation.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/core.hpp"
#include "opencv2/imgproc.hpp"
#include <iostream>
#include <ctime>
using namespace cv;
using namespace cv::ximgproc::segmentation;
static void help() {
std::cout << std::endl <<
"Usage:" << std::endl <<
"./ssearch input_image (f|q)" << std::endl <<
"f=fast, q=quality" << std::endl <<
"Use l to display less rects, m to display more rects, q to quit" << std::endl;
}
int main(int argc, char** argv) {
// If the image path and f/q are not passed as command-
// line arguments, quit and display help message
if (argc < 3) {
help();
return -1;
}
// speed up using multiple threads
setUseOptimized(true);
setNumThreads(4);
// read image
Mat im = imread(argv[1]);
// resize image
int newHeight = 200;
int newWidth = im.cols*newHeight/im.rows;
resize(im, im, Size(newWidth, newHeight));
// create Selective Search Segmentation Object using default parameters
Ptr<SelectiveSearchSegmentation> ss = createSelectiveSearchSegmentation();
// set input image on which we will run segmentation
ss->setBaseImage(im);
// Switch to fast but low recall Selective Search method
if (argv[2][0] == 'f') {
ss->switchToSelectiveSearchFast();
}
// Switch to high recall but slow Selective Search method
else if (argv[2][0] == 'q') {
ss->switchToSelectiveSearchQuality();
}
// if argument is neither f nor q print help message
else {
help();
return -2;
}
// run selective search segmentation on input image
std::vector<Rect> rects;
ss->process(rects);
std::cout << "Total Number of Region Proposals: " << rects.size() << std::endl;
// number of region proposals to show
int numShowRects = 100;
// increment to increase/decrease total number
// of region proposals to be shown
int increment = 50;
while(1) {
// create a copy of original image
Mat imOut = im.clone();
// iterate over all the region proposals
for (int i = 0; i < (int)rects.size(); i++) {
if (i < numShowRects) {
rectangle(imOut, rects[i], Scalar(0, 255, 0));
}
else {
break;
}
}
// show output
imshow("Output", imOut);
// record key press
int k = waitKey();
// m is pressed
if (k == 'm') {
// increase total number of rectangles to show by increment
numShowRects += increment;
}
// l is pressed
else if (k == 'l' && numShowRects > increment) {
// decrease total number of rectangles to show by increment
numShowRects -= increment;
}
// q is pressed
else if (k == 'q') {
break;
}
}
return 0;
}
Selective Search: Python
The code below is a Python tutorial for Selective Search using OpenCV 3.3. Note the bug alert for OpenCV 3.2 mentioned after the code block. Please read through the comments to understand the code.
#!/usr/bin/env python
'''
Usage:
./ssearch.py input_image (f|q)
f=fast, q=quality
Use "l" to display less rects, 'm' to display more rects, "q" to quit.
'''
import sys
import cv2
if __name__ == '__main__':
# If the image path and f/q are not passed as command-
# line arguments, quit and display help message
if len(sys.argv) < 3:
print(__doc__)
sys.exit(1)
# speed up using multiple threads
cv2.setUseOptimized(True)
cv2.setNumThreads(4)
# read image
im = cv2.imread(sys.argv[1])
# resize image
newHeight = 200
newWidth = int(im.shape[1]*newHeight/im.shape[0])
im = cv2.resize(im, (newWidth, newHeight))
# create Selective Search Segmentation Object using default parameters
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
# set input image on which we will run segmentation
ss.setBaseImage(im)
# Switch to fast but low recall Selective Search method
if (sys.argv[2] == 'f'):
ss.switchToSelectiveSearchFast()
# Switch to high recall but slow Selective Search method
elif (sys.argv[2] == 'q'):
ss.switchToSelectiveSearchQuality()
# if argument is neither f nor q print help message
else:
print(__doc__)
sys.exit(1)
# run selective search segmentation on input image
rects = ss.process()
print('Total Number of Region Proposals: {}'.format(len(rects)))
# number of region proposals to show
numShowRects = 100
# increment to increase/decrease total number
# of region proposals to be shown
increment = 50
while True:
# create a copy of original image
imOut = im.copy()
# iterate over all the region proposals
for i, rect in enumerate(rects):
# draw rectangle for region proposal till numShowRects
if (i < numShowRects):
x, y, w, h = rect
cv2.rectangle(imOut, (x, y), (x+w, y+h), (0, 255, 0), 1, cv2.LINE_AA)
else:
break
# show output
cv2.imshow("Output", imOut)
# record key press
k = cv2.waitKey(0) & 0xFF
# m is pressed
if k == ord('m'):
# increase total number of rectangles to show by increment
numShowRects += increment
# l is pressed
elif k == ord('l') and numShowRects > increment:
# decrease total number of rectangles to show by increment
numShowRects -= increment
# q is pressed
elif k == ord('q'):
break
# close image show window
cv2.destroyAllWindows()
Bug Alert: There was a bug in the Python bindings of Selective Search which was fixed in this commit. So the Python code will work for OpenCV 3.3.0 but not for OpenCV 3.2.0.
If you don’t want to compile OpenCV 3.3.0 and you have the build folder for OpenCV 3.2.0 which you compiled earlier, you can fix this bug yourself.
If you look at the GitHub commit, it is just a small change. You have to change line #239 in the file
opencv_contrib-3.2.0/modules/ximgproc/include/opencv2/ximgproc/segmentation.hpp
// from
CV_WRAP virtual void process(std::vector<Rect>& rects) = 0;
// to
CV_WRAP virtual void process(CV_OUT std::vector<Rect>& rects) = 0;
Now recompile OpenCV 3.2.0. If you have a build folder in which OpenCV was compiled earlier, running the make command will just compile this module.
Comments
Thanks! I already have OpenCV 3.3 + Python 2.7.13 installed, but when I tried to run your code I got the following error:
‘module’ object has no attribute ‘ximgproc’
Note: I am working under Anaconda
Hi Walid, You need to compile the OpenCV with opencv_contrib.
Thanks. I got it running now
However, I found it really slow, to the point of being impractical for real-time use.
Try the updated code. It runs much faster than the earlier one.
Hi, how do you get the image of false positives and true positives for the dog image?
In the dog’s image, the true positives and false positives are not inferred using code. Consider this to be ground truth data. You will have to run a classifier on the region proposals generated by Selective Search to know which one is a true positive.
What type of classifier did you use to create that image (false & true positives)?
Boxes on that image were drawn manually just to show that, out of many region proposals, a few are correct boxes and the rest are not.
But I think your question is what kind of classifier we can use with region proposal methods. The original paper used an SVM with hard negative mining. Read the paper to learn about the pipeline (Selective Search + SVM).
https://ivi.fnwi.uva.nl/isis/publications/2013/UijlingsIJCV2013
Thanks for the great tutorial! I was excited to try this out. Unfortunately, this is not as fast as using a Haar cascade (in the email sent out to subscribers, it said that Haar cascades are slow, so Selective Search is used). I modified your code slightly to capture images from my webcam and removed the bits about drawing rectangles. In my tests, it is taking upwards of 10 seconds to process one image! Surely this can’t be used for real-time applications. What do you think? https://uploads.disquscdn.com/images/e84961043a33e734de1a12fc4a2396dc67c3dca4c5de3f41673e9318ed5f09fc.png
The implementation of Haar Cascades in OpenCV is limited to a fixed aspect ratio classifier, and it is also limited to detecting objects over a small range of scales. The two techniques do not achieve similar results, so comparing the runtime of OpenCV’s Haar Cascades to Selective Search would be a bit unfair: Haar Cascade based pedestrian detection in OpenCV performs poorly, whereas Selective Search was used in the top 2 winning entries in ImageNet 2013.
There are a few ways to speed-up selective search though.
1. Use threading. Add these 2 lines to your C++ code.
setUseOptimized(true);
setNumThreads(8);
2. Decrease image size
Unless you are looking for very small objects, there is no need to run Selective Search on an image with a 720p or 1080p resolution. When I scale down an image’s height to 200, Selective Search’s fast version runs within a second.
Thanks for pointing out the speed issue though. I’ll also update the article with the changes I suggested in this comment.
Well, there is an issue with threading:
one can decide not to use threading either because one does not have many cores, or because one wants to be able to control other processes (some old Raspberry Pis have only one core).
On a NanoPi (slower than an RPi, but cheaper), with little RAM, I can detect faces and AKAZE points with OpenCV if images are downsized (say 320x320) without seeing degradation. A sliding window with a fixed size (say the window is 20 pixels and does not vary…) would involve 90,000 tests, and I understood that was too huge. A selective search would be, with the same detection algorithm, 90 times faster (and, if one knows the size of the bounding box, maybe the range of scales to be swept is smaller)….
This makes your article and the links very interesting for small nanocomputers… but I have to download and try to install opencv_contrib….
Another issue with regard to “real time”: object detection can have a random detection time (a flat background without faces would be processed very quickly); there is no way of guaranteeing real-time performance (I suppose a mob has many, i.e. a random number of, faces to detect).
Hi there! I’m working on a project with hand signal recognition using Haar Cascades. Unfortunately, it’s generating some issues, like false positives when the signals are similar. I have searched about HOG and SVM, but I didn’t find any step-by-step tutorial for Python.
I’m a beginner in this area… can someone help me with this or give me tips about another algorithm?
We have this tutorial that uses HOG + SVM
https://learnopencv.com/handwritten-digits-classification-an-opencv-c-python-tutorial/
I had some difficulties with this tutorial when I read it… Like: I’m using thousands of images, how do I build my dataset? Do I have to create a loop? Where do I put my results? Or is there one “HOG value” for each image?
Really, I’m sorry for being a noob; as I said, I’m a beginner in this area haha
I’ll try again with this, thank you!
Hey! Thanks so much for this tutorial. I struggled last semester trying to do that! Do you have a docker image that runs this code in Python? If not, would you mind creating one? Best regards!!!
I’m glad I was able to help. You can use any docker image which has OpenCV installed to run this code, such as this one – https://hub.docker.com/r/jjanzic/docker-python3-opencv/
What is the command to run the code for python?
How can I get the coordinates of the proposed boxes in this…???
I need the coordinates to feed the proposed areas to a CNN.
You might look at lines 49, 63-68 in the python code…
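For anyone after the same thing: each element of rects returned by ss.process() in the Python tutorial code above is an (x, y, w, h) tuple, so cropping patches for a CNN takes only a few lines (a sketch; the 224x224 input size and the 2000-proposal cap are illustrative assumptions):

# assumes `cv2`, `im` and `rects` from the Python tutorial code above
patches = []
for (x, y, w, h) in rects[:2000]:
    patch = cv2.resize(im[y:y + h, x:x + w], (224, 224))  # CNN input size is illustrative
    patches.append(patch)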
Hello, Thanks for such a great post.
My question is, how can I use Selective Search for segmentation only? I want to use the segmentation done by Selective Search to segment my test data images, with segmented regions represented as polygons, in Python.
For segmentation, you can use the method Selective Search uses. You can access it here – http://cs.brown.edu/people/pfelzens/segment/.
Or if you want Python code, you can use this Python port of Graph based Image Segmentation – https://github.com/salaee/pegbis
Hello, thanks for your post.
My problem is that there is an attribute error when I run the Python code you shared:
“cv2 has no attribute ‘ximgproc’ ”…
Do you know what causes this problem? I installed Python 3.6.3 and OpenCV 3.3 in a Windows environment.
Best regards!
I’ve met the same problem.
The OpenCV installed on your machine doesn’t have contrib modules.
Use
pip install opencv-contrib-python
to install OpenCV with contrib modules.
I tried it with C++ but I also get the same error. What should I do?
Thank you!!
Thank you for this deep and clean explanation.
Thanks for the kind words Mahir.
Where do I find the .h and .cpp files needed to build the example?
My copy of OpenCV for C++ does not come with the headers/code needed to use SelectiveSearchSegmentation.
You probably don’t have OpenCV’s contrib modules installed. You can install OpenCV using one of the following links:
Install OpenCV on Ubuntu
Install OpenCV on macOS
Install OpenCV on Windows
I can not register to download the document. Please help me! email: [email protected]
I am using the CNTK library. Currently the time taken for ROI computation on a 960×720 image is 4 minutes on CPU. I would like to bring the ROI computation time down to under 1 minute. Can you please suggest how I can do it?
https://github.com/Azure/ObjectDetectionUsingCntk/blob/master/helpers.py
https://github.com/Azure/cortana-intelligence-product-detection-from-images/blob/master/technical_deployment/train_model/PARAMETERS.py
Tried this Python code and it works.
However, how do I tune the parameters myself for my image dataset, rather than using the default parameters provided by OpenCV? I already searched but found nothing useful.
I hope somebody who sees this comment can help me.
Thank you!
You can choose your parameters by passing them to the functions
switchToSelectiveSearchFast
and switchToSelectiveSearchQuality
on lines 46 and 50 respectively in the C++ code. Check the following links in the OpenCV documentation for syntax:
switchToSelectiveSearchFast()
switchToSelectiveSearchQuality()
Then how about the OpenCV Python library?
It’s similar. The documentation pages I linked to have syntax for Python too.
You can call the Python function this way
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation(base_k=150, inc_k=150, sigma=0.8)
Just in case someone tries to use custom parameters: you should pass them to switchToSelectiveSearchFast() or switchToSelectiveSearchQuality(), as in the sketch below.
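In Python, that would look like this (keyword names follow the OpenCV ximgproc documentation; the values shown are the documented defaults):

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(im)
# base_k, inc_k and sigma are forwarded to the underlying graph segmentation
ss.switchToSelectiveSearchFast(base_k=150, inc_k=150, sigma=0.8)
rects = ss.process()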
Thanks. Very useful
Thanks, Kim.
I guess the output of the OpenCV implementation doesn’t give region proposals ordered by objectness; it’s totally random. The segmentation part also has some randomness. Only if you trained the selective search for a specific purpose could you get ordered region proposals.