This post is part of a series I am writing on Image Recognition and Object Detection.
The complete list of tutorials in this series is given below:
- Image recognition using traditional Computer Vision techniques : Part 1
- Histogram of Oriented Gradients : Part 2
- Example code for image recognition : Part 3
- Training a better eye detector: Part 4a
- Object detection using traditional Computer Vision techniques : Part 4b
- How to train and test your own OpenCV object detector : Part 5
- Image recognition using Deep Learning : Part 6
- Object detection using Deep Learning : Part 7
In this tutorial, we will build a simple handwritten digit classifier using OpenCV. As always, we will share code written in C++ and Python.
This post is the third in a series I am writing on image recognition and object detection. The first post introduced the traditional computer vision image classification pipeline and in the second post, we discussed the Histogram of Oriented Gradients (HOG) image descriptor in detail. We also had a guest post on training an eye detector that is related to this topic.
The last two posts were geared toward providing the education needed to understand the basics. This post is geared toward providing the training needed to successfully implement an image classifier. So, what is the difference between education and training? Well, education provides largely theoretical knowledge. It is important to get that knowledge, but it is useless without good training. During training, you learn specific skills and apply the theoretical knowledge to the real world.
Image Classification Pipeline
If you have not looked at my previous post on image classification, I encourage you to do so. That post describes the pipeline involved in most traditional computer vision image classification algorithms.
The image above shows that pipeline. In this post, we will use Histogram of Oriented Gradients as the feature descriptor and Support Vector Machine (SVM) as the machine learning algorithm for classification.
Optical Character Recognition (OCR) example using OpenCV (C++ / Python)
I wanted to share an example with code to demonstrate Image Classification using HOG + SVM. At the same time, I wanted to keep things as simple as possible so that we do not need much in addition to HOG and SVM. The inspiration and data for this post come from the OpenCV tutorial here.
The original tutorial is in Python only, and for some strange reason implements its own simple HOG descriptor. We replaced their homegrown HOG with OpenCV’s HOG descriptor.
Digits dataset for OCR
We are going to use the image above, which comes with the OpenCV samples, as our dataset. It contains 5000 images in all — 500 images of each digit. Each image is 20×20 grayscale with a black background. 4500 of these digits will be used for training and the remaining 500 will be used for testing the performance of the algorithm.
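For readers who want to follow along, here is a minimal Python sketch for loading and splitting this dataset. It assumes the standard digits.png image from the OpenCV samples, which is laid out as 50 rows × 100 columns of 20×20 cells, with the rows grouped by digit (500 examples per digit):

Python

import cv2
import numpy as np

# Load the 2000x1000 sample image and carve it into 5000 cells of 20x20.
img = cv2.imread('digits.png', cv2.IMREAD_GRAYSCALE)
cells = [np.hsplit(row, 100) for row in np.vsplit(img, 50)]
cells = np.array(cells).reshape(-1, 20, 20)
# Labels: rows are ordered 0 through 9, 500 examples each.
labels = np.repeat(np.arange(10), 500)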
Let us go through the steps needed to build and test a classifier.
Step 1 : Deskewing (Preprocessing)
People often think of a learning algorithm as a black box. Input an image at one end and out comes the result at the other end. In reality, you can assist the algorithm a bit and notice huge gains in performance. For example, if you are building a face recognition system, aligning the images to a reference face often leads to a quite substantial improvement in performance. A typical alignment operation uses a facial feature detector to align the eyes in every image.
Aligning digits before building a classifier similarly produces superior results. In the case of faces, alignment is rather obvious — you can apply a similarity transformation to an image of a face to align the two corners of the eyes to the two corners of a reference face.
In the case of handwritten digits, we do not have obvious features like the corners of the eyes we can use for alignment. However, an obvious variation in writing among people is the slant of their writing. Some writers have a right or forward slant where the digits are slanted forward, some have a backward or left slant, and some have no slant at all. We can help the algorithm quite a bit by fixing this slant so it does not have to learn this variation of the digits. The image on the left shows the original digit in the first column and its deskewed (fixed) version.
This deskewing of simple grayscale images can be achieved using image moments. OpenCV has an implementation of moments, and it comes in handy while calculating useful information like the centroid, area, and skewness of simple images with black backgrounds.
It turns out that a measure of the skewness is given by the ratio of two central moments ( mu11 / mu02 ). The skewness thus calculated can be used to compute an affine transform that deskews the image.
The code for deskewing is shared below.
Python
import cv2
import numpy as np

SZ = 20  # side of each (square) digit image

def deskew(img):
    m = cv2.moments(img)
    if abs(m['mu02']) < 1e-2:
        # Image is already upright; no deskewing needed.
        return img.copy()
    # Calculate skew based on central moments.
    skew = m['mu11']/m['mu02']
    # Calculate affine transform to correct skewness.
    M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
    # Apply affine transform.
    img = cv2.warpAffine(img, M, (SZ, SZ), flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
    return img
C++
// SZ is the side of each (square) digit image; affineFlags matches the Python version.
int SZ = 20;
int affineFlags = WARP_INVERSE_MAP | INTER_LINEAR;

Mat deskew(Mat& img){
    Moments m = moments(img);
    if(abs(m.mu02) < 1e-2){
        // Image is already upright; no deskewing needed.
        return img.clone();
    }
    // Skew is the ratio of two central moments.
    float skew = m.mu11/m.mu02;
    // Affine transform that undoes the horizontal shear.
    Mat warpMat = (Mat_<float>(2,3) << 1, skew, -0.5*SZ*skew, 0, 1, 0);
    Mat imgOut = Mat::zeros(img.rows, img.cols, img.type());
    warpAffine(img, imgOut, warpMat, imgOut.size(), affineFlags);
    return imgOut;
}
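As a quick usage sketch in Python (assuming the cells array from the loading snippet earlier), the entire dataset can be deskewed in one pass:

Python

deskewed = np.array([deskew(c) for c in cells])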
Step 2 : Calculate the Histogram of Oriented Gradients (HOG) descriptor
In this step, we will convert the grayscale image to a feature vector using the HOG feature descriptor. In my previous post, I explained the HOG descriptor in great detail.
When I was in grad school, I found a huge gap between theory and practice. Acquiring the knowledge was easy. I could read papers and books. If I did not understand the concept or the math, I read more papers and books. That was the easy part. The hard part was putting that knowledge into practice. Part of the reason was that a lot of these algorithms worked only after tedious hand-tuning and it was not obvious how to set the right parameters. For example, in the Harris corner detector, why is the free parameter k set to 0.04? Why not 1 or 2 or 0.34212 instead? Why is 42 the answer to life, the universe, and everything?
As I got more real world experience, I realized that in some cases you can make an educated guess but in other cases, nobody knows why. People often do a parameter sweep — they change different parameters in a principled way to see what produces the best result. Sometimes, the best parameters have an intuitive explanation and sometimes they don’t.
Keeping that in mind, let’s see what parameters were chosen for our HOG descriptor. We will also try to explain why they made sense, but instead of a rigorous proof, I will offer vigorous handwaving!
C++
HOGDescriptor hog(
    Size(20,20), //winSize
    Size(10,10), //blockSize
    Size(5,5),   //blockStride
    Size(10,10), //cellSize
    9,           //nbins
    1,           //derivAperture
    -1,          //winSigma
    0,           //histogramNormType
    0.2,         //L2HysThreshold
    1,           //gammaCorrection
    64,          //nlevels
    1);          //signedGradients (use signed gradients)
Python
winSize = (20,20)
blockSize = (10,10)
blockStride = (5,5)
cellSize = (10,10)
nbins = 9
derivAperture = 1
winSigma = -1.
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 1
nlevels = 64
signedGradients = True
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride,
                        cellSize, nbins, derivAperture,
                        winSigma, histogramNormType, L2HysThreshold,
                        gammaCorrection, nlevels, signedGradients)
I am not going to describe derivAperture, winSigma, histogramNormType, L2HysThreshold, gammaCorrection and nlevels because I have never had to change these parameters while using the HOG descriptor. Unless you have carefully read the original HOG paper, I would recommend you go with the default values. Let’s explore the choice of other parameters.
winSize: This parameter is set to 20×20 because the size of the digit images in our dataset is 20×20 and we want to calculate one descriptor for the entire image.
cellSize: Our digits are 20×20 grayscale images. In other words, our image is represented by 20×20 = 400 numbers. The size of the descriptor is typically much smaller than the number of pixels in an image. The cellSize is chosen based on the scale of the features important to do the classification. A very small cellSize would blow up the size of the feature vector and a very large one may not capture relevant information. You should test this yourself using the code shared in this post. We have chosen a cellSize of 10×10 in this tutorial. Could we have chosen 8? Yup, that would have worked too.
blockSize: The notion of blocks exists to tackle illumination variation. A large block size makes local changes less significant, while a smaller block size weights local changes more. Typically blockSize is set to 2 x cellSize, but in our example of digits classification, illumination does not present much of a challenge. In my experiments, a blockSize of 10×10 gave the best results.
blockStride: The blockStride determines the overlap between neighboring blocks and controls the degree of contrast normalization. Typically a blockStride is set to 50% of blockSize.
nbins: nbins sets the number of bins in the histogram of gradients. The authors of the HOG paper recommended a value of 9 to capture gradients between 0 and 180 degrees in 20-degree increments. In my experiments, increasing this value to 18 did not produce any better results.
signedGradients: Typically gradients can have any orientation between 0 and 360 degrees. These gradients are referred to as “signed” gradients as opposed to “unsigned” gradients that drop the sign and take values between 0 and 180 degrees. In the original HOG paper, unsigned gradients were used for pedestrian detection. In my experiments, for this problem, signed gradients produced slightly better results.
The HOG descriptor defined above can be used to compute the HOG features of an image using the following code.
C++
// im is of type Mat
vector<float> descriptors;
hog.compute(im, descriptors);
Python
descriptor = hog.compute(im)
The size of this descriptor is 81×1 for the parameters we have chosen.
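As a quick sanity check, here is the arithmetic behind that number, using the parameters above:

Python

# Blocks per window side: (winSize - blockSize) / blockStride + 1.
blocks_per_side = (20 - 10) // 5 + 1       # = 3, so 3 x 3 = 9 blocks
cells_per_block = (10 // 10) ** 2          # blockSize / cellSize, squared = 1
nbins = 9
print(blocks_per_side ** 2 * cells_per_block * nbins)   # 81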
Step 3: Training a Model ( a.k.a Learning a Classifier )
Until this point, we have deskewed the original image and defined a descriptor for our image. This has allowed us to convert every image in our dataset to a vector of size 81×1.
We are now ready to train a model that will classify the images in our training set. To do this we have chosen Support Vector Machines (SVM) as our classification algorithm. While the theory and math behind SVMs are involved and beyond the scope of this tutorial, how they work is very intuitive and easy to understand. You can check out my previous post that explains Linear SVMs.
To quickly recap, if you have points in an n-dimensional space and class labels attached to the points, a Linear SVM will divide the space using planes such that different classes are on different sides of the plane. In the figure below, we have two classes represented by red and blue dots. If this data is fed into a Linear SVM, it will easily build a classifier by finding the line that clearly separates the two classes. There are many lines that could have separated this data. SVM chooses the one that is at a maximum distance from the data points of either class.
The two-class example shown in the figure above may appear simple compared to our digits classification problem, but mathematically they are very similar. Instead of being points in a 2D space, our image descriptors are points in an 81-dimensional space because they are represented by 81×1 vectors. The class labels attached to these points are the digits contained in the images, i.e. 0, 1, 2, … 9. Instead of lines in 2D, the SVM will find hyperplanes in the high-dimensional space to do the classification.
SVM Parameter C
One of the two common parameters you need to know about while training an SVM is called C. Real world data is not as clean as shown above. Sometimes the training data may contain mislabeled examples. At other times, an example of one class may be too close in appearance to an example of another class. E.g. a handwritten digit 2 may look like a 3.
In the animation below, we have created this scenario. Notice that the blue dot is too close to the red cluster. When the default value of C = 1 is chosen, the blue dot is misclassified. Choosing a value of 100 for C classifies it correctly.
But now the decision boundary represented by the black line is too close to one of the classes. Would you rather choose C to be 1, where one data point is misclassified but the separation between the classes is much better ( minus that one data point )? The parameter C allows you to control this tradeoff.
So, how do you choose C? We choose the C that provides the best classification on a held-out test set. The images in this set were not used in training.
SVM Parameter Gamma : Non-Linear SVM
Did you notice that I sneaked in the word “Linear” a few times? In classification tasks, a dataset consisting of many classes is called linearly separable if the space containing the data can be partitioned using planes ( or lines in 2D ) to separate the classes.
What if the data is not linearly separable? The figure below shows two classes using red and blue dots that are not linearly separable. You cannot draw a line on the plane to separate the two classes. A good classifier, represented using the black line, is more of a circle.
In real life, data is messy and not linearly separable.
Can we still use SVMs? The answer is YES!
To accomplish this, you use a technique called the Kernel Trick. It is a neat trick that transforms non-linearly separable data into linearly separable data. In our example, the red and blue dots lie on a 2D plane. Let us add a third dimension to all data points using the following equation:

z = e^( -γ ( x² + y² ) )

If you ever hear people using the fancy term Radial Basis Function (RBF) with a Gaussian Kernel, they are simply talking about the above equation. An RBF is simply a real-valued function that depends only on the distance from the origin ( i.e. it depends only on x² + y² ). The Gaussian Kernel refers to the Gaussian form of the above equation. More generally, an RBF can have different kinds of kernels. You can see some of them here.
So, we just cooked up a third dimension based on data in the other two dimensions. The figure below shows this three-dimensional (x, y, z) data. We can see it is separable by the plane containing the black circle!
The parameter Gamma ( γ ) controls the stretching of data in the third dimension. It helps in classification but it also distorts the data. Like Goldilocks, you have to choose this parameter to be “just right”. It is one of the two important parameters people choose while training an SVM.
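As a toy illustration of the lifting described above, the third dimension can be computed as follows (the sample points and the gamma value here are arbitrary assumptions, not values from this tutorial):

Python

import numpy as np

gamma = 0.5                                              # arbitrary illustrative value
pts = np.array([[0.1, 0.2], [2.0, 1.5], [-1.8, 0.4]])   # hypothetical 2D points
z = np.exp(-gamma * (pts[:, 0]**2 + pts[:, 1]**2))      # lifted third dimension
print(np.c_[pts, z])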
Equipped with this knowledge, we are now ready to train an SVM using OpenCV.
Training and Testing an SVM using OpenCV
Under the hood, OpenCV uses LIBSVM. SVM in OpenCV 2.4.x still uses the C API. Fortunately, starting with 3.x, OpenCV uses the much nicer C++ API. Here is how you set up an SVM using OpenCV in C++ and Python.
C++
// Set up SVM for OpenCV 3
Ptr<SVM> svm = SVM::create();
// Set SVM type
svm->setType(SVM::C_SVC);
// Set SVM Kernel to Radial Basis Function (RBF)
svm->setKernel(SVM::RBF);
// Set parameter C
svm->setC(12.5);
// Set parameter Gamma
svm->setGamma(0.50625);
// Train SVM on training data
Ptr<TrainData> td = TrainData::create(trainMat, ROW_SAMPLE, trainLabels);
svm->train(td);
// Save trained model
svm->save("digits_svm_model.yml");
// Test on a held out test set
Mat testResponse;
svm->predict(testMat, testResponse);
Python
# Set up SVM for OpenCV 3
svm = cv2.ml.SVM_create()
# Set SVM type
svm.setType(cv2.ml.SVM_C_SVC)
# Set SVM Kernel to Radial Basis Function (RBF)
svm.setKernel(cv2.ml.SVM_RBF)
# Set parameter C
svm.setC(C)
# Set parameter Gamma
svm.setGamma(gamma)
# Train SVM on training data
svm.train(trainData, cv2.ml.ROW_SAMPLE, trainLabels)
# Save trained model
svm.save("digits_svm_model.yml")
# Test on a held out test set
testResponse = svm.predict(testData)[1].ravel()
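To report the accuracy in Python, a minimal check like the following should do, assuming testLabels holds the ground-truth labels for testData in the same order:

Python

accuracy = (testResponse == testLabels.ravel()).mean() * 100
print('Accuracy: {:.2f}%'.format(accuracy))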
Auto Training SVM
As you can imagine, it can be very time consuming to select the right SVM parameters C and Gamma. Fortunately, the OpenCV 3.x C++ API provides a function that automatically performs this hyperparameter optimization for you and returns the best values of C and Gamma. In the code above, you can change svm->train(td) to the following:
svm->trainAuto(td);
This training can take a very long time ( say 5x more than svm->train ) because it is essentially training multiple times.
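Because trainAuto is not exposed via the Python bindings (see the bugs below), a manual grid search is a reasonable substitute in Python. The following is only a sketch: the parameter ranges are illustrative assumptions, and valData / valLabels stand for a hypothetical validation split held out from the training data:

Python

import itertools
import cv2

best_acc, best_params = 0.0, None
for C, gamma in itertools.product([0.1, 1, 12.5, 100], [0.01, 0.1, 0.50625, 1.0]):
    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setKernel(cv2.ml.SVM_RBF)
    svm.setC(C)
    svm.setGamma(gamma)
    svm.train(trainData, cv2.ml.ROW_SAMPLE, trainLabels)
    pred = svm.predict(valData)[1].ravel()
    acc = float((pred == valLabels.ravel()).mean())
    if acc > best_acc:
        best_acc, best_params = acc, (C, gamma)
print(best_params, best_acc)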
OpenCV SVM bugs
We encountered three bugs while working with OpenCV SVM. The first one is confirmed, but the other two are not.
- SVM model won’t load in the Python API. The trained SVM model you just saved won’t load if you are using Python! Is the bug fix coming? Nope! Check it out here.
- trainAuto does not appear to be exposed via the Python API.
- SVM with RBF kernel does not work in iOS / Android. I would be happy to be proven wrong, but on mobile platforms ( iOS / Android ), we have not been able to use the SVM trained with RBF kernel. The SVM response is always the same. Linear SVM models work just fine.
Results
After training and some hyperparameter optimization, we hit 98.6% accuracy on digit classification! Not bad for just a few seconds of training.
Out of the 500 images in the test set, 7 were misclassified. The images and their misclassified labels are shown below. Like a father looking at his kid’s mistakes, I would say these mistakes are understandable.
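If you want to inspect the failures yourself, a small sketch along these lines should work, assuming testResponse and testLabels from the earlier snippets:

Python

import numpy as np

# Indices where the prediction disagrees with the ground truth.
wrong = np.flatnonzero(testResponse != testLabels.ravel())
for i in wrong:
    print('test image {}: predicted {}, actual {}'.format(
        i, int(testResponse[i]), int(testLabels.ravel()[i])))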
Hi, I’m new to these topics. I got an error when I tried to use HOGDescriptor.
I tried to use the code posted above but I got the following error:
“error C2661: ‘cv::HOGDescriptor::HOGDescriptor’ : no overloaded function takes 12 arguments”
I hope you can help me.
Has anyone had any luck with this program using Visual Studio 2015, OpenCV 3.2 and the precompiled OpenCV libraries? I am getting the following exception:
Exception thrown at 0x000007FEFD70A06D in HOGSVM.exe: Microsoft C++ exception: cv::Exception at memory location 0x000000000022EA80.
At the line hog.compute(deskewedtrainCells[y], descriptors); in the method CreateTrainTestHOG.
Thanks,
Doug
Hi Doug, have you found what was causing this error? I have the exact same issue
using VS 2015 and OpenCV 3.2.
We found the problem and fixed it. Please grab the update.
Hi Satya, I downloaded your code from this page today and I still have this problem. Which update should I download? Thanks for your advice.
Hello,
I have the same problem… How did you fix this error?
Hi Doug, I have the exact same issue as you. If I make it work I’ll keep you posted;
please do the same.
Daniel,
That would be great! I have not had any luck at all with this problem.
Thanks!
Doug
I got it!
The blockSize, blockStride, and cellSize have to be powers of two
(2, 4, 8, 16, etc.); otherwise it fails with an assertion.
I got 98.2% with the settings in the picture: https://uploads.disquscdn.com/images/9d2c28f2a312f6585565dba6577c23e3534d6f3c599ead0098c4c733e8ebe9ad.jpg
Play with the numbers to see if you can get a better result.
Have a great day.
Nice work, Daniel!! I made the changes in mine and now have a completely different assertion 🙁 https://uploads.disquscdn.com/images/5671fcfaeb0ff90254da5d5f36f47ae4b7807ae2c8cbf1e9c84c0de469c7eb6d.png
If I click Retry and then Break, I am in delete_scalar.cpp at
void __CRTDECL operator delete(void* const block) noexcept
{
#ifdef _DEBUG
_free_dbg(block, _UNKNOWN_BLOCK);
#else
free(block);
#endif
}
The Call Stack gives me:
> HOGSVM.exe!operator delete(void * block) Line 21 C++
HOGSVM.exe!std::_Deallocate(void * _Ptr, unsigned __int64 _Count, unsigned __int64 _Sz) Line 133 C++
HOGSVM.exe!std::allocator::deallocate(float * _Ptr, unsigned __int64 _Count) Line 721 C++
HOGSVM.exe!std::_Wrap_alloc<std::allocator >::deallocate(float * _Ptr, unsigned __int64 _Count) Line 988 C++
HOGSVM.exe!std::vector<float,std::allocator >::_Tidy() Line 1643 C++
HOGSVM.exe!std::vector<float,std::allocator >::~vector<float,std::allocator >() Line 976 C++
HOGSVM.exe!CreateTrainTestHOG(std::vector<std::vector<float,std::allocator >,std::allocator<std::vector<float,std::allocator > > > & trainHOG, std::vector<std::vector<float,std::allocator >,std::allocator<std::vector<float,std::allocator > > > & testHOG, std::vector<cv::Mat,std::allocator > & deskewedtrainCells, std::vector<cv::Mat,std::allocator > & deskewedtestCells) Line 130 C++
HOGSVM.exe!main() Line 225 C++
HOGSVM.exe!invoke_main() Line 65 C++
HOGSVM.exe!__scrt_common_main_seh() Line 253 C++
HOGSVM.exe!__scrt_common_main() Line 296 C++
HOGSVM.exe!mainCRTStartup() Line 17 C++
kernel32.dll!00000000777c59cd() Unknown
ntdll.dll!00000000779fa561() Unknown
Unfortunately I am under a major deadline so I won’t have time to look at this.
One question: did you use the prebuilt binaries from OpenCV or did you build 3.2 locally?
Thanks,
Doug
Hi,
Well, I have never faced this one before.
Check out this link:
http://stackoverflow.com/questions/35310117/debug-assertion-failed-expression-acrt-first-block-header
Good luck.
Which OpenCV version do I need to install?
Thank you for your great tutorial, Mr. Mallick. I am just wondering: I subscribed to this article but I didn’t receive the code. What can I do if I want to try your code? Thanks!
Not sure what went wrong. If you have still not received the welcome email, please send me an email at [email protected].
Thanks for sharing this great tutorial.
How can we make use of this trained SVM with
HOGDescriptor::setSVMDetector and
HOGDescriptor::detectMultiScale?
“After training and some hyperparameter optimization,…”
How much “hyperparameter optimization” did you do? How many times did you experiment and view the result on the 500 test observations? What was the test performance the first time you did this?
Hi Guys,
I’m getting the following error (same as Doug) when running train_digits.cpp:
Image Count : 5000
OpenCV Error: Assertion failed ((n & (n - 1)) == 0) in cv::alignSize, file C:\build\master_winpack-build-win64-vc14\opencv\modules\core\include\opencv2/core/utility.hpp, line 438
Any solutions?
David
Change your HOG’s blockSize, blockStride, and cellSize to numbers that are powers of two,
i.e.
cv::HOGDescriptor hog(
cv::Size(20, 20), //winSize
cv::Size(16, 16), //blocksize
cv::Size(4, 4), //blockStride,
cv::Size(16, 16), //cellSize,
9, //nbins,
1, //derivAper,
-1, //winSigma,
0, //histogramNormType,
0.2, //L2HysThresh,
0,//gamma correction,
64,//nlevels=64
1);
I think it is working now – I get 88.4%. Is that right? What could I do to improve on this figure?
David
Play with the parameters. We got 98.6% or so.
Hi Satya,
Are these the only real parameters we can play with?
svm->setGamma(0.50625);
svm->setC(12.5);
David
Hi Satya,
I managed to get 98.2% – how did you get 98.6%? I used:
cv::HOGDescriptor hog(
cv::Size(20, 20), //winSize
cv::Size(8, 8), //blocksize
cv::Size(4, 4), //blockStride,
cv::Size(4, 4), //cellSize,
9, //nbins,
1, //derivAper,
-1, //winSigma,
0, //histogramNormType,
0.2, //L2HysThresh,
0,//gamma correction,
64,//nlevels=64
1);
David
They are in the same order. Unfortunately, because the dataset is small, ordering / shuffling it slightly differently will result in slightly different accuracy numbers. I just confirmed that your parameters are exactly the same as mine.
Thanks Satya.
How much did you experiment with “hyperparameters”? I wonder whether you haven’t data snooped by doing this too much and (accidentally) inflated your test performance.
Didn’t do much hyperparameter optimization, but the results are in no way rigorously unbiased because this is a proof-of-concept tutorial, not an academic paper :). Ideally, the data should be split into three parts (training set, validation set, and test set), with the parameters tuned on the validation set and the results reported on the test set.
Well, after trying all the suggestions here, I cannot make this program run. It faults every time. I have tried OpenCV 3.1 and 3.2, and I use Visual Studio 2015.
If anyone has a working solution, would you mind zipping it up? I would like to try a known working solution and see if that gets me going.
I can also reinstall VS2013 but would rather not if I can avoid it.
Thanks!
Doug
Thanks for the useful post, Satya.
A general question: what would happen if we only did the deskewing without the subsequent HOG step? Would it improve the accuracy significantly too? I have experimented a little bit, and without HOG the deskewing does not seem to be useful.
Hi Emre,
Deskewing is just pre-processing. The main feature is HOG. I suspect you will see poor results if you do not use HOG.
Satya
I’ve subscribed already. How do I find the entire C++ code?
After you subscribe and confirm your email, you will receive a welcome email with the link to all code in this blog. If you have not received the welcome email, please send me an email at [email protected]
I might have accidentally deleted it. Could you send me another? Thank you. I always enjoy your clear explanations.
Sure. Send me an email.
I would like access to your HoG SVM C++ code. Thank you.
@Satya Mallick Great tutorial!
I’m trying to use the trained model saved in the digits_svm_model.yml file on new handwritten samples.
code:
deskew(deskewedImg); // deskew new image, contains a handwritten digit 2
vector<float> descriptors;
hog.compute(deskewedImg, descriptors);
int descriptor_size2 = trainHOG[0].size();
std::vector<std::vector<float> > testHOG2;
testHOG2.push_back(descriptors);
Mat testMat2(trainHOG.size(), descriptor_size2, CV_32FC1);
for(int i = 0; i < testHOG2.size(); i++){
    for(int j = 0; j < descriptor_size; j++){
        testMat2.at<float>(i, j) = testHOG[i][j];
    }
}
float result = svm->predict(testMat2, testResponse);
How can I get the result of the classification from the testResponse Mat (whatever the number is, 2 or another number from 1 to 9)? I tried the code below, but I’m getting 0 for all the images I tried:
int TrainDigit::SVMevaluate(Mat &testResponse, float &count, float &accuracy, vector<int> &testLabels){
    for(int i = 0; i < testResponse.rows; i++)
    {
        if(testResponse.at<float>(i, 0) == testLabels[i]){
            return i;
        }
    }
    return -1;
}
Many Thanks!!
Hi Doug,
I want to ask how to get an SVM image from a car?
I do not know how to put a cat image into a car picture.
I’m still new to Python OpenCV.
Thank you very much.
https://uploads.disquscdn.com/images/5f7c593e9a826c3f84f220b716e0f89f0380892891dd7106c835938a614d27ed.png
Thanks very much for this excellent series of HOG tutorials, Satya.
Python code is working fine.
Thanks a bunch!
Hi Satya Mallick,
Thank you for this great tutorial 🙂 Everything works great until I want to use the predict function.
When I use the predict function on a sample img, I get this error:
opencv/modules/ml/src/svm.cpp:1930: error: (-215) samples.cols == var_count && samples.type() == CV_32F in function predict
There are a few suggestions on Stack Overflow, but reshaping the sample img didn’t help.
I would greatly appreciate any suggestions.
Best,
Michal
Do you have this code in C++? When I subscribe and get the link, it seems to only be Python.
Not sure why you did not find the code, but here it is:
https://github.com/spmallick/learnopencv/blob/master/digits-classification/train_digits.cpp
Thanks for sharing.
Basically, SVM is a binary classifier;
how does OpenCV use it to predict 0 ~ 9?
Thank you
Please check the code.
How can I use SVM for multi-class classification in OpenCV 3.3 C++?
Thanks in advance.
Did the code shared in this post not work on OpenCV 3.3?
Thank you so much.
I can’t express how much your words meant to me, especially the part where you explain the gap between theory and practice, and how parameters are normally tuned for better results.
Thanks for the kind words, Mie.
Hi Satya,
Can I change the image? How do I do that?
Thank you so much.
Hi Satya,
great post (as are the others in this series…)
- The HOG+SVM approach appears more complex than needed for this problem. Does a well-trained multi-layer CNN take care of the feature selection, de-skewing, and classification as well? The pre-processing steps ( de-skew + HOG ) look like steps needed because the SVM is a relatively simple classifier.
- In this example the digits are nicely separated, and you can detect each single digit using the simpleBlob routine. But in practice you will do number recognition, i.e. you have to identify “17” versus “1” and “7”. I’m doing simpleBlob detection followed by a rectangle detection to get the numbers. And then… I’m stuck. Any suggestions?
- CNNs need a lot of data, not just 500 samples, so they are not suitable in this situation.
Hi,
To start, thank you for sharing the code and the great explanation!
I managed to get it working, almost. Whenever I run the code, it briefly shows the image count and descriptor size in the terminal, then the terminal closes and it leaves debug mode. I can’t see the accuracy, for example. My output is that different threads have exited with code (0).
Could someone please help me figure out why it leaves debug mode?
Thank you
Try calling waitKey() at the end of the C++ code.
Hi Satya,
I have some questions about this topic. I am aiming to detect multiple objects in an image with HOG. However, their templates’ sizes are rather different (e.g. human and car).
1) How can you train a multiclass SVM with such templates?
2) How does one normally do this? n binary one-vs-all SVMs, one for each class, reshaping the samples of the other classes to match those of the wanted class?
3) If the training requires reshaping the templates to a constant dimensionality, how do you run the model over an image to detect the instances and ensure that negative samples are detected as negative?
4) In the FE-CCM paper by Li and Saxena, when talking about the dimensionality of the HOG feature vector, they state that K (i.e. | Features | ) depends on the number of scales to be considered and the size of the object template. Do you have any idea what is meant there by “number of scales to be considered”? How does this affect the feature vector’s dimensionality? Does this mean that they train multiple templates and concatenate them afterwards?
Thank you very much. I hope that you can help me clear my thoughts and that this question gives the readers a little more insight into more realistic problems.
Thanks for the post!
David
Great tutorial. How can you use this code to check the accuracy for the digits-classification.jpg file in C++?
Can I use the same logic if I have
a ledger book with handwritten numbers like one thousand (1000), two thousand eight hundred five (2805), or eight hundred nine (809)?
I hope my question is clear.
Any size, I think, is applicable.
Hi Satya,
I want to extract a logo from an image. Can you help me with how I can do this?
Thanks in advance.
Regarding issue #1, I think it has been fixed: https://github.com/opencv/opencv/issues/4969#issuecomment-269432958
This is for anyone disheartened after reading that the SVM with RBF kernel doesn’t work on Android. Maybe this was true when this was written, but I have trained a model using OpenCV 2.4.13.7 with RBF to classify letters and digits, and it loads and predicts correctly on my Android phone.