Support Vector Machines (SVM)

Satya Mallick
July 11, 2018

Ideas in Machine Learning have a “winner takes all” quality. When an idea takes off, it dominates the field so completely that one tends to believe it is the only idea worth pursuing.

Today, Deep Learning is cool. A few years back learning algorithms like Random Forests and Support Vector Machines (SVMs) were just as cool.

These traditional methods still have some benefits over Deep Learning in certain application domains. They sometimes need less training data, and training takes minutes instead of hours or days. The authors of this paper found:

“For example, recently, deep learning was used to find which questions in the Stack Overflow programmer discussion forum can be linked together. That deep learning system took 14 hours to execute. We show here that applying a very simple optimizer called DE to fine tune SVM, it can achieve similar (and sometimes better) results. The DE approach terminated in 10 minutes; i.e. 84 times faster than the deep learning method.”

Faster training time means you can perform more experiments and bring a product to market faster. A good machine learning engineer is not married to a specific technique. They learn a bag of tools and apply the right tool for the right problem.

In this post, we will learn a math-free intuition behind linear and non-linear Support Vector Machines (SVMs).

Looking for an application of SVM in Computer Vision? Check out our post on Handwritten Digits Classification.

What are Support Vector Machines (SVMs)?

The easiest way to understand SVM is using a binary classification problem.

Figure 1: Two classes are shown using two different colors. SVM finds the best line that separates the two classes.

In Figure 1, we see data represented as dots on a 2D plane. The data belongs to two different classes indicated by the color of the dots. One way to learn to distinguish between the two classes is to draw a line that partitions the 2D space into two parts. Training the system simply means finding the line. Once you have trained the system (i.e. found the line), you can say if a new data point belongs to the blue or the red class by simply checking on which side of the line it lies.

In Figure 1, it is clear that line L1 is not a good choice because it does not separate the two classes. L2 and L3 both separate the two classes, but intuitively we know L3 is a better choice than L2 because it more cleanly separates the two classes.
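
To make this concrete, here is a minimal sketch of the idea, not taken from this post: it uses scikit-learn rather than OpenCV, and the toy points are made up. It trains a linear SVM on a handful of 2D points and classifies a new point by checking which side of the learned line it falls on.

import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two clusters of 2D points, labeled +1 (blue) and -1 (red).
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],    # blue cluster
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])   # red cluster
y = np.array([1, 1, 1, -1, -1, -1])

# "Training" simply means finding the separating line.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Classify a new point by checking which side of the line it lies on.
print(clf.predict(np.array([[3.0, 2.0]])))   # expected: [1], the blue side
print(clf.coef_, clf.intercept_)             # the learned line's w and b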

How do Support Vector Machines (SVMs) work?

In 2D, a Support Vector Machine (SVM) essentially finds the best line that separates the data. This line is called the Decision Boundary.

If we had 1D data, we would separate the data using a single threshold value. If we had 3D data, the output of SVM would be a plane that separates the two classes. Finally, if the data has more than three dimensions, the decision boundary is a hyperplane, which is nothing but a plane in higher dimensions. No, you cannot visualize it, but you get the idea!

Now, let’s see how line L3 is chosen by the SVM.

Figure 2: The points closest to the decision boundary are called support vectors. SVM finds the decision boundary by maximizing its distance from the Support Vectors.

The points closest to the separating hyperplane are called the Support Vectors. SVM solves an optimization problem such that

  1. Support Vectors have the greatest possible distance from the decision boundary (i.e. separating hyperplane).
  2. The two classes lie on different sides of the hyperplane.

This optimization problem is equivalent to maximizing the Geometric Margin (\gamma) shown in the equation below.

(1)   \[ \gamma = \min_{i=1}^{N} \; y_i \, \frac{\mathbf{w}^T \mathbf{x}_i + b}{\|\mathbf{w}\|} \]

where \mathbf{x}_i is a training example, y_i takes one of two values ( 1 or -1 ) for a binary classifier, and the separating hyperplane is parameterized by \mathbf{w} and b. In our 2D example, \mathbf{x}_i is simply the coordinates of a 2D point, y_i is 1 for blue dots and -1 for red dots, and the parameters \mathbf{w} and b are related to the slope and intercept of the separating line.
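
As a small numeric illustration of Equation (1), the geometric margin can be computed directly. The hyperplane parameters and training points below are made-up values, not taken from the post.

import numpy as np

# Hypothetical hyperplane parameters and training points.
w = np.array([1.0, -1.0])
b = -0.5
X = np.array([[2.0, 0.5], [3.0, 1.0],    # class +1
              [0.0, 2.0], [1.0, 3.5]])   # class -1
y = np.array([1, 1, -1, -1])

# Signed distance of each training example to the hyperplane,
# y_i * (w^T x_i + b) / ||w||, then the minimum over all examples.
distances = y * (X @ w + b) / np.linalg.norm(w)
gamma = distances.min()
print(distances)   # all positive -> every point lies on the correct side
print(gamma)       # the geometric margin; SVM picks w and b to maximize it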

SVM Parameter C

Now, you may be thinking the toy example I picked was too easy and real data is noisy and almost never so neatly separable using a hyperplane.

In such cases, SVM still finds the best hyperplane by solving an optimization problem that tries to increase the distance of the hyperplane from the two classes while trying to make sure that as many training examples as possible are classified correctly.

This tradeoff is controlled by a parameter called C. When the value of C is small, a large margin hyperplane is chosen at the expense of a greater number of misclassifications. Conversely, when C is large, a smaller margin hyperplane is chosen that tries to classify many more examples correctly. Figure 3 graphically depicts this tradeoff.

Figure 3: SVM Parameter C

Note: The line corresponding to C = 100 is not necessarily a good choice, because the lone blue point may be an outlier.
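
The effect of C can be reproduced on synthetic data. The sketch below is my own toy example using scikit-learn, not the data behind Figure 3: it trains a linear SVM with a small and a large C and reports the margin width, training accuracy, and number of support vectors.

import numpy as np
from sklearn.svm import SVC

# Two slightly overlapping Gaussian blobs (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines
    print(f"C={C}: margin width={margin_width:.2f}, "
          f"train accuracy={clf.score(X, y):.2f}, "
          f"support vectors={len(clf.support_)}")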

SVM Parameter Gamma (\gamma)

What if the data is not separable by a hyperplane? For example, in Figure 4, the two classes represented by the red and blue dots are not linearly separable. The decision boundary shown in black is actually circular.

Figure 4: Non-Linearly Separable Data

In such a case, we use the Kernel Trick: we add a new dimension to the existing data and, if we are lucky, the data becomes linearly separable in the new space. Let’s look at the Kernel Trick using an example.

In Figure 5, we have added a third dimension (z) to the data where,

    \[z = e^{-\gamma(x^2+y^2)}\]

The above expression is called a Gaussian Radial Basis Function or a Radial Basis Function with a Gaussian kernel. We can see the new 3D data is separable by the plane containing the black circle! The parameter \gamma controls the amount of stretching in the z direction.

Figure 5: Using the Kernel Trick to make data linearly separable.
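
To see the lift into the third dimension numerically, here is a rough sketch with my own ring-shaped toy data (not the points of Figures 4 and 5; scikit-learn is assumed). The inner class ends up with large z and the outer ring with small z, so a horizontal plane separates them; an RBF-kernel SVM achieves the same separation without ever building z explicitly.

import numpy as np
from sklearn.svm import SVC

# Made-up circular data: an inner disc (class 0) surrounded by a ring (class 1).
rng = np.random.default_rng(1)
angles = rng.uniform(0.0, 2.0 * np.pi, 200)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100),    # inner disc
                        rng.uniform(2.0, 3.0, 100)])   # outer ring
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.array([0] * 100 + [1] * 100)

# Explicitly add the third dimension z = exp(-gamma * (x^2 + y^2)).
gamma = 0.5
z = np.exp(-gamma * (X[:, 0] ** 2 + X[:, 1] ** 2))
print("inner class: z >= %.2f   outer ring: z <= %.2f"
      % (z[y == 0].min(), z[y == 1].max()))   # a plane between these separates the classes

# The RBF kernel performs this kind of lifting implicitly.
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
print("RBF SVM training accuracy:", clf.score(X, y))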

In our next post in this sequence, we will learn how to use SVM in Python and C++ applications.


Tags: classification Kernel Trick Radial Basis Function Support Vector Machine SVM

Filed Under: Machine Learning, Theory

