This post explains how to implement Support Vector Machines (SVMs) using the scikit-learn library in Python. We discussed the math-less details of SVMs in the earlier post.

In this post, we will show how SVMs work on three different types of datasets:

1. **Linearly separable data with no noise**
2. **Linearly separable data with added noise**
3. **Non-linearly separable data with added noise**

## Prerequisites

Before we begin, we need to install the **scikit-learn** (imported as **sklearn**) and **matplotlib** modules. This can be done using pip.

```
pip install -U scikit-learn
pip install -U matplotlib
```

We first import **matplotlib.pyplot** for plotting graphs. We also need **svm** imported from **sklearn**. Finally, from **sklearn.model_selection** we need **train_test_split** to randomly split data into training and test sets, and **GridSearchCV** to search for the best parameters for our classifier. The code below shows the imports.


```
import sys, os
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.model_selection import train_test_split, GridSearchCV
```

## Linearly separable data with no noise

Let’s first look at the simplest case, where the data is cleanly linearly separable. In the 2D case, this simply means we can find a line that separates the data. In the 3D case, it will be a plane. For higher dimensions, it is a hyperplane.

Let’s see how we can use a simple binary SVM classifier based on the data above.

If you have downloaded the code accompanying this post, here are the steps for building a binary classifier:

1. **Prepare data**: We read the data from the files *points_class_0.txt* and *points_class_1.txt*. These files simply contain the x and y coordinates of points, one point per line. The points in *points_class_0.txt* are assigned the label 0 and the points in *points_class_1.txt* are assigned the label 1. The dataset is then split into training (80%) and test (20%) sets. This dataset is shown in Figure 1.

```
# Read data
x, labels = read_data("points_class_0.txt", "points_class_1.txt")
# Split data to train and test on 80-20 ratio
X_train, X_test, y_train, y_test = train_test_split(x, labels, test_size = 0.2, random_state=0)
# Plot training and test data
plot_data(X_train, y_train, X_test, y_test)
```
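The helper `read_data` is not listed in this post. As a rough idea of what it might look like, here is a minimal sketch. It assumes each line of the files holds whitespace-separated x and y values; the function body is our guess for illustration, not the actual helper.

```python
import numpy as np

def read_data(class_0_file, class_1_file):
    """Hypothetical helper: load 2D points from two text files and label them 0 and 1."""
    # Assumption: each line holds whitespace-separated "x y" coordinates
    points_0 = np.loadtxt(class_0_file, ndmin=2)
    points_1 = np.loadtxt(class_1_file, ndmin=2)
    # Stack all points into one (n_samples, 2) feature array
    x = np.vstack([points_0, points_1])
    # Label 0 for points from the first file, label 1 for the second
    labels = np.concatenate([
        np.zeros(len(points_0), dtype=int),
        np.ones(len(points_1), dtype=int),
    ])
    return x, labels
```

The returned `x` and `labels` arrays can be passed directly to `train_test_split`.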

2. **Create an instance of a Linear SVM classifier**: Next we create an instance of the Linear SVM classifier from **scikit-learn**. We use the default parameters because the problem is easy to solve and we expect the defaults to work just fine. We only specify that the kernel be linear.

```
# Create a linear SVM classifier
clf = svm.SVC(kernel='linear')
```

3. **Train a Linear SVM classifier**: Next we train a Linear SVM. In other words, based on the training data, we find the line that separates the two classes. This is done using the **fit** method of the SVM class.

```
# Train classifier
clf.fit(X_train, y_train)
# Plot decision function on training and test data
plot_decision_function(X_train, y_train, X_test, y_test, clf)
```

Next, we plot the decision boundary and support vectors. The decision boundary is estimated using only the training data. Given a new data point (say, from the test set), we simply need to check which side of the line the point lies on to classify it as 0 (red) or 1 (blue).
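To make the "which side of the line" check concrete, here is a small sketch. The synthetic clusters and the two query points are our own assumptions, since the post reads its points from files. For a linear kernel, scikit-learn exposes the line's coefficients as `coef_` and `intercept_`, and the support vectors as `support_vectors_`.

```python
import numpy as np
from sklearn import svm

# Synthetic, well-separated 2D data (an assumption; the post loads points from files)
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-5, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = svm.SVC(kernel='linear')
clf.fit(X, y)

# For a linear kernel the decision boundary is the line w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]

# Two made-up query points, one near each cluster
new_points = np.array([[-4.0, -6.0], [3.0, 5.5]])

# Signed score: negative on the class-0 side, positive on the class-1 side
scores = new_points @ w + b
side = (scores > 0).astype(int)

print("Support vectors:\n", clf.support_vectors_)
print("Scores:", scores)
print("Side of the line:", side)
print("clf.predict:     ", clf.predict(new_points))
```

The sign of the score is what decides the class, so `side` should agree with what `clf.predict` returns.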

4. **Test a Linear SVM classifier**: To predict the class of a new point (or points), we can simply use the **predict** method of the SVM class. To obtain the accuracy on the test set, we can use the **score** method.

```
# Make predictions on unseen test data
clf_predictions = clf.predict(X_test)
print("Accuracy: {}%".format(clf.score(X_test, y_test) * 100 ))
```

In this easy example, the accuracy is 100%.
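If you do not have the data files at hand, the four steps above can be sketched end to end on synthetic data. The two Gaussian clusters below are an assumption standing in for *points_class_0.txt* and *points_class_1.txt*, and the plotting helpers are left out.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): two well-separated 2D clusters
rng = np.random.RandomState(0)
class_0 = rng.normal(loc=-5.0, scale=1.0, size=(100, 2))
class_1 = rng.normal(loc=5.0, scale=1.0, size=(100, 2))
x = np.vstack([class_0, class_1])
labels = np.array([0] * 100 + [1] * 100)

# Split data to train and test on 80-20 ratio
X_train, X_test, y_train, y_test = train_test_split(
    x, labels, test_size=0.2, random_state=0)

# Create and train a linear SVM classifier
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)

# Evaluate on the unseen test data
accuracy = clf.score(X_test, y_test)
print("Accuracy: {}%".format(accuracy * 100))
```

Because the clusters are far apart, we would expect this sketch to separate the test points as cleanly as the example above.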

## Linearly separable data with noise

Let’s look at a slightly more complicated case shown in Figure 3 where it is not possible to linearly separate the data, but a linear classifier still makes sense. Note that no matter what you do some red points and some blue points will be on the wrong side of the line.

The question now is which line to choose. SVM provides a parameter called C that you can set while training. In scikit-learn, this can be done using the following lines of code.

```
# Create a linear SVM classifier with C = 1
clf = svm.SVC(kernel='linear', C=1)
```

If you set C to a low value (say 1), the SVM classifier will choose a large-margin decision boundary at the expense of a larger number of misclassifications. When C is set to a high value (say 100), the classifier will choose a small-margin decision boundary and try to minimize the misclassifications. This is shown in Figure 4. The margin is shown using dotted lines: the larger the space between the dotted lines, the larger the margin.

You may be tempted to ask which value of C is better. The answer depends on how much noise you think there is in your data. If you think the data is very noisy, you want C to be small. On the other hand, if you think the data is less noisy, you should choose C to be large.
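To see the effect of C in numbers, here is a small sketch on synthetic overlapping clusters (an assumption; this is not the data from Figure 3). For a linear SVM, the distance between the two dotted margin lines is 2 / ||w||, where w is the weight vector in `coef_`. We add a very small C (0.01) to the values discussed above only to make the trend easier to see.

```python
import numpy as np
from sklearn import svm

# Two overlapping 2D clusters (assumption: stands in for the noisy data)
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-1, 1, size=(100, 2)), rng.normal(1, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

margin_widths = {}
for C in (0.01, 1, 100):
    clf = svm.SVC(kernel='linear', C=C)
    clf.fit(X, y)
    # Distance between the two margin lines is 2 / ||w||
    margin_widths[C] = 2.0 / np.linalg.norm(clf.coef_[0])
    # Count training points that end up on the wrong side of the boundary
    train_errors = int((clf.predict(X) != y).sum())
    print("C = {:>6}: margin width = {:.3f}, training errors = {}".format(
        C, margin_widths[C], train_errors))
```

A smaller C should give a wider margin, which matches the behavior described above.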

## Non-Linearly separable data with noise

Finally, let’s look at data that is impossible to partition using a line.

Is SVM useless in such cases? Fortunately, the answer is no. We can use the Kernel Trick explained in our previous article. In scikit-learn we can specify the **kernel** type while instantiating the SVM class.

```
# Create SVM classifier based on RBF kernel.
clf = svm.SVC(kernel='rbf', C = 10.0, gamma=0.1)
```

In the above example, we are using the Radial Basis Function explained in our previous post with the parameter **gamma** set to 0.1. As you can see in Figure 6, the SVM with an RBF kernel produces a ring-shaped decision boundary instead of a line.
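As a quick check of why the kernel matters, the sketch below compares a linear SVM and the RBF SVM above on ring-shaped data. The data comes from scikit-learn's `make_circles`, and we scale it by 5; both choices are assumptions made so that gamma = 0.1 is a sensible kernel width, and this is not the dataset used for the figures.

```python
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Ring-shaped data (assumption): an inner disc surrounded by an outer ring
X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X = X * 5.0  # scale the rings up so gamma = 0.1 is a reasonable kernel width

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A straight line cannot separate a ring from its centre
linear_clf = svm.SVC(kernel='linear')
linear_clf.fit(X_train, y_train)

# Same parameters as the RBF classifier above
rbf_clf = svm.SVC(kernel='rbf', C=10.0, gamma=0.1)
rbf_clf.fit(X_train, y_train)

linear_acc = linear_clf.score(X_test, y_test)
rbf_acc = rbf_clf.score(X_test, y_test)
print("Linear kernel accuracy: {:.1f}%".format(linear_acc * 100))
print("RBF kernel accuracy:    {:.1f}%".format(rbf_acc * 100))
```

On data like this, the RBF kernel should score far higher than the linear one.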

Looking at Figure 6, you may be tempted to think that with some other values of C and gamma we could come up with a better decision boundary. Your intuition is right. To find the best parameters, we need to do a parameter sweep by changing the values of C and gamma and picking the combination that works best. Let’s see how parameter tuning is done using GridSearchCV.

## Parameter Tuning using GridSearchCV

The module **sklearn.model_selection** allows us to do a grid search over parameters using **GridSearchCV**. All we need to do is specify which parameters we want to vary and which values to try. In the following example, the parameters C and gamma are varied. Every combination of C and gamma is tried and the best one is chosen based on its cross-validation accuracy. The best estimator can be accessed using **clf_grid.best_estimator_**.

```
# Grid Search
# Parameter Grid
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001, 0.00001, 10]}
# Make grid search classifier
clf_grid = GridSearchCV(svm.SVC(), param_grid, verbose=1)
# Train the classifier
clf_grid.fit(X_train, y_train)
# The refitted best classifier is an attribute, not a method:
# clf = clf_grid.best_estimator_
print("Best Parameters:\n", clf_grid.best_params_)
print("Best Estimator:\n", clf_grid.best_estimator_)
```

When this post was written, GridSearchCV performed 3-fold cross-validation by default. Since scikit-learn 0.22 the default is 5-fold, so pass **cv=3** to reproduce the behavior described here. In 3-fold cross-validation, the data is divided into 3 parts; two parts are used for training and one part for determining accuracy. This is done three times, so each of the three parts is in the training set twice and in the validation set once. The accuracy for a given C and gamma is the average accuracy over the three folds.
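To connect this to code, the sketch below recomputes the score of a single grid cell by hand using `cross_val_score` and compares it with what GridSearchCV reports. The ring-shaped `make_circles` data and the reduced parameter grid are assumptions chosen to keep the example short.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV, cross_val_score

# Ring-shaped stand-in data (assumption), scaled as in a typical 2D point set
X, y = make_circles(n_samples=300, factor=0.3, noise=0.1, random_state=0)
X = X * 5.0

# A small grid, with 3-fold cross-validation requested explicitly
param_grid = {'C': [1, 10], 'gamma': [0.1, 0.01]}
clf_grid = GridSearchCV(svm.SVC(), param_grid, cv=3)
clf_grid.fit(X, y)

# Recompute one grid cell by hand: three fold accuracies, then their mean
manual_scores = cross_val_score(svm.SVC(C=10, gamma=0.1), X, y, cv=3)
manual_mean = manual_scores.mean()

# Look up the same cell in the grid search results
results = clf_grid.cv_results_
idx = results['params'].index({'C': 10, 'gamma': 0.1})
grid_mean = results['mean_test_score'][idx]

print("Fold accuracies:", manual_scores)
print("Manual mean:", manual_mean)
print("Grid search mean:", grid_mean)
print("Best parameters:", clf_grid.best_params_)
```

The grid search simply repeats this averaging for every combination in `param_grid` and keeps the one with the highest mean score.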

The classifier shown in Figure 7 uses the best parameters (C = 1 and gamma = 0.01) found using GridSearchCV. Intuitively, the shape of this decision boundary also looks better than the one we chose manually.