SVMs for Linearly Separable Classes
In the two-class classification problem, we are given an input dataset containing two classes of data and an indicator function that maps each point to its class. Class 1 (C1) contains a set of positive samples, whose indicator value is +1, and class 2 (C2) contains a set of negative samples, whose indicator value is -1.
Then we try to find a linearly separating function, a hyperplane, which is defined as
f(W, X, w_{o}) = W.X + w_{o},
where W = the weight vector defined on the input,
X = the input vector,
w_{o} = the bias, a constant.
Let d+ and d- be the perpendicular distances from the separating hyperplane to the margin lines, which pass through the data points closest to it. The margin constraints are
d+: W.X_{i} + w_{o} >= +1, for d_i = +1 (C1)
d-: W.X_{i} + w_{o} <= -1, for d_i = -1 (C2)
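To make the constraints concrete, here is a minimal numerical check. The weights, bias, and points below are hypothetical values chosen purely for illustration; any correctly separating hyperplane would satisfy the same inequalities.

```python
import numpy as np

# Hypothetical weights, bias, and labelled points chosen for illustration.
W = np.array([1.0, 1.0])
w_o = -3.0
points = np.array([[3.0, 1.0], [4.0, 2.0],   # class C1 (d_i = +1)
                   [1.0, 1.0], [0.0, 2.0]])  # class C2 (d_i = -1)
d = np.array([1, 1, -1, -1])

# The margin constraints d_i (W.X_i + w_o) >= 1 hold for every point.
scores = points @ W + w_o
print(d * scores >= 1)  # all True for this choice of W and w_o
```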
Let's derive the margin:
We have SV1 (say X_{+}) and SV2 (say X_{-}) as support vectors, one lying on each margin line. Then we have
W.X_{+} + w_{o} = +1 for C1 and similarly
W.X_{-} + w_{o} = -1 for C2.
Subtracting the second equation from the first, we have W.(X_{+} - X_{-}) = 2.
Since the line joining these two support vectors is parallel to W, we can write X_{+} - X_{-} = ||X_{+} - X_{-}|| W / ||W||, so that ||X_{+} - X_{-}|| ||W||^{2} / ||W|| = 2.
At last, we get ||X_{+} - X_{-}|| ||W|| = 2 => ||X_{+} - X_{-}|| = 2 / ||W||.
Therefore, the perpendicular distance from each margin line to the separating hyperplane is d+ = d- = 1 / ||W||.
Therefore the total margin can be calculated as M = 2 / ||W||.
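As a quick numerical check of the result, here is a sketch with a hypothetical weight vector (the value of W is an assumption for illustration only):

```python
import numpy as np

# Hypothetical weight vector of a fitted separating hyperplane W.X + w_o = 0.
W = np.array([3.0, 4.0])   # ||W|| = 5

# Each margin line lies at distance 1/||W|| from the hyperplane,
# so the total margin is M = 2/||W||.
norm_W = np.linalg.norm(W)
d_plus = d_minus = 1.0 / norm_W
M = 2.0 / norm_W
print(d_plus, M)  # 0.2 0.4
```

A larger ||W|| therefore means a narrower margin, which is why SVM training minimises ||W|| subject to the margin constraints.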
That's it. We have found a hyperplane that separates the two-class data points with a large margin.
Let's now have a look at the linearly inseparable case.
SVMs for Linearly Inseparable Classes
In the case of a non-separable dataset, the points of opposite classes may overlap. In this case, the constraints d_i(W.X_{i} + w_{o}) >= 1, for i = 1, 2, ..., cannot be satisfied for all data points.
In other words, when the points of the two classes overlap in the input space, no hyperplane can satisfy every margin constraint. Such data are called linearly non-separable. To separate these data, we need to identify a soft margin that separates the data as accurately as possible.
Here we introduce a new slack variable ξ_i >= 0 into the class constraints. Then we have
W.X_{i} + w_{o} >= +1 - ξ_i, for d_i = +1
W.X_{i} + w_{o} <= -1 + ξ_i, for d_i = -1
The additional variable ξ_i is called a slack variable; it measures how far the point X_i is allowed to violate the margin. Penalising the slack makes the soft-margin classifier's loss the hinge loss.
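The connection to the hinge loss can be sketched directly: the smallest slack satisfying the relaxed constraint for point i is max(0, 1 - d_i(W.X_i + w_o)). The weights and points below are hypothetical values for illustration.

```python
import numpy as np

def hinge_loss(W, w_o, X, d):
    """Average hinge loss max(0, 1 - d_i (W.X_i + w_o)). Each term equals
    the smallest slack xi_i that satisfies the relaxed margin constraint."""
    margins = d * (X @ W + w_o)
    return np.mean(np.maximum(0.0, 1.0 - margins))

# Hypothetical data: the second point violates the margin, so its slack
# (and hence its hinge-loss term) is positive.
X = np.array([[2.0, 2.0], [0.5, 0.5]])
d = np.array([1, -1])
W = np.array([1.0, 0.0])
w_o = -1.0
print(hinge_loss(W, w_o, X, d))  # 0.25
```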
(Figure: linearly inseparable data)
Then, by the same process we used for linearly separable data to find the total margin, we get a value of
M = (2 - 2ξ) / ||W||
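In practice, libraries such as scikit-learn do not set the slack directly; instead, the regularisation parameter C of SVC trades margin width against total slack. This is a minimal sketch on hypothetical, overlapping (XOR-like) toy data:

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like toy data: no single hyperplane separates these points perfectly.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, 2, 2])

# C controls the soft margin: a small C tolerates more margin violations
# (larger slack), while a large C penalises them heavily.
soft = SVC(kernel='linear', C=0.1).fit(X, y)
hard = SVC(kernel='linear', C=100.0).fit(X, y)
print(len(soft.support_), len(hard.support_))
```

Both fits succeed even though the data are inseparable; the slack variables absorb the unavoidable violations.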
There are more interesting features of SVMs, such as the use of kernels. This is a broad topic in its own right. Basically, kernel functions are introduced to replace the dot products in the formulation. Three commonly used admissible functions are the polynomial kernel, the radial basis function (Gaussian) kernel, and the sigmoid kernel.
We will cover this concept later in our articles.
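As a preview, the commonly used admissible kernels (polynomial, RBF/Gaussian, and sigmoid) can be sketched as plain functions of two input vectors. The parameter values below (degree p, width sigma, slopes kappa and delta) are illustrative assumptions, not fixed choices.

```python
import numpy as np

# Sketches of three admissible kernels, each replacing the dot product x.y.
def polynomial_kernel(x, y, p=2):
    return (x @ y + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=-1.0):
    return np.tanh(kappa * (x @ y) + delta)

x, y = np.array([1.0, 2.0]), np.array([2.0, 1.0])
print(polynomial_kernel(x, y))  # (1*2 + 2*1 + 1)^2 = 25.0
```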
SVMs Using scikit-learn in Python
Python's scikit-learn library provides three ways of implementing an SVM classifier.
They are
- svm.LinearSVC
- svm.SVC, which gives the advantage of using kernels
- linear_model.SGDClassifier
You can go through these classes in more detail in the official scikit-learn documentation.
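To see how the three fit the same interface, here is a minimal sketch on hypothetical toy data; of the three, only SVC also accepts non-linear kernels such as 'rbf'.

```python
import numpy as np
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data for illustration.
X = np.array([[-1.0, -1.0], [-2.0, -1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1, 1, 2, 2])

# All three classifiers share the same fit/predict interface.
preds = {}
for clf in (LinearSVC(), SVC(kernel='linear'), SGDClassifier(loss='hinge')):
    clf.fit(X, y)
    preds[type(clf).__name__] = clf.predict([[-1.5, -1.0]])[0]
print(preds)
```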
Advantages and Applications
- SVMs can also be used for linearly inseparable data through kernel support.
- They work in high-dimensional spaces, even when the number of data points is smaller than the dimensionality.
- They can be used for clustering (of linearly inseparable data), classification, and regression problems.
- They work well with both structured and unstructured data, such as text and images.
- The main disadvantage is that they do not scale well to large datasets, especially noisy ones.
Implement SVM in Python
Code: Here, I will use a small example to demonstrate how SVMs are applied to classification problems. We will use scikit-learn and a few other libraries. Let's get to the code:
import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
In the above three lines of code, we have defined two NumPy arrays: X holds the points and y holds the classes to which these points belong. Now, let us create our SVM model using sklearn.svm. Here, I choose the linear kernel.
from sklearn.svm import SVC
clf = SVC(kernel='linear')
We now fit our classifier (clf) to the data points we defined.
clf.fit(X, y)
To predict the class of a new data point:
prediction = clf.predict([[0,6]])
This returns the prediction (the class to which the data point belongs). Voila! It is that simple to use an SVM for basic classification problems.
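Beyond predict, a fitted model also exposes the support vectors that define the margin and the signed distance of a query point from the hyperplane. This sketch repeats the same kind of toy data so it is self-contained:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = SVC(kernel='linear').fit(X, y)

# support_vectors_ holds the points lying on the margin lines;
# decision_function gives the signed distance W.X + w_o (scaled).
print(clf.support_vectors_)
print(clf.decision_function([[0, 6]]))  # positive side of the hyperplane here
print(clf.predict([[0, 6]]))            # [2]
```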
Here are some real-world applications of support vector machines:
1. Face Detection: SVM separates parts of the image into facial and non-facial regions and forms a square border around the face.
2. Text and Hypertext Categorisation: SVMs allow text and hypertext categorisation for both inductive and transductive models. They use training data to classify documents into different categories, categorising on the basis of a generated score that is compared against a threshold value.
3. Classification of images: We have already discussed that SVMs are widely used in image classification problems. They provide better accuracy for image classification and image search than the formerly used query-based searching approaches.
4. Bioinformatics: SVMs are really popular in medicine and bioinformatics, where they are used in protein classification and cancer classification problems. They are used to classify genes, to classify patients on the basis of their genes, and for other biological problems such as detecting skin cancer.
5. Protein fold and remote homology detection: SVM algorithms are also widely used in protein remote homology detection.
6. Handwriting recognition: SVMs are used to recognise handwritten characters and work with a wide variety of languages.
7. Generalised Predictive Control (GPC): SVM-based GPC can be used to control systems with chaotic dynamics by tuning its parameters and hyperparameters.
Frequently Asked Questions
What does SVM stand for?
SVM stands for support vector machine. SVMs, or support vector machines, are used for both classification and regression tasks. They keep model complexity manageable and improve performance on non-linear tasks through the kernel trick.
What is an SVM kernel?
An SVM kernel is a trick that is used by the support vector machine algorithm to improve its performance. These kernels are used to transform the input data space into the required form. For example, a kernel may take a low dimensional input space and transform it into a higher dimensional data space.
What is the agenda in the SVM concept?
The objective of an SVM is to perform classification and regression tasks by finding a linear indicator function, a hyperplane, that separates the two classes with the maximum margin.
What are the advantages of using SVM?
SVMs are mainly used to reduce complexity. They can be used for both linearly separable and non-separable data, for both classification and regression, and for structured and unstructured datasets.
Conclusion
So far, we have discussed the concept of SVM, how it works, and the math behind it. We will cover the concept of kernels in SVM in upcoming articles. Until then, keep exploring.
You can also consider our Online Coding Courses such as the Machine Learning Course to give your career an edge over others.
Happy Learning!