Support Vector Machine (SVM) : Basic Terminologies
Machine learning includes several learning paradigms, such as supervised learning,
unsupervised learning, and semi-supervised learning. Support Vector Machine (SVM) is a supervised
machine learning algorithm used for both classification and regression analysis. It was developed at
AT&T Bell Laboratories by the Russian computer scientist Vladimir Vapnik and his colleagues around
1992.
The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space
that separates the data points of different classes in the feature space. The hyperplane is chosen so
that the margin between the closest points of different classes is as large as possible. The
dimension of the hyperplane depends on the number of features. If the number of input features
is two, the hyperplane is just a line. If the number of input features is three, the hyperplane
is a 2-D plane. It becomes difficult to visualize when the number of features exceeds three.
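To make the idea of a separating hyperplane concrete, here is a minimal sketch in plain Python. It assumes the usual linear form w·x + b = 0 and labels in {-1, +1}. The function names, the weights, and the two sample points are illustrative assumptions, not part of any particular library.

```python
def decision_value(w, x, b):
    """Signed score w.x + b; it is zero exactly on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, x, b):
    """Assign class +1 or -1 by the side of the hyperplane x falls on."""
    return 1 if decision_value(w, x, b) >= 0 else -1

# Hypothetical 2-feature example: the hyperplane x1 + x2 - 3 = 0 is a line.
w, b = [1.0, 1.0], -3.0
print(predict(w, [3.0, 2.0], b))  # point on the positive side of the line
print(predict(w, [0.5, 1.0], b))  # point on the negative side of the line
```

With two features the boundary is a line; the same two functions work unchanged for any number of features, since only the length of w and x changes.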
Basic Terminologies:
(1) Hyperplane: The hyperplane is the decision boundary used to separate the data points of
different classes in a feature space. In the case of linear classification, it is given by a linear
equation, i.e. w·x + b = 0, where w is the weight vector, x is the input vector, and b is the bias.
(2) Support Vectors: Support vectors are the data points closest to the hyperplane, which
play a critical role in determining the hyperplane and the margin.
(3) Margin: The margin is the distance between the hyperplane and the support vectors. The main
objective of the support vector machine algorithm is to maximize the margin. A wider
margin generally indicates better generalization to unseen data.
(4) Kernel: A kernel is a mathematical function used in SVM to map the original input
data points into a high-dimensional feature space, so that a separating hyperplane can be found
even if the data points are not linearly separable in the original input space. Some of the
common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
(5) Hard Margin: The maximum-margin hyperplane, or hard-margin hyperplane, is a
hyperplane that separates the data points of different classes without any
misclassifications. It exists only when the data is linearly separable.
(6) Soft Margin: When the data is not perfectly separable or contains outliers, SVM uses a
soft-margin technique. The soft-margin SVM formulation introduces a slack variable for each
data point, which relaxes the strict margin requirement and permits some
misclassifications or margin violations. It strikes a compromise between widening the margin
and reducing violations.
(7) C: The regularisation parameter C in SVM balances margin maximization against the
penalty for misclassification. It determines how heavily a data point is penalized for
violating the margin or being misclassified. A larger value of C imposes a stricter penalty,
which results in a smaller margin and possibly fewer misclassifications on the training data.
(8) Hinge Loss: Hinge loss is the typical loss function in SVMs. It penalizes incorrect
classifications and margin violations: for a label y in {-1, +1} and a score f(x), the loss is
max(0, 1 - y·f(x)). The objective function in SVM is usually formed by combining it with the
regularisation term.
(9) Dual Problem: SVM can also be solved through the dual of its optimization problem, which
requires finding the Lagrange multipliers associated with the support vectors. The dual
formulation enables the use of the kernel trick and more efficient computation.
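The margin, hinge loss, and the role of C from the list above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses one common form of the soft-margin objective, 0.5·||w||² + C·Σ hinge loss, and the toy data, the weights w, and the bias b are made up rather than fitted by an optimizer.

```python
import math

def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def hinge_loss(w, b, x, y):
    """Zero when the point is on the correct side and outside the margin."""
    return max(0.0, 1.0 - y * score(w, b, x))

def soft_margin_objective(w, b, data, C):
    """0.5 * ||w||^2 + C * sum of hinge losses (one common formulation)."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(hinge_loss(w, b, x, y) for x, y in data)

# Made-up toy data: (features, label in {-1, +1}).
# The last point lies on the hyperplane itself, so it violates the margin.
data = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([0.0, 0.0], -1), ([1.5, 1.5], -1)]
w, b = [1.0, 1.0], -3.0

print(margin_width(w))
print([hinge_loss(w, b, x, y) for x, y in data])
print(soft_margin_objective(w, b, data, C=1.0))
print(soft_margin_objective(w, b, data, C=10.0))
```

The same single violation costs much more when C is larger, which is why a large C pushes the optimizer toward a narrower margin with fewer violations.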
Types of SVM:
Based on the nature of the decision boundary, Support Vector Machines (SVM) can be
divided into two main types:
(1) Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of
different classes. They are most suitable when the data is linearly separable, meaning
that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely
divide the data points into their respective classes. The decision boundary is the
hyperplane that maximizes the margin between the classes.
(2) Non-Linear SVM: Non-linear SVMs can be used to classify data that cannot be
separated into two classes by a straight line (in the case of 2D). By using kernel functions,
non-linear SVMs can handle data that is not linearly separable. These kernel functions
transform the original input data into a higher-dimensional feature space, where the
data points can be linearly separated. A linear SVM fitted in this transformed space
corresponds to a non-linear decision boundary in the original input space.
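The effect of mapping data into a higher-dimensional space can be seen on a tiny example. The sketch below is plain Python under stated assumptions: the feature map phi, the matching kernel, the hand-picked weights, and the four 1-D points are all made up for illustration and are not the output of a trained SVM.

```python
def phi(x):
    """Explicit feature map from a 1-D input to a 2-D feature space."""
    return (x, x * x)

def kernel(x, z):
    """Inner product phi(x).phi(z), computed without building phi."""
    return x * z + (x * z) ** 2

def predict_in_feature_space(x, w=(0.0, 1.0), b=-2.5):
    """Linear classifier on phi(x); w and b are hand-picked for this toy data."""
    fx = phi(x)
    return 1 if w[0] * fx[0] + w[1] * fx[1] + b >= 0 else -1

# Made-up 1-D data: the outer points are class +1 and the inner points are
# class -1, so no single threshold on x separates them.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [1, -1, -1, 1]

print([predict_in_feature_space(x) for x in points])
print(kernel(-2.0, 1.0), sum(p * q for p, q in zip(phi(-2.0), phi(1.0))))
```

In the mapped space the boundary is the straight line x² = 2.5, while in the original space it is the pair of points x = ±√2.5. The kernel returns the same inner product as the explicit map, which is what lets an SVM work in the feature space without ever constructing it.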
Advantages of SVM:
(1) Effective in high-dimensional cases.
(2) It is memory efficient, as the decision function uses only a subset of the training points,
called support vectors.
(3) Different kernel functions can be specified for the decision function, and it is possible to
specify custom kernels.
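The last two advantages can be illustrated with a kernelized decision function of the form f(x) = Σ αᵢ·yᵢ·K(svᵢ, x) + b, which needs only the support vectors and accepts any kernel as a plain function. This is a sketch under stated assumptions: the two support vectors are made up, the multipliers and bias are the hard-margin solution for the linear kernel on those two points, and they are reused unchanged with the hypothetical custom kernel only to show the plumbing, not as a fitted model.

```python
def linear_kernel(x, z):
    """Standard dot product."""
    return sum(a * c for a, c in zip(x, z))

def shifted_quadratic_kernel(x, z):
    """A hypothetical custom kernel: (x.z + 1)^2."""
    return (linear_kernel(x, z) + 1.0) ** 2

def decision_value(x, support_vectors, labels, alphas, bias, kernel):
    """f(x) = sum_i alpha_i * y_i * K(sv_i, x) + bias, over support vectors only."""
    total = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return total + bias

# Made-up model: two support vectors, one per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [0.25, 0.25]
bias = -1.0

for kernel in (linear_kernel, shifted_quadratic_kernel):
    value = decision_value([2.0, 2.0], support_vectors, labels, alphas, bias, kernel)
    print(kernel.__name__, value)
```

Only the stored support vectors enter the sum, however many training points there were originally, and swapping the kernel changes the shape of the decision boundary without touching the rest of the code.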
Applications of SVM:
(1) SVMs can be used to solve various real-world problems. They are helpful in text and
hypertext categorization, as their application can significantly reduce the need for labeled
training instances in both the standard inductive and transductive settings. Some methods
for shallow semantic parsing are also based on support vector machines.
(2) Classification of images can also be performed using SVMs. Experimental results show that
SVMs achieve significantly higher search accuracy than traditional query refinement
schemes after just three to four rounds of relevance feedback. This is also true for image
segmentation systems, including those using a modified version of SVM that uses the privileged
approach suggested by Vapnik.
(3) Satellite data, such as SAR data, can be classified using supervised SVM.
(4) Hand-written characters can be recognized using SVM.
(5) The SVM algorithm has been widely applied in the biological and other sciences. It has
been used to classify proteins with up to 90% of the compounds classified correctly.
Support vector machine weights have been used to interpret SVM models in the past, and
permutation tests based on SVM weights have been suggested as a mechanism for
interpretation of SVM models. Post hoc interpretation of support vector machine models
in order to identify features used by the model to make predictions is a relatively new area of
research with special significance in the biological sciences.