Last Updated: Mar 27, 2024

# Linear vs. Non-Linear Classification

## Introduction

We will be studying Linear Classification as well as Non-Linear Classification.

Linear Classification refers to categorizing a set of data points into a discrete class based on a linear combination of its explanatory variables. On the other hand, Non-Linear Classification refers to separating those instances that are not linearly separable.

A linear classifier is often used when classification speed is a priority, as linear classifiers are among the fastest classifiers, especially when the input data is sparse, and they classify large datasets quickly and accurately. Non-linear classifiers, on the other hand, can handle data that does not follow a straight-line pattern, grouping instances that no simple line can separate.

## Linear Classification

→ Linear Classification refers to categorizing a set of data points into a discrete class based on a linear combination of its explanatory variables.

→ Some of the classifiers that use linear functions to separate classes are the Linear Discriminant Classifier, Naive Bayes, Logistic Regression, the Perceptron, and SVM (linear kernel).

→ In the figure above, we have two classes, namely 'O' and '+.' To differentiate between the two classes, an arbitrary line is drawn, ensuring that both classes are on distinct sides.

→ Since we can tell one class apart from the other, these classes are called ‘linearly-separable.’

→ However, an infinite number of lines can be drawn to distinguish the two classes.

→ The exact location of this line/hyperplane depends on the type of linear classifier used.

### Linear Discriminant Classifier

→ It is a dimensionality reduction technique in the domain of Supervised Machine Learning.

→  It is crucial in modeling differences between two groups, i.e., classes.

→ It helps project features from a high-dimensional space onto a lower-dimensional space.

→ Technique - Linear Discriminant Analysis (LDA) is used, which reduces a 2D graph to a 1D graph by creating a new axis. This helps maximize the distance between the two classes for differentiation.

→ In the above graph, we notice that a new axis is created, which maximizes the distance between the mean of the two classes.

→ As a result, variation within each class is also minimized.

→ However, the problem with LDA is that it would fail in case the means of both the classes are the same. This would mean that we would not be able to generate a new axis for differentiating the two.
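
To make the projection step concrete, here is a minimal two-class Fisher LDA sketch in pure Python. The data points are illustrative, not from the article; the code computes the projection axis w = Sw⁻¹(μ₁ − μ₀), which maximizes between-class separation relative to within-class scatter.

```python
# Minimal two-class Fisher LDA sketch in 2D (illustrative data).
# Projects points onto w = Sw^-1 (mu1 - mu0), the new axis that
# maximizes the distance between the class means.

def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, mu):
    # 2x2 within-class scatter matrix: sum_i (x_i - mu)(x_i - mu)^T
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in points:
        dx, dy = x - mu[0], y - mu[1]
        s[0][0] += dx * dx; s[0][1] += dx * dy
        s[1][0] += dy * dx; s[1][1] += dy * dy
    return s

def fisher_direction(class0, class1):
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]],
          [s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]]]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # w = Sw^-1 (mu1 - mu0) via the 2x2 matrix inverse formula
    return [( sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]

class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0)]
w = fisher_direction(class0, class1)
# Projections of the two classes onto the new axis separate cleanly
proj0 = [w[0] * x + w[1] * y for x, y in class0]
proj1 = [w[0] * x + w[1] * y for x, y in class1]
print(max(proj0) < min(proj1))
```

Note that if the two class means coincide, `diff` is the zero vector and no useful axis is produced, which is exactly the failure case mentioned above.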

### Naive Bayes

→ It is based on the Bayes Theorem and lies in the domain of Supervised Machine Learning.

→ Every feature is considered equal and independent of the others during Classification.

→ It computes the likelihood of an event occurring given that another event has occurred, i.e., conditional probability, via Bayes' Theorem:

P(A|B) = P(B|A) · P(A) / P(B)

where:

A: event 1

B: event 2

P(A|B): probability of A being true given B is true - the posterior probability

P(B|A): probability of B being true given A is true - the likelihood

P(A): probability of A being true - the prior

P(B): probability of B being true - the marginal likelihood

However, in the case of the Naive Bayes classifier, we are concerned only with the maximum posterior probability, so we ignore the denominator, i.e., the marginal likelihood: argmax does not depend on the normalization term.

→ The Naive Bayes classifier is based on two essential assumptions:-

(i) Conditional Independence - All features are independent of each other. This implies that one feature does not affect the performance of the other. This is the sole reason behind the ‘Naive’ in ‘Naive Bayes.’

(ii) Feature Importance - All features are equally important. It is essential to know all the features to make good predictions and get the most accurate results.

→ Naive Bayes is classified into three main types: Multinomial Naive Bayes, Bernoulli Naive Bayes, and Gaussian Naive Bayes.
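
The argmax-over-posteriors idea can be sketched in a few lines. The following toy Gaussian Naive Bayes works on a single numeric feature (the heights and class names are illustrative assumptions): for each class it estimates a mean and variance, then picks the class with the highest prior × likelihood, dropping the marginal P(B) as described above.

```python
import math

# Toy Gaussian Naive Bayes for one numeric feature (illustrative data).
# Prediction = argmax over classes of prior * Gaussian likelihood;
# the marginal probability of the data is dropped, since argmax
# does not depend on the normalization term.

def fit(values):
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x, params, priors):
    # Posterior score per class: prior * likelihood (unnormalized)
    scores = {c: priors[c] * gaussian_pdf(x, *params[c]) for c in params}
    return max(scores, key=scores.get)

heights = {"child": [1.0, 1.1, 1.2, 0.9], "adult": [1.7, 1.8, 1.6, 1.75]}
params = {c: fit(v) for c, v in heights.items()}
priors = {c: len(v) / 8 for c, v in heights.items()}
print(predict(1.05, params, priors))  # -> "child"
print(predict(1.72, params, priors))  # -> "adult"
```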

### Logistic Regression

→ It is a very popular supervised machine learning algorithm.

→ The target variable can take only discrete values for a given set of features.

→ It builds a regression model to predict the probability that a given data entry belongs to a particular class.

→ Similar to linear regression, logistic regression uses a linear function and, in addition, makes use of the 'sigmoid' function.

→ Logistic regression can be further classified into three categories:-

• Binomial - the target variable can take only two values. Example: '0' or '1'.
• Multinomial - the target variable can take three or more unordered values. Example: 'Class A,' 'Class B,' and 'Class C.'
• Ordinal - the target variable can take ordered values. Example: 'Very Good', 'Good', 'Average', 'Poor', 'Very Poor'.
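
A minimal binomial logistic regression can be written from scratch to show how the sigmoid turns a linear function into a probability. The 1D data below is an illustrative toy set; the weights are fitted by plain gradient descent on the log-loss.

```python
import math

# Minimal binary logistic regression via gradient descent (toy 1D data).
# The model is sigmoid(w*x + b), a linear function squashed to (0, 1);
# predictions are thresholded at 0.5.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    # Log-loss gradients: sum of (p - y) * x for w, and (p - y) for b
    gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys))
    gb = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys))
    w -= lr * gw
    b -= lr * gb

preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(preds)  # matches ys on this linearly separable toy set
```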

### Support Vector Machine (linear kernel)

→ It is a straightforward supervised machine learning algorithm used for regression/classification.

→ This model finds a hyper-plane that creates a boundary between the various data types.

→ It can be used for binary Classification as well as multinomial classification problems.

→ For multi-class Classification, a binary classifier can be created for each class (a one-vs-rest scheme).

→ In the case of SVM, the classifier with the highest score is chosen as the output of the SVM.

→ SVM works very well with linearly separable data but can work for non-linearly separable data as well.
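
A rough sketch of a linear SVM, assuming sub-gradient descent on the L2-regularized hinge loss (a Pegasos-style update); the 2D points and labels in {−1, +1} are illustrative, not a definitive implementation.

```python
# Linear SVM sketch: sub-gradient descent on the hinge loss with L2
# regularization (toy 2D data, labels in {-1, +1}).

data = [((1.0, 2.0), -1), ((2.0, 1.0), -1), ((2.0, 3.0), -1),
        ((6.0, 5.0), +1), ((7.0, 7.0), +1), ((8.0, 6.0), +1)]

w = [0.0, 0.0]
b = 0.0
lam, lr = 0.01, 0.01
for _ in range(2000):
    for (x1, x2), y in data:
        margin = y * (w[0] * x1 + w[1] * x2 + b)
        # Regularization term always applies; the hinge-loss term applies
        # only to points inside the margin (margin < 1)
        g1, g2, gb = lam * w[0], lam * w[1], 0.0
        if margin < 1:
            g1 -= y * x1
            g2 -= y * x2
            gb -= y
        w[0] -= lr * g1
        w[1] -= lr * g2
        b -= lr * gb

preds = [1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1 for (x1, x2), _ in data]
print(preds)  # the learned hyperplane separates the two clusters
```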

## What is Non-Linear Classification?

→ Non-linear classification refers to categorizing those instances that are not linearly separable.

→ Some of the classifiers that use non-linear functions to separate classes are the Quadratic Discriminant Classifier, Multi-Layer Perceptron (MLP), Decision Trees, Random Forest, and K-Nearest Neighbours (KNN).

→ In the figure above, we have two classes, namely 'O' and 'X.' To differentiate between the two classes, it is impossible to draw an arbitrary straight line to ensure that both the classes are on distinct sides.

→ We notice that even if we draw a straight line, there would be points of the first-class present between the data points of the second class.

→ In such cases, piece-wise linear or non-linear classification boundaries are required to distinguish the two classes.

### Quadratic Discriminant Classifier

→ This technique is similar to LDA (Linear Discriminant Analysis), discussed above.

→ The only difference is that here, we do not assume that the covariance matrices of all classes are the same; each class gets its own covariance matrix, which makes the decision boundary quadratic rather than linear.

→ We get the quadratic discriminant function as follows:

δₖ(x) = −½ log |Σₖ| − ½ (x − μₖ)ᵀ Σₖ⁻¹ (x − μₖ) + log πₖ

where μₖ, Σₖ, and πₖ are the mean, covariance matrix, and prior probability of class k.

→ Now, let us visualize the decision boundaries of both LDA and QDA on the iris dataset. This would give us a clear picture of the difference between the two.
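
The iris plot itself is not reproduced here, but the key behaviour can be sketched with synthetic 1D numbers: because each class keeps its own variance, QDA can separate two classes that share the same mean but differ in spread, exactly the case where LDA fails.

```python
import math

# 1D quadratic discriminant sketch (synthetic data, not the iris dataset):
# delta_k(x) = -0.5*log(var_k) - (x - mu_k)^2 / (2*var_k) + log(prior_k)

def discriminant(x, mu, var, prior):
    return -0.5 * math.log(var) - (x - mu) ** 2 / (2 * var) + math.log(prior)

# Both classes are centred at 0, but one is narrow and one is wide.
# LDA (equal means, shared variance) could not tell them apart;
# QDA classifies points by how far they sit from the shared mean.
params = {"narrow": (0.0, 0.25, 0.5), "wide": (0.0, 9.0, 0.5)}

def classify(x):
    return max(params, key=lambda c: discriminant(x, *params[c]))

print(classify(0.2))  # close to the shared mean -> "narrow"
print(classify(4.0))  # far from the mean -> "wide"
```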

### Multi-Layer Perceptron (MLP)

→ This is nothing but a feed-forward neural network: a collection of fully connected dense layers that help transform any given input dimension into the desired dimension.

→ An MLP consists of one input layer (one node per input), one output layer (one node per output), and one or more hidden layers (each with one or more nodes).

→ In the above diagram, we notice three inputs, resulting in 3 nodes belonging to each input.

→ There is one hidden layer consisting of 3 nodes.

→ There is an output layer consisting of 2 nodes, indicating two outputs.

→ Overall, the nodes belonging to the input layer forward their outputs to the nodes present in the hidden layer. Once this is done, the hidden layer processes the information passed on to it and then further passes it on to the output layer.
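
This input → hidden → output flow can be traced by hand. The sketch below is a tiny MLP (2 inputs, 2 hidden ReLU nodes, 1 output) whose weights are hand-picked, not learned, to compute XOR, a function no single linear classifier can represent.

```python
# Forward pass of a tiny MLP (2 inputs -> 2 hidden ReLU nodes -> 1 output)
# with hand-picked illustrative weights that compute XOR.

def relu(z):
    return max(0.0, z)

def mlp(x1, x2):
    # Hidden layer: each node is a linear combination passed through ReLU
    h1 = relu(x1 + x2)        # fires when either input is on
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are on
    # Output layer: a linear combination of the hidden activations
    return h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp(a, b))  # reproduces XOR: 0, 1, 1, 0
```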

### Decision Tree

→ It is considered to be one of the most valuable and robust models.

→ Instances are classified by sorting them down from the root to some leaf node.

→  An instance is classified by starting at the tree's root node, testing the attribute specified by this node, then moving down the tree branch corresponding to the attribute's value, as shown in the above figure.

→ The process is repeated based on each derived subset in a recursive partitioning manner.

→ For a better understanding, see the diagram below.

→ The above decision tree helps determine whether the person is fit or not.

→ Similarly, Random Forests, ensembles of Decision Trees, are non-linear classifiers too.
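
The "fit or not" tree can be written as nested attribute tests, mirroring the root-to-leaf traversal described above. The attributes and thresholds below are illustrative assumptions, since the figure itself is not reproduced here.

```python
# Decision-tree traversal as nested rules (illustrative "is the person fit?"
# example; attributes and thresholds are assumptions, not from the figure).

def is_fit(age, eats_fast_food, exercises):
    if age < 30:
        # Left branch: under 30, fitness hinges on diet
        return not eats_fast_food
    # Right branch: 30 or over, fitness hinges on regular exercise
    return exercises

print(is_fit(25, eats_fast_food=False, exercises=False))  # True
print(is_fit(45, eats_fast_food=False, exercises=False))  # False
```

Each `if` corresponds to testing the attribute at an internal node and moving down the matching branch; the `return` statements are the leaf labels.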

### K-Nearest Neighbours

→ KNN is a supervised machine learning algorithm used for classification problems. Since it is supervised, it uses labeled data to make predictions.

→ KNN analyzes the 'k' nearest data points and then classifies the new data based on the same.

→ In detail, to label a new point, the KNN algorithm analyzes the 'k' nearest neighbors, i.e., the 'k' nearest data points to the new point. It assigns the new point the label to which the majority of those 'k' nearest neighbors belong.

→ It is essential to choose an appropriate value of 'k' to avoid overfitting our model.

→ For better understanding, have a look at the diagram below.
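
The majority-vote step above fits in a few lines of Python; the 2D points and 'O'/'X' labels below are illustrative.

```python
from collections import Counter

# Minimal k-nearest-neighbours classifier (toy 2D data): label a new point
# by majority vote among the k closest labelled points.

def knn_predict(point, data, k=3):
    # data: list of ((x, y), label); rank by squared Euclidean distance
    nearest = sorted(data, key=lambda d: (d[0][0] - point[0]) ** 2 +
                                         (d[0][1] - point[1]) ** 2)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

data = [((1.0, 1.0), "O"), ((1.5, 2.0), "O"), ((2.0, 1.0), "O"),
        ((6.0, 6.0), "X"), ((7.0, 6.5), "X"), ((6.5, 7.0), "X")]
print(knn_predict((2.0, 2.0), data))  # -> "O"
print(knn_predict((6.2, 6.4), data))  # -> "X"
```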

## Linear vs Non-Linear Classification

Now, we will briefly sum up all that we’ve learned and try to compare and contrast Linear Classification and Non-Linear Classification.

### What is the difference between Linear Classification and Non-Linear Classification?

The main difference is that in Linear Classification, data is separated by a straight hyperplane, whereas Non-Linear Classification uses non-linear decision boundaries (for example, via kernels or inherently non-linear models) to separate data that no single hyperplane can.

### Name a few linear classifiers.

Some of the popular linear classifiers are:
i) Naive Bayes
ii) Logistic Regression
iii) Support Vector Machine (linear kernel)

### What are the most popular non-linear classifiers?

Some of the popular non-linear classifiers are:
i) Multi-Layer Perceptron (MLP)
ii) Decision Tree
iii) Random Forests
iv) K-Nearest Neighbors

## Conclusion

Congratulations on making it this far. This blog gave a fundamental overview of Linear Classification and Non-Linear Classification and the differences between the two!