Introduction
Laplace smoothing is a smoothing technique that helps tackle the problem of zero probability in the Naïve Bayes machine learning algorithm. It is a simple but useful idea, and in this article we will see how it works and how the smoothing parameter alpha affects the result: using very high alpha values pushes the probability of a word toward 0.5 for both positive and negative reviews.
Naive Bayes and Laplace Smoothing
A common problem encountered when estimating probabilities for Naive Bayes is the occurrence of zeros: a particular combination of a feature value and a class value is not present in the training data. Here we will look at a way to handle that, a method called Laplace smoothing, also known as add-k smoothing, where k is a pseudo-count added to every observed count. Below is the formula, in which we can see that k is added to the numerator and a corresponding term to the denominator:
P_LAP,k(x) = (c(x) + k) / (N + k|X|)
where c(x) is the observed count of value x, N is the total number of observations, and |X| is the number of possible values of x.
Naive Bayes
Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. There is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. For example, a fruit may be considered an apple if it is red, round, and about 10 cm in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of any possible correlations between the color, roundness, and diameter features.
In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.
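To make the independence assumption concrete, here is a minimal Python sketch of the naive Bayes decision rule for the apple example; every probability value below is a made-up illustrative number, not an estimate from real data.

```python
# Minimal sketch of the naive Bayes decision rule for the apple example.
# All probabilities here are hypothetical, chosen only for illustration.

# P(feature value | class) for two candidate classes
likelihoods = {
    "apple":  {"red": 0.7, "round": 0.9, "approx_10cm": 0.6},
    "orange": {"red": 0.1, "round": 0.9, "approx_10cm": 0.5},
}
priors = {"apple": 0.5, "orange": 0.5}

def score(cls, observed_features):
    """Multiply the class prior by each feature likelihood independently."""
    s = priors[cls]
    for f in observed_features:
        s *= likelihoods[cls][f]
    return s

observed = ["red", "round", "approx_10cm"]
scores = {c: score(c, observed) for c in likelihoods}
print(scores)                       # {'apple': 0.189, 'orange': 0.0225}
print(max(scores, key=scores.get))  # 'apple'
```

Each feature multiplies in its own likelihood with no interaction terms, which is exactly the "naive" part of the model.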
Let's take an example of text classification where the task is to classify whether a review is positive or negative. We build a likelihood table based on the training data. When querying a review, we use the likelihood table values, but what happens if a word in the review was not present in the training dataset?
Query review = w1 w2 w3 w'
We have four words in our query review, and let's assume only w1, w2, and w3 are present in the training data, so we will have a likelihood only for those words. To decide whether the review is positive or negative, we compare P(positive|review) and P(negative|review).
P(positive|review) ∝ P(w1|positive) * P(w2|positive) * P(w3|positive) * P(w'|positive) * P(positive)
In the likelihood table, we have P(w1|positive), P(w2|positive), P(w3|positive), and P(positive). But wait, where is P(w'|positive)?
Approach 1 - Ignore the term P(w'|positive), i.e., simply drop the unseen word from the query. But that throws away information about the review, so this approach seems logically incorrect.
Approach 2 - In a bag of words model, we count the occurrences of words. The occurrences of word w' in training are 0. According to that,
P(w'|positive) = 0 and P(w'|negative) = 0. However, this makes both P(positive|review) and P(negative|review) equal to 0, since we multiply all the likelihoods together. This is where Laplace smoothing comes in to solve the problem.
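To see the problem concretely, here is a small Python sketch with hypothetical word counts and review counts; because the unseen word w' contributes a factor of zero, both class scores collapse to zero and the comparison between them becomes meaningless.

```python
# Sketch of why a single unseen word zeroes out both class scores.
# The word counts and review counts below are hypothetical.
from collections import Counter

pos_counts = Counter({"w1": 10, "w2": 5, "w3": 8})   # word counts in positive reviews
neg_counts = Counter({"w1": 2,  "w2": 7, "w3": 1})   # word counts in negative reviews
n_pos, n_neg = 100, 100                              # number of reviews per class

def unsmoothed_score(words, counts, n_reviews, prior=0.5):
    score = prior
    for w in words:
        score *= counts[w] / n_reviews   # Counter returns 0 for unseen words
    return score

query = ["w1", "w2", "w3", "w_prime"]    # w_prime never appeared in training
print(unsmoothed_score(query, pos_counts, n_pos))  # 0.0
print(unsmoothed_score(query, neg_counts, n_neg))  # 0.0 -> classes cannot be compared
```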
Laplace Smoothing
Laplace smoothing is a smoothing technique that handles the problem of zero probability in Naïve Bayes. Using Laplace smoothing, we can represent P(w'|positive) as
P(w'|positive) = (number of reviews with w' and y = positive + alpha) / (N + alpha * K)
In the equation,
alpha is the smoothing parameter,
K is the number of dimensions (features) in the data, and
N is the number of reviews with y = positive.
If we pick a value of alpha != 0, the probability will no longer be zero even when a word is not present in the training dataset.
Suppose the occurrence of word w' is 3 with y = positive in the training data, and assume we have 2 features in our dataset, i.e., K = 2 and N = 100.
As alpha increases, the likelihood estimate moves toward the uniform distribution (0.5). Most of the time, alpha = 1 is used to eliminate the problem of zero probability.
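Here is a minimal Python sketch of this calculation under the assumed numbers above (count = 3, N = 100 positive reviews, K = 2 features); it simply evaluates the smoothing formula for a range of alpha values.

```python
# Sketch: evaluate the Laplace-smoothed estimate for increasing alpha.
# count, N and K are the assumed example numbers from the text.
count, N, K = 3, 100, 2

def smoothed_prob(count, N, K, alpha):
    """P(w'|positive) = (count + alpha) / (N + alpha * K)."""
    return (count + alpha) / (N + alpha * K)

for alpha in [0, 1, 10, 100, 1000, 10000]:
    print(alpha, round(smoothed_prob(count, N, K, alpha), 4))
# alpha = 0     -> 0.03   (the raw, unsmoothed estimate)
# alpha = 1     -> 0.0392
# alpha = 100   -> 0.3433
# alpha = 10000 -> 0.4977, approaching the uniform value 1/K = 0.5
```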
Therefore, we can say that Laplace smoothing is a smoothing technique that helps tackle the problem of zero probability in the Naïve Bayes machine learning algorithm. Using higher alpha values pushes the likelihood toward 0.5, i.e., the probability of a word becomes 0.5 for both positive and negative reviews. Since such an estimate carries very little information, it is not preferable; hence, alpha = 1 is the usual choice.
Let's take an example to understand this better, and to see what k is and how it is added to the count calculation.
Suppose we have flipped a coin 3 times and got 2 heads and 1 tail. With Laplace smoothing and k = 0,
P_LAP,0(X) = (2/3, 1/3)
The above is nothing but the maximum-likelihood estimate, i.e., naive Bayes without Laplace smoothing.
Now, for Laplace smoothing with k = 1, we use the equation
P_LAP,k(x) = (c(x) + k) / (N + k|X|)
which results in
P_LAP,1(X) = (3/5, 2/5)
Now let's look at Laplace smoothing with k = 100; then the estimate becomes:
P_LAP,100(X) = (102/203, 101/203)
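These numbers are easy to reproduce in a few lines of Python; the helper below is only a sketch of the add-k formula with the coin counts hard-coded, not library code.

```python
# Sketch of the add-k estimate P_LAP,k(x) = (c(x) + k) / (N + k*|X|)
# for the coin example: c(heads) = 2, c(tails) = 1, N = 3, |X| = 2.

def add_k_estimate(counts, k):
    """Return the smoothed probability of every outcome in `counts`."""
    n = sum(counts.values())      # N, total number of observations
    num_outcomes = len(counts)    # |X|, number of possible outcomes
    return {x: (c + k) / (n + k * num_outcomes) for x, c in counts.items()}

coin_counts = {"heads": 2, "tails": 1}
print(add_k_estimate(coin_counts, 0))    # 2/3, 1/3
print(add_k_estimate(coin_counts, 1))    # 3/5, 2/5
print(add_k_estimate(coin_counts, 100))  # 102/203, 101/203
```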
k is the strength of the prior, and this estimate can be derived as the MAP estimate for a multinomial distribution with a Dirichlet prior.
Laplace smoothing for conditionals smooths each conditional distribution independently, i.e.
P_LAP,k(x|y) = (c(x, y) + k) / (c(y) + k|X|)
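As a sketch of what smoothing each conditional independently looks like in code, here is a toy bag-of-words sentiment classifier; the vocabulary, word counts, and class prior are hypothetical values chosen only for illustration.

```python
# Sketch: smooth each conditional P(x|y) independently with add-k smoothing.
# The per-class word counts and vocabulary below are hypothetical.
from collections import Counter

word_counts = {                                   # c(x, y) from "training" reviews
    "positive": Counter({"good": 8, "great": 6, "plot": 3}),
    "negative": Counter({"bad": 7, "boring": 5, "plot": 4}),
}
vocab = {"good", "great", "bad", "boring", "plot", "acting"}  # |X| = 6
k = 1                                             # smoothing strength

def cond_prob(word, label):
    """P_LAP,k(x|y) = (c(x, y) + k) / (c(y) + k*|X|), smoothed per class."""
    total = sum(word_counts[label].values())      # c(y)
    return (word_counts[label][word] + k) / (total + k * len(vocab))

def classify(words, prior=0.5):
    scores = {}
    for label in word_counts:
        score = prior
        for w in words:
            score *= cond_prob(w, label)
        scores[label] = score
    return max(scores, key=scores.get), scores

print(classify(["good", "acting", "plot"]))   # 'acting' is unseen, yet no score is zero
```

Because every conditional gets its own pseudo-counts, an unseen word lowers both class scores but never forces either of them to zero.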
Advantages and Disadvantages
Benefits of Laplace Smoothing
It guarantees that no zero probability estimates occur, so the classifier can carry out the classification properly.
Drawbacks of Laplace Smoothing
Since the counts are adjusted to give a better classification, the true probabilities of the events are altered. Moreover, to raise the probability of the data points that would otherwise have zero probability, the probabilities of the other data points are reduced so that everything still sums to one.
Benefits of Naive Bayes
For problems with small datasets, it can achieve better results because it has a low tendency to overfit. Training is fast, consisting only of computing the priors and the likelihoods, and prediction on new data is fast as well. The RAM footprint is modest, since these operations do not require the complete dataset to be held in memory. CPU usage is also modest, as there are no gradients or iterative parameter updates to compute. It also handles missing feature values gracefully.
Drawbacks of Naive Bayes
It cannot incorporate feature interactions. For regression, there may not be a good way to compute a likelihood. It assumes that features are independent of one another, which rarely holds in real-world applications, so some loss of accuracy is possible. It also suffers from the zero-frequency problem: if a categorical value is not present in the training data, it is assigned zero probability, which is exactly what Laplace smoothing addresses.