Table of contents
1. Introduction
2. What is Bayes' Theorem?
   2.1. Example
3. Key Terms Related to Bayes' Theorem
   3.1. Prior Probability
   3.2. Posterior Probability
   3.3. Likelihood
   3.4. Marginal Likelihood
4. Prerequisites for Bayes' Theorem
5. How to Apply Bayes' Theorem in Machine Learning?
6. What is Naïve Bayes Classifier in Machine Learning?
7. Applications of Bayes' Theorem in Machine Learning
   7.1. Naive Bayes Classifier
   7.2. Spam Filtering
   7.3. Medical Diagnosis
8. Advantages of Naïve Bayes Classifier in Machine Learning
9. Disadvantages of Naïve Bayes Classifier in Machine Learning
10. Frequently Asked Questions
    10.1. What is Bayes' theorem used for in machine learning?
    10.2. Why is Naive Bayes important in machine learning?
    10.3. How is Bayesian learning different from other methods?
11. Conclusion
Last Updated: Sep 15, 2024

Bayesian Learning in Machine Learning

Author: Riya Singh

Introduction

Bayesian learning in machine learning is a method based on Bayes' theorem, enabling decisions or predictions through probabilities. This approach allows models to continuously update their knowledge as new data becomes available, making it valuable for tasks like spam detection, medical diagnosis, and more.

In this article, you'll learn about Bayes' theorem, its key terms, and its applications in machine learning.

What is Bayes' Theorem?

Bayes' theorem is a mathematical formula used to calculate the probability of a hypothesis based on prior knowledge. It updates the probability of the hypothesis as new evidence or information is introduced.

The formula is expressed as:

P(H|E) = [P(E|H) × P(H)] / P(E)

Where:

  • P(H|E): Posterior probability, the probability of the hypothesis H given the evidence E.

  • P(E|H): Likelihood, the probability of observing the evidence E given that H is true.

  • P(H): Prior probability, the original probability of the hypothesis before considering any new evidence.

  • P(E): Marginal likelihood, the total probability of the evidence across all hypotheses.

Example

Suppose you want to find the probability that a student has mastered a specific topic based on their test performance.

  • H: The hypothesis is that the student has mastered the topic.
     
  • E: The evidence is that the student scored well on the test.

 

Given the prior probability that 70% of students typically master the topic (P(H) = 0.7), and the likelihood that students who have mastered the topic score well on the test (P(E|H) = 0.9), we can apply Bayes' theorem to calculate the updated probability after seeing the test results, as the sketch below shows.
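
A minimal worked version of this example in Python. Note that the marginal P(E) also needs the probability that a student who has not mastered the topic still scores well; the example above does not state this, so the 50% used here (P_score_given_not_mastered = 0.5) is an illustrative assumption:

Python

# Worked example: has the student mastered the topic, given a good score?
P_mastered = 0.7                    # Prior: P(H)
P_not_mastered = 0.3                # P(not H) = 1 - P(H)
P_score_given_mastered = 0.9        # Likelihood: P(E|H)
P_score_given_not_mastered = 0.5    # Assumed for illustration: P(E|not H)

# Marginal likelihood: P(E)
P_score = (P_score_given_mastered * P_mastered) + (P_score_given_not_mastered * P_not_mastered)

# Posterior: P(H|E)
P_mastered_given_score = (P_score_given_mastered * P_mastered) / P_score
print(f"Probability of mastery given a good score: {P_mastered_given_score:.2f}")

This prints 0.81: observing a good score raises our belief in mastery from the 70% prior to about 81%.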

Key Terms Related to Bayes' Theorem

Prior Probability

This is the initial probability of an event before considering new evidence. For instance, if we know that 30% of people in a town like sports, this is the prior probability.

Posterior Probability

Posterior probability is the updated probability of an event after incorporating new evidence. Bayes’ theorem helps compute this. In machine learning, this is critical for updating models with fresh data.

Likelihood

Likelihood refers to the probability of observing the evidence given a specific hypothesis. In machine learning, likelihood plays a key role in refining models when additional data is available.

Marginal Likelihood

This is the probability of the evidence across all possible hypotheses. It acts as a normalizing factor in Bayes' theorem.
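
For two competing hypotheses H and ¬H, the marginal likelihood expands as P(E) = P(E|H) × P(H) + P(E|¬H) × P(¬H), which is exactly the denominator computed in the code examples later in this article.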

Prerequisites for Bayes' Theorem

To understand and apply Bayes' theorem effectively, you should know some key probability concepts:

  • Probability Basics: Be familiar with fundamental ideas such as joint, marginal, and conditional probabilities.
     
  • Probability Distributions: Understand how probabilities are distributed over various events.
     
  • Statistical Independence: Know what independent events are, as independence assumptions are often involved in applying Bayes Theorem.

How to Apply Bayes' Theorem in Machine Learning?

Bayes' theorem plays a critical role in probabilistic classifiers, particularly when predictions must combine prior knowledge with new data. Here's how you can apply it:

  • Formulate the Problem: Define the hypothesis and evidence in the context of your model.
     
  • Compute Prior Probabilities: Calculate the probability of each hypothesis before observing the data.
     
  • Calculate Likelihoods: Find the likelihood of the observed data under each hypothesis.
     
  • Update Beliefs: Use Bayes' theorem to adjust the prior probabilities based on the likelihood of the observed data.
     
  • Make Predictions: Select the hypothesis with the highest posterior probability, which for classification means picking the most probable class, as the sketch below illustrates.
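
A minimal sketch that walks through these five steps for a two-class problem. The priors and likelihoods are made-up numbers for illustration, not estimates from any dataset:

Python

# Step 1: the hypotheses are the two classes; the evidence is an observed feature value
priors = {"class_A": 0.6, "class_B": 0.4}        # Step 2: prior probability of each class
likelihoods = {"class_A": 0.2, "class_B": 0.7}   # Step 3: P(evidence | class)

# Step 4: update beliefs with Bayes' theorem
marginal = sum(likelihoods[c] * priors[c] for c in priors)
posteriors = {c: likelihoods[c] * priors[c] / marginal for c in priors}

# Step 5: predict the class with the highest posterior
prediction = max(posteriors, key=posteriors.get)
for c, p in posteriors.items():
    print(f"P({c} | evidence) = {p:.2f}")   # 0.30 and 0.70
print("Prediction:", prediction)             # class_B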

What is Naïve Bayes Classifier in Machine Learning?

A Naïve Bayes classifier is a simple and efficient classification method based on Bayes' theorem, with an assumption of feature independence.

  • Assumption: It assumes that all features are conditionally independent given the class, which simplifies the computation (see the factorization after this list).
     
  • Application: It is widely used in text classification, spam detection, and any tasks involving large feature spaces where features are considered independent.
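
Concretely, the conditional independence assumption lets the classifier factor the joint likelihood into a product of per-feature terms:

P(C | x1, ..., xn) ∝ P(C) × P(x1|C) × P(x2|C) × ... × P(xn|C)

so each P(xi|C) can be estimated separately from the training data, which is what makes the method so cheap to train.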

Applications of Bayes' Theorem in Machine Learning

Bayes' theorem is used extensively across various machine learning algorithms and models. Some key applications include:

1. Naive Bayes Classifier

Naive Bayes is a simple yet powerful machine learning algorithm based on Bayes' theorem. It assumes that the features in a dataset are independent of each other, an assumption that rarely holds in real-world data, yet the algorithm often performs well despite it.

Here’s a simple Python implementation of Naive Bayes using scikit-learn:

Python

# Importing the necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the dataset
data = load_iris()
X, y = data.data, data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the Naive Bayes model
model = GaussianNB()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Check the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

 

Output

Accuracy: 95.56%

 

Explanation: The code splits the Iris dataset into training and testing sets, trains a Gaussian Naive Bayes model on the training data, and evaluates its accuracy on the test data. GaussianNB is the appropriate variant here because the Iris features are continuous: it models each feature as normally distributed within each class. The output shows that the model achieved 95.56% accuracy.

2. Spam Filtering

Bayesian learning is commonly used in spam filters. The filter estimates the likelihood of an email being spam based on the words it contains. This approach is very effective since it continuously updates its knowledge with new emails, improving its accuracy over time.

Here’s a simplified version of how a spam filter might use Bayes' theorem:

Python

# Simplified probabilities for a spam filter
P_spam = 0.4  # Prior probability of an email being spam
P_ham = 0.6   # Prior probability of an email not being spam

P_word_given_spam = 0.9  # Likelihood of seeing a specific word in a spam email
P_word_given_ham = 0.1   # Likelihood of seeing the same word in a non-spam email

# Marginal probability of the word
P_word = (P_word_given_spam * P_spam) + (P_word_given_ham * P_ham)

# Posterior probability (is the email spam, given the word?)
P_spam_given_word = (P_word_given_spam * P_spam) / P_word

print(f"Probability of spam given the word: {P_spam_given_word:.2f}")


Output

Probability of spam given the word: 0.86


Explanation: In this example, we calculate the probability that an email is spam given that a certain word appears in it. The output indicates an 86% probability that the email is spam when the word is present.

3. Medical Diagnosis

In medical diagnosis, Bayesian networks help estimate the likelihood of a disease based on symptoms. By combining prior knowledge (such as how common the disease is) with patient-specific data (like test results), doctors can make well-informed decisions.

Here’s a simplified calculation using Bayes' theorem:

Python

# Simplified probabilities for medical diagnosis
P_disease = 0.01  # Prior probability of having the disease
P_no_disease = 0.99  # Prior probability of not having the disease
P_test_given_disease = 0.95  # Probability of a positive test given the disease
P_test_given_no_disease = 0.05  # Probability of a positive test given no disease

# Marginal probability of a positive test
P_test = (P_test_given_disease * P_disease) + (P_test_given_no_disease * P_no_disease)

# Posterior probability (probability of disease given a positive test)
P_disease_given_test = (P_test_given_disease * P_disease) / P_test

print(f"Probability of having the disease given a positive test: {P_disease_given_test:.2f}")


Output

Probability of having the disease given a positive test: 0.16


Explanation: This code calculates the probability of having the disease given a positive test result. The result is only 16%: even with a fairly accurate test, the posterior stays low because the disease is rare (a 1% base rate). This illustrates how Bayesian updating weighs new evidence against prior knowledge in medical data.

Advantages of Naïve Bayes Classifier in Machine Learning

  • Efficiency: Fast to train and to predict with, as it treats each feature independently given the class.
     
  • Scalability: Works well with high-dimensional data since each feature is handled individually.
     
  • Performance: Delivers good performance when the independence assumption is reasonably met.
     
  • Baseline Model: Often used as a baseline for comparing more complex classifiers.

Disadvantages of Naïve Bayes Classifier in Machine Learning

  • Independence Assumption: The strong assumption that features are independent rarely holds in real-world cases, affecting its performance.
     
  • Probability Estimates: May not provide accurate probability estimates, making it more suited for classification than probability prediction.
     
  • Feature Correlation: Has difficulty with correlated features as it treats them as independent.
     
  • Data Scarcity: The classifier suffers from the zero-frequency problem when a feature value never appears with a class in the training set, but techniques like Laplace smoothing (sketched below) can mitigate this.
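
A minimal sketch of Laplace (add-one) smoothing for the zero-frequency problem. The word counts, total, and vocabulary size are made-up numbers for illustration:

Python

# Word counts observed in the spam class of a toy training set
word_counts = {"offer": 30, "free": 20, "meeting": 0}  # "meeting" never appears in spam
total_words = 50   # Total words observed in the spam class
vocab_size = 3     # Number of distinct words in the vocabulary

for word, count in word_counts.items():
    unsmoothed = count / total_words
    # Add-one smoothing: add 1 to every count and vocab_size to the total
    smoothed = (count + 1) / (total_words + vocab_size)
    print(f"{word}: unsmoothed={unsmoothed:.3f}, smoothed={smoothed:.3f}")

Without smoothing, the unseen word "meeting" has probability 0, which would zero out the entire product of per-feature likelihoods for the spam class; with smoothing it gets a small nonzero probability instead.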

Frequently Asked Questions

What is Bayes' theorem used for in machine learning?

Bayes' theorem is used to calculate probabilities based on prior knowledge, making it valuable for models that need to update their predictions as new data becomes available.

Why is Naive Bayes important in machine learning?

Naive Bayes is important because it offers a fast and efficient solution for classification tasks, especially when a simple but effective model is needed, such as for spam detection.

How is Bayesian learning different from other methods?

Bayesian learning differs from other methods by continuously updating the model's predictions with new data, making it more adaptable and accurate over time.

Conclusion

Bayesian learning is a robust method in machine learning, particularly for decision-making tasks that involve updating probabilities with new data. By applying Bayes' theorem, algorithms such as Naive Bayes classifiers and spam filters can enhance their accuracy by continuously learning from new information.

You can also check out our other blogs on Code360.
 
