Table of contents
1. Introduction
2. Bayesian Decision Theory
2.1. Prior Probability
2.2. Likelihood Probability
3. Bayesian Estimation
4. Applications
5. Implementation
6. FAQs
7. Key Takeaways
Last Updated: Mar 27, 2024

Bayesian Estimation

Author Arun Nawani

Introduction

We covered maximum likelihood estimation (MLE) in our previous blog; you may follow the link if you missed it or need a refresher. Simply put, MLE is a parameter estimation technique that chooses the parameter values under which the observed data is most likely. We highly recommend having prior knowledge of maximum likelihood estimation, since that will make this blog a little more intuitive.

Bayesian Decision Theory

Bayesian decision theory is a statistical approach to pattern classification. It is a probabilistic framework for making classifications that also measures the risk associated with assigning an input to a given class.

It highlights how prior probability by itself isn’t the most efficient way to make predictions. Bayesian decision theory takes into consideration prior probability, likelihood probability, and evidence to compute the posterior probability. 

Prior Probability

The prior probability is the initial probability of an event before we take any new information into account. For example, suppose we are asked which of two teams, A and B, will win their next match. In the last 5 meetings between the two, A has won 2 times and B has won 3 times.

So the prior probability of A winning the next match is 2/5. But this may not hold in practice, since there could be other factors at play, such as injured players in team A. Predicting the winner solely on the basis of the prior probability is therefore not the most reliable approach.
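As a quick illustration, the prior here is just a relative frequency computed from past results (a minimal sketch in Python; the counts are the hypothetical ones from the example above):

# Prior probabilities from the head-to-head record (hypothetical counts)
wins_a, wins_b = 2, 3
total = wins_a + wins_b

prior_a = wins_a / total  # P(A wins) = 0.4
prior_b = wins_b / total  # P(B wins) = 0.6
print(prior_a, prior_b)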

Likelihood Probability 

The likelihood is the probability of observing a given condition for each possible outcome. It is denoted by

P(B|Ak)

Here B is the condition while Ak is the outcome; there may be multiple possible outcomes A1, A2, …, Ak.

Now, suppose team B has many injured players while team A has its entire squad available. This heavily puts the odds in favour of A for the next match, even though the prior probability said team B was more likely to win.

Bayesian decision theory takes into consideration past results (the prior) as well as the current situation (the likelihood) to make predictions. It is given by Bayes' theorem:

P(Ak|B) = P(B|Ak) P(Ak) / P(B)

P(B) is the overall probability of observing condition B, obtained by summing P(B|Ak) P(Ak) over all outcomes. It is referred to as the evidence.

P(Ak|B) is the posterior probability: the probability of outcome Ak given condition B.
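To make the update concrete, here is a minimal sketch in Python. The likelihood values below (the probability of observing team B's injury situation under each outcome) are made-up numbers used purely for illustration:

# Priors from the head-to-head record
prior = {'A': 0.4, 'B': 0.6}

# Hypothetical likelihoods P(condition | outcome): probability of the
# observed injury situation given each team winning (illustrative values)
likelihood = {'A': 0.7, 'B': 0.2}

# Evidence: sum of likelihood * prior over all outcomes
evidence = sum(likelihood[k] * prior[k] for k in prior)

# Posterior P(outcome | condition) via Bayes' theorem
posterior = {k: likelihood[k] * prior[k] / evidence for k in prior}
print(posterior)  # {'A': 0.7, 'B': 0.3} -- the injury news now favours A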

Bayesian Estimation

P(θ|D) = P(D|θ) P(θ) / ∫ P(D|θ) P(θ) dθ

In Bayesian estimation, the same equation operates on probability distributions rather than the single numeric values we used in Bayesian decision theory.

Notice that we replaced the evidence P(D) with the integral of the numerator over the parameter space. P(D) is hard to compute directly, and since it does not depend on θ, it acts only as a normalising constant; writing it as this integral ensures that the posterior distribution integrates to 1.

Here ∫ P(D|θ) P(θ) dθ is known as the evidence.

In Bayesian estimation, we compute a distribution over the parameter space known as the posterior pdf, P(θ|D):

P(θ|D) ∝ P(D|θ) P(θ)

We see that Bayesian estimation combines the prior distribution and the likelihood function to produce the posterior distribution.
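As a rough numerical sketch of this idea, we can approximate the evidence integral on a grid of θ values. The data below (7 successes in 10 Bernoulli trials) and the uniform prior are illustrative choices, not taken from the text above:

import numpy as np

# Illustrative data: 7 successes in 10 Bernoulli trials
successes, trials = 7, 10

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)              # uniform prior P(theta)
likelihood = theta**successes * (1 - theta)**(trials - successes)

# Evidence: integral of P(D|theta) * P(theta) over theta (Riemann sum)
evidence = np.sum(likelihood * prior) * dtheta

# Posterior pdf P(theta|D); integrates to 1 by construction
posterior = likelihood * prior / evidence
print(np.sum(posterior) * dtheta)  # ~1.0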

Applications 

  • Bayesian estimation is used to monitor cracks in gas piping systems, which pose serious hazards if not handled with utmost care. The most probable crack parameters are estimated using prior knowledge of standard leaks.
  • Bayesian estimation is used in daily clinical practice to estimate the real-time condition of a patient, using prior knowledge to choose the most suitable treatment.

Implementation

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from scipy import stats
import sympy as sp

from sympy.interactive import printing
printing.init_printing()

# Simulate data
np.random.seed(123)

nobs = 100
theta = 0.3
Y = np.random.binomial(1, theta, nobs)

# Plot the data
fig = plt.figure(figsize=(7,3))
gs = gridspec.GridSpec(1, 2, width_ratios=[5, 1]) 
ax1 = fig.add_subplot(gs[0])
ax2 = fig.add_subplot(gs[1])

ax1.plot(range(nobs), Y, 'x')
ax2.hist(-Y, bins=2)

ax1.yaxis.set(ticks=(0,1), ticklabels=('Failure', 'Success'))
ax2.xaxis.set(ticks=(-1,0), ticklabels=('Success', 'Failure'))

ax1.set(title=r'Bernoulli Trial Outcomes $(\theta=0.3)$', xlabel='Trial', ylim=(-0.2, 1.2))
ax2.set(ylabel='Frequency')

fig.tight_layout()

The likelihood function
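For T Bernoulli trials with s observed successes, the likelihood of θ is θ^s (1 − θ)^(T − s); this is exactly what the symbolic expression below encodes.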

t, T, s = sp.symbols('theta, T, s')

# Create the function symbolically
likelihood = (t**s)*(1-t)**(T-s)

# Convert it to a Numpy-callable function
_likelihood = sp.lambdify((t,T,s), likelihood, modules='numpy')
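As a quick usage check (an addition to the original walkthrough), we can evaluate the lambdified likelihood at a few candidate values of θ using the simulated data:

# Evaluate the likelihood at a few candidate values of theta
s_obs = Y.sum()  # number of successes in the simulated data
for candidate in (0.1, 0.3, 0.5):
    print(candidate, _likelihood(candidate, nobs, s_obs))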

Prior
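The Beta distribution is the conjugate prior for this likelihood; its density is proportional to θ^(α1 − 1) (1 − θ)^(α2 − 1), so setting α1 = α2 = 1 gives a flat (uniform) prior over [0, 1], which is what the code below uses.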

# For alpha_1 = alpha_2 = 1, the Beta distribution
# degenerates to a uniform distribution
a1 = 1
a2 = 1

# Prior Mean
prior_mean = a1 / (a1 + a2)
print('Prior mean:', prior_mean)

# Plot the prior
fig = plt.figure(figsize=(10,4))
ax = fig.add_subplot(111)
X = np.linspace(0,1, 1000)
ax.plot(X, stats.beta(a1, a2).pdf(X), 'g')

# Cleanup
ax.set(title='Prior Distribution', ylim=(0,12))
ax.legend(['Prior'])

Posterior
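Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior is again a Beta distribution with updated hyperparameters α1_hat = α1 + s and α2_hat = α2 + (T − s), where s is the number of observed successes. The code below applies exactly this update and compares the resulting posterior against the prior.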

# Find the hyperparameters of the posterior
a1_hat = a1 + Y.sum()
a2_hat = a2 + nobs - Y.sum()

# Posterior Mean
post_mean = a1_hat / (a1_hat + a2_hat)
print('Posterior Mean (Analytic):', post_mean)

# Plot the analytic posterior
fig = plt.figure(figsize=(10,4))
ax = fig.add_subplot(111)
X = np.linspace(0,1, 1000)
ax.plot(X, stats.beta(a1_hat, a2_hat).pdf(X), 'r')

# Plot the prior
ax.plot(X, stats.beta(a1, a2).pdf(X), 'g')

# Cleanup
ax.set(title='Posterior Distribution (Analytic)', ylim=(0,12))
ax.legend(['Posterior (Analytic)', 'Prior'])
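As a small sanity check (an addition to the original walkthrough), the posterior mean can be compared with the maximum likelihood estimate, which for Bernoulli data is just the sample mean:

# Compare the Bayesian posterior mean with the MLE (sample mean)
mle = Y.mean()
print('MLE:', mle)
print('Posterior Mean:', post_mean)

# With a uniform prior, posterior mean = (1 + s) / (2 + T) is close to
# the MLE s / T, and the two converge as the number of trials grows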

FAQs

  1. When do we prefer Bayesian estimation over MLE?
    Bayesian estimation is preferred when we have reliable prior knowledge about the parameters, since it incorporates that knowledge into the estimate. It is a little more complex to compute than MLE, as it requires the prior, the likelihood function, and the evidence.

  2. What are the advantages of Bayesian estimation?
    Bayesian estimation produces a full posterior distribution rather than a single point estimate, so it quantifies the uncertainty in the estimate. As more data is observed, the posterior concentrates and its variance becomes very small.

  3. What are the disadvantages of Bayesian estimation?
    Bayesian estimation requires knowledge of several quantities: the prior, the likelihood, and the evidence. A good estimate is achieved only when we have accurate knowledge of all the factors involved, which can be difficult to gather.

Key Takeaways

Bayesian estimation is another popular parameter estimation technique besides maximum likelihood estimation. This blog gives a beginner's insight into the technique, contrasting it with maximum likelihood estimation at several points. We recommend going through the blog at least a couple of times to pick up the finer details you might otherwise miss. You may also check out our industry-oriented machine learning courses curated by industry experts.
