Maximum Likelihood Estimation
In MLE, the objective is to find the parameter values that maximise the likelihood of observing the data under a specific probability distribution.
Likelihood function
The objective is to maximise the probability of observing the data points jointly, assuming they come from a specific probability distribution. This is formally stated as
P(X | θ)
Here, θ is an unknown parameter (or vector of parameters). This may also be written as
P(X ; θ)
P(x1, x2, x3, ..., xn ; θ)
This is the likelihood function and is commonly denoted by L:
L(X ; θ)
Since the aim is to find the parameters that maximise the likelihood function:
θ̂ = argmax over θ of L(X ; θ)
The joint probability is restated as a product of conditional probabilities, one for each observation, given the distribution parameters:
L(X ; θ) = ∏(i=1 to n) P(xi ; θ)
Log of likelihood
Taking the product of all these conditional probabilities, each typically much smaller than 1, is a lot of work and numerically inconvenient. To make it easier, we can take the natural log of both sides:
ln L(X ; θ) = ln( ∏(i=1 to n) P(xi ; θ) )
Which becomes
ln L(X ; θ) = ∑(i=1 to n) ln P(xi ; θ)
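The sum-of-log-densities form above can be maximised numerically. Here is a minimal sketch, assuming the data come from a Gaussian distribution (the distribution choice, the synthetic data, and the use of `scipy.optimize.minimize` are illustrative assumptions, not part of the original text):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: 1,000 draws from a Gaussian with mu=5, sigma=2.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

def neg_log_likelihood(params, x):
    """Negative of ln L(X; theta) = sum_i ln P(x_i; theta) for a Gaussian."""
    mu, sigma = params
    log_pdf = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
    return -np.sum(log_pdf)

# Minimising the negative log-likelihood maximises the likelihood.
result = minimize(neg_log_likelihood, x0=[0.0, 1.0], args=(data,),
                  bounds=[(None, None), (1e-6, None)])
mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)  # estimates close to the true 5 and 2
```

Note that we minimise the *negative* log-likelihood because standard optimisers minimise by convention; the maximiser of L is unchanged.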
MLE is an optimisation technique that can be used to fit various machine learning models, such as logistic regression and linear regression.
Bayesian Estimation
Bayes Theorem
Most of you might already be aware of Bayes' theorem, proposed by Thomas Bayes. The theorem puts forth a formula for conditional probability, given as
P(A|B) = P(B|A) · P(A) / P(B)
Here, we find the probability of event A given that B is true. P(A) and P(B) are the marginal probabilities of events A and B.
Or, you may come across websites referring to these in pure statistical terminology.
P(A) is the prior probability: the probability of event A before we take any new piece of information into consideration.
P(B) is referred to as the evidence: how likely an observation of B is, averaged over our prior beliefs about A.
P(B|A) is referred to as the likelihood function. It tells how likely each observation of B is for a fixed A.
P(A|B) is the posterior probability: the updated probability of A after B has been observed.
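A quick numeric sketch of the four terms, using hypothetical numbers for a diagnostic test (the scenario and all probabilities are illustrative assumptions):

```python
# A = "has condition", B = "test positive" -- hypothetical numbers.
p_A = 0.01              # prior P(A)
p_B_given_A = 0.95      # likelihood P(B|A)
p_B_given_not_A = 0.05  # P(B|not A), the false-positive rate

# Evidence P(B) via the law of total probability.
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Posterior P(A|B) from Bayes' theorem.
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # 0.161
```

Even a fairly accurate test yields a modest posterior here, because the prior P(A) is small.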
Bayesian Estimation
In Bayesian Estimation, the same equation operates on probability distributions instead of numeric values. Writing D for the observed data and θ for the parameters:
P(θ|D) = P(D|θ) · P(θ) / ∫ P(D|θ) P(θ) dθ
Notice that the evidence is written as the integral of the numerator. This is because P(D) is tough to calculate directly and does not depend on θ; the integral form also ensures that the posterior distribution integrates to 1.
Here, ∫ P(D|θ) P(θ) dθ is known as the evidence.
In Bayesian Estimation, we compute a distribution over the parameter space, known as the posterior pdf, P(θ|D).
We see that Bayesian estimation combines both the prior probability and the likelihood function to produce the posterior distribution.
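A minimal sketch of computing a posterior, assuming coin-flip (Bernoulli) data with a Beta prior; the data, the true bias of 0.7, and the prior parameters are illustrative assumptions:

```python
import numpy as np

# Hypothetical coin-flip data: 50 Bernoulli draws with true bias 0.7.
rng = np.random.default_rng(1)
data = rng.binomial(1, 0.7, size=50)

# Beta(a, b) prior over theta. Beta is conjugate to the Bernoulli likelihood,
# so the posterior P(theta|D) is Beta(a + heads, b + tails) in closed form --
# the evidence integral never has to be computed numerically.
a, b = 2.0, 2.0
heads = int(data.sum())
tails = len(data) - heads
a_post, b_post = a + heads, b + tails

posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)  # pulled slightly from the data frequency toward the prior mean of 0.5
```

Conjugate priors like this are a common way to sidestep the intractable evidence integral; for non-conjugate models, numerical methods are needed instead.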
Key Differences between MLE and Bayesian Estimation
While both, Maximum Likelihood Estimation and Bayesian Estimation , are parameter estimation techniques based on probability distribution, There are some key differences between the two.
Frequently Asked Questions

Given suitable conditions for both the techniques, which one should be preferred?
A general consensus is that Bayesian Estimation provides more accurate results than MLE, since it can incorporate prior knowledge, but it is also more complex to compute.

How are Maximum Likelihood Estimation and Bayesian Estimation different from other parameter optimisation techniques?
Maximum Likelihood Estimation and Bayesian Estimation depend on the likelihood function to decide which parameters give the best-fitting model, something other techniques, such as Ordinary Least Squares (OLS), do not.

When do Maximum Likelihood Estimation and Bayesian Estimation predict similar values?
There are a few conditions under which Bayesian Estimation comes extremely close to MLE. When the Bayesian prior is uniform over all parameter values, the Bayesian estimate is very close to the MLE. Also, if the prior is well defined and nonzero at all observed values, Bayesian Estimation and MLE converge to the same value, given that we have plenty of observations.
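The convergence claim can be checked with a small sketch, again assuming Bernoulli data with a Beta prior (the true bias, prior parameters, and sample sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = 0.6
a, b = 3.0, 3.0  # a non-uniform but everywhere-nonzero Beta prior

gaps = []
for n in (10, 10_000):
    x = rng.binomial(1, theta_true, size=n)
    mle = x.mean()                            # MLE: the sample frequency
    post_mean = (a + x.sum()) / (a + b + n)   # Bayesian posterior mean
    gaps.append(abs(mle - post_mean))
print(gaps)  # the gap typically shrinks as n grows
```

Intuitively, the prior contributes a fixed amount of "pseudo-data" (a + b observations here), so its influence is washed out as the real sample grows.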
Conclusion
This blog briefly explains and contrasts the two most widely used parameter estimation techniques, Maximum Likelihood Estimation and Bayesian Estimation, along with suitable conditions for each and their key differences. We advise readers to go through the blog thoroughly. You may check out our industry-oriented machine learning courses curated by industry experts.
Happy Learning!