MLE in statistical models
Conditional MLE for a supervised learning model can be given as:
$$\max_{h} \sum_{i=1}^{n} \log P(y_i \mid x_i ; h)$$
Here x_i is an input example, y_i is its label, and h is the modelling hypothesis, which takes the place of the model parameters. 'h' can be any supervised learning model we're trying to optimise.
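As a rough illustration of this objective, the sketch below evaluates the conditional log-likelihood for two candidate hypotheses. It assumes h is a one-feature logistic model with parameters w and b; the toy data, the function name conditional_log_likelihood and the parameter values are all made up for illustration.

```python
import numpy as np

def conditional_log_likelihood(w, b, x, y):
    """Sum of log P(y_i | x_i; h) for a one-feature logistic hypothesis h = (w, b)."""
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # P(y_i = 1 | x_i; h)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# toy data (assumed): larger x tends to go with label 1
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

# a hypothesis that agrees with the data vs. one that contradicts it
ll_good = conditional_log_likelihood(w=2.0, b=0.0, x=x, y=y)
ll_bad = conditional_log_likelihood(w=-2.0, b=0.0, x=x, y=y)
print(ll_good, ll_bad)
```

The hypothesis that matches the data gets the higher (less negative) value, and MLE simply searches for the h that pushes this sum as high as possible.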
Maximum Likelihood Estimation in Logistic Regression
The objective here is to find the sigmoid curve that best fits the given observations, which means finding the best parameters. For that, we'll use MLE.
Let the required cost function be given by P(Y; z), where Y is our sample data and z is the unknown parameter.
Here, we have 7 points with their respective probabilities P1 to P7. For the points labelled 0, we need P1, P2 and P4 to be as low as possible, and for the points labelled 1, we need P3, P5, P6 and P7 to be as high as possible.
This may also be restated as requiring the product
$$(1 - P_1)(1 - P_2)\,P_3\,(1 - P_4)\,P_5\,P_6\,P_7$$
to be maximized. This is called the joint probability. The cost function may be written as
$$J(z) = \prod_{i=1}^{n} P(Y_i ; z) \quad \text{(for n samples)}$$
$$\ln J(z) = \ln \prod_{i=1}^{n} P(Y_i ; z) \quad \text{(taking natural logs)}$$
$$\ln J(z) = L(z \mid Y) = \sum_{i=1}^{n} \ln P(Y_i ; z)$$
For a given value of z and the corresponding sample Yi, the function gives the probability of obtaining the observed value. If Yi = 1, the function becomes z. For Yi = 0, the function becomes 1 - z.
$$\ln J(z) = L(z \mid Y) = \sum_{i=1}^{n} \ln\left( z^{Y_i} (1 - z)^{1 - Y_i} \right)$$
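Simplifying it further, the final expression comes out to be
$$\ln J(z) = \left(\sum_{i=1}^{n} Y_i\right) \ln z + \left(n - \sum_{i=1}^{n} Y_i\right) \ln(1 - z)$$
Setting the derivative with respect to z to zero shows that the function is maximized at
$$z = \frac{1}{n} \sum_{i=1}^{n} Y_i$$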
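To make this result concrete, here is a minimal sketch that evaluates the log-likelihood over a grid of candidate z values and compares the best one with the sample mean. It assumes a single shared parameter z for all seven points, as in the derivation; the function name bernoulli_log_likelihood and the grid resolution are illustrative choices.

```python
import numpy as np

def bernoulli_log_likelihood(z, y):
    """ln J(z) = sum_i ln( z^y_i * (1 - z)^(1 - y_i) )."""
    return np.sum(y * np.log(z) + (1 - y) * np.log(1 - z))

# labels of the seven points: points 1, 2, 4 are 0 and points 3, 5, 6, 7 are 1
y = np.array([0, 0, 1, 0, 1, 1, 1])

# evaluate the log-likelihood on a grid of candidate z values inside (0, 1)
grid = np.linspace(0.001, 0.999, 999)
values = np.array([bernoulli_log_likelihood(z, y) for z in grid])
z_grid = grid[np.argmax(values)]

z_closed_form = y.mean()   # sum(Y_i) / n
print(z_grid, z_closed_form)
```

The grid maximiser should land within one grid step of the closed-form value 4/7, which is the fraction of points labelled 1.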
Implementation
# import the necessary libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from statsmodels import api
from scipy import stats
from scipy.optimize import minimize
# create an independent variable
x = np.linspace(10, 30, 100)
# create a normally distributed residual (mean 10, std dev 5)
# note: the nonzero mean shifts the fitted intercept from 10 to about 20
e = np.random.normal(10, 5, 100)
# generate ground truth
y = 10 + 4*x + e
df = pd.DataFrame({'x':x, 'y':y})
df.head()
# visualize data distribution
sns.regplot(x='x', y='y', data = df)
plt.show()
BY OLS APPROACH
features = api.add_constant(df.x)
model = api.OLS(y, features).fit()
model.summary()
# find the std dev
res = model.resid
standard_dev = np.std(res)
standard_dev
BY MLE APPROACH
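def MLE_Norm(parameters):
    # unpack the intercept, slope and standard deviation
    const, beta, std_dev = parameters
    pred = const + beta*x
    # log-likelihood of y under a normal distribution centred on the prediction
    LL = np.sum(stats.norm.logpdf(y, pred, std_dev))
    # minimize() minimises, so return the negative log-likelihood
    neg_LL = -1*LL
    return neg_LL
mle_model = minimize(MLE_Norm, np.array([2,2,2]), method='L-BFGS-B')
mle_model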
The parameters obtained via both the approaches are similar.
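One way to see why the two approaches agree: under a normal error model, the log-likelihood is highest at the least-squares coefficients, with the standard deviation equal to the root mean squared residual. The sketch below checks this numerically instead of running an optimiser. The fixed seed, the use of np.polyfit for the least-squares fit and the sizes of the perturbations are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)            # fixed seed (assumption, for reproducibility)
x = np.linspace(10, 30, 100)
y = 10 + 4 * x + rng.normal(10, 5, 100)   # same data-generating process as above

# closed-form least-squares fit; np.polyfit returns [slope, intercept] for degree 1
beta_ols, const_ols = np.polyfit(x, y, 1)
resid = y - (const_ols + beta_ols * x)
sigma_hat = np.sqrt(np.mean(resid ** 2))  # MLE of the standard deviation (divides by n)

def log_likelihood(const, beta, std_dev):
    return np.sum(stats.norm.logpdf(y, const + beta * x, std_dev))

ll_at_ols = log_likelihood(const_ols, beta_ols, sigma_hat)

# nudging any parameter away from the least-squares solution lowers the log-likelihood
perturbed = [
    log_likelihood(const_ols + 0.5, beta_ols, sigma_hat),
    log_likelihood(const_ols, beta_ols + 0.05, sigma_hat),
    log_likelihood(const_ols, beta_ols, sigma_hat * 1.1),
    log_likelihood(const_ols, beta_ols, sigma_hat * 0.9),
]
print(ll_at_ols, max(perturbed))
```

Because the residuals of a fit with an intercept average to zero, sigma_hat here matches what np.std(res) returns in the OLS section.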
FAQs

Contrast Maximum likelihood estimation with ordinary least squares in linear regression.
MLE chooses the parameters that maximize the likelihood or, equivalently, the log-likelihood function. When no closed-form solution is available, the estimate is found iteratively: the model is fitted at a trial parameter value and its mean is calculated, the iterative weights and the working dependent variable are derived from that mean, and these two, together with the design matrix, give the updated estimate of the parameter.
OLS minimizes the residual sum of squares, that is, the sum of the squared differences between the observed and predicted values. When the errors are assumed to be normally distributed with constant variance, both approaches give the same coefficient estimates.

Provide an expression for maximum likelihood estimation in linear regression.
Without getting too much into the derivation, the final expression can be given as
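$$\max_{\beta} \sum_{i=1}^{n} \left[ \log \frac{1}{\sqrt{2\pi\sigma^{2}}} - \frac{1}{2\sigma^{2}} \left( y_i - h(x_i, \beta) \right)^{2} \right]$$
Here x_i is a given example, y_i is its observed target, \beta holds the coefficients of the linear regression model, and \sigma^{2} is the variance of the errors.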
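As a quick sanity check on this expression, the sketch below codes it up term by term and compares it with SciPy's built-in normal log-density. The toy data, the coefficient values and the choice h(x_i, beta) = b0 + b1 * x_i are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def gaussian_log_likelihood(y, pred, sigma):
    """Sum over i of log(1/sqrt(2*pi*sigma^2)) - (y_i - pred_i)^2 / (2*sigma^2)."""
    return np.sum(np.log(1.0 / np.sqrt(2 * np.pi * sigma ** 2))
                  - (y - pred) ** 2 / (2 * sigma ** 2))

# toy inputs (assumed); h(x_i, beta) is taken to be b0 + b1 * x_i
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
b0, b1, sigma = 0.0, 2.0, 0.5
pred = b0 + b1 * x

manual = gaussian_log_likelihood(y, pred, sigma)
reference = np.sum(stats.norm.logpdf(y, loc=pred, scale=sigma))
print(manual, reference)
```

The two numbers should agree up to floating-point error, which is why the implementation above can rely on stats.norm.logpdf instead of writing the formula out by hand.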

State the advantages of MLE over other estimators.
Following are the advantages of MLE over other estimators:
→ If the model assumptions are right, it is the most efficient parameter estimation technique.
→ Provides a flexible approach suitable for a variety of applications.
→ Works best for larger samples.
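The last point can be illustrated with a small simulation. This is only a sketch under assumed settings: the true parameter 0.3, the sample sizes and the seed are arbitrary, and a single run shows a typical pattern, not a guarantee.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed (assumption, for reproducibility)
true_p = 0.3                      # hypothetical true Bernoulli parameter

errors = {}
for n in (10, 1_000, 100_000):
    sample = rng.binomial(1, true_p, size=n)
    p_hat = sample.mean()          # MLE of a Bernoulli parameter is the sample mean
    errors[n] = abs(p_hat - true_p)

print(errors)
```

With 100,000 draws the estimate typically sits within a fraction of a percent of the true value, while with only 10 draws it can be far off.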
Key Takeaways
Maximum likelihood estimation is a popular and widely used parameter estimation technique among data scientists. It chooses parameters in such a way that the likelihood of observing the data points is maximized. Although most companies might not expect a beginner to be aware of the nitty-gritty of this technique, an extra bit of knowledge always goes a long way. You may check out our industry-oriented machine learning courses curated by industry experts.