Table of contents
1. Introduction
2. Why MLP?
3. Algorithm
4. Implementation using Sklearn
5. Frequently Asked Questions
6. Key Takeaways
Last Updated: Mar 27, 2024

MLP (Multi-Layer Perceptron)

Author Mayank Goyal
Introduction

The MLP is an integral part of deep learning. It is a neural network in which the mapping between inputs and outputs is non-linear. An MLP contains:

  • One input layer.
  • One or more hidden layers.
  • One final layer called the output layer.

 

The layers close to the input are called lower layers, and the ones close to the output are known as upper layers. Every layer except the output layer is fully connected to the next layer and includes a bias neuron.

 

An MLP has a minimum of three layers, including one hidden layer. If an MLP contains more than one hidden layer, it is called a deep ANN. The Multi-Layer Perceptron is an example of a feedforward artificial neural network.

 

The number of layers and the number of neurons in each layer are hyperparameters of the neural network, and they need tuning. Cross-validation is one technique for finding their optimal values.
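For example, scikit-learn's GridSearchCV can cross-validate over candidate hidden-layer sizes; the dataset and the grid of sizes below are purely illustrative:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# toy classification data (illustrative)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# try a few hidden-layer sizes and keep the one with the best CV score
grid = GridSearchCV(
    MLPClassifier(max_iter=2000, random_state=0),
    param_grid={"hidden_layer_sizes": [(3,), (5,), (10,)]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

The same idea extends to other hyperparameters such as the learning rate or the number of layers.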

 

The weight adjustment during training is done through backpropagation. The deeper the neural network, the better it can model complex data. However, deeper networks can suffer from the vanishing gradient problem.


Why MLP?

A Multi-Layer Perceptron (MLP) contains one or more hidden layers (apart from one input and one output layer). While a single-layer perceptron can only learn linear functions, a multi-layer perceptron can also learn non-linear functions. This limitation was famously demonstrated with the exclusive OR (XOR) gate: XOR outputs 1 only when its two inputs differ, and no single linear boundary separates its classes, so a single-layer perceptron cannot represent it.
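We can verify this directly: scikit-learn's linear Perceptron never reaches full accuracy on the XOR truth table (a small illustrative check):

```python
import numpy as np
from sklearn.linear_model import Perceptron

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = Perceptron(max_iter=1000, random_state=0)
clf.fit(X, y)
# XOR is not linearly separable, so accuracy stays below 1.0
print(clf.score(X, y))
```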

 

Algorithm

  • Initially, we divide the input dataset into mini-batches. The network handles one mini-batch at a time and traverses the whole dataset multiple times; each full pass is called an epoch.
def fit(x, y, n_features=2, n_neurons=3, n_output=1, iterations=10, eta=0.001):
    """
    Args:
        x (ndarray): matrix of features
        y (ndarray): vector of expected values
        n_features (int): number of feature vectors
        n_neurons (int): number of neurons in the hidden layer
        n_output (int): number of output neurons
        iterations (int): number of iterations over the training set
        eta (float): learning rate

    Returns:
        errors (list): list of errors over iterations
        param (dict): dictionary of learned parameters
    """

    ## Initialize parameters
    param = init_params(n_features=n_features,
                        n_neurons=n_neurons,
                        n_output=n_output)

    for _ in range(iterations):
        # forward pass
        Z1 = linear(param['W1'], x, param['b1'])
        S1 = sigmoid(Z1)
        Z2 = linear(param['W2'], S1, param['b2'])
        S2 = sigmoid(Z2)


def sigmoid(Z):
    """
    Args:
        Z (ndarray): weighted sum of features

    Returns:
        S (ndarray): neuron activation
    """
    return 1 / (1 + np.exp(-Z))


def linear(W, X, b):
    """
    Args:
        W (ndarray): weight matrix
        X (ndarray): matrix of features
        b (ndarray): vector of biases

    Returns:
        Z (ndarray): weighted sum of features
    """
    return np.dot(X, W) + b
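The fit function above processes the whole dataset at once; the mini-batching described in the first step can be sketched with an illustrative helper like this (iterate_minibatches is not part of the article's code):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size):
    # shuffle once per epoch, then yield consecutive slices
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
batches = list(iterate_minibatches(X, y, batch_size=4))
print(len(batches))  # 3 batches of sizes 4, 4, 2
```

Each epoch would then loop over these batches, running the forward pass and the weight update once per batch instead of once per epoch.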

 

  • Next, we compute the network’s output error.
    ## store the error after each iteration
    errors = []
    error = cost_function(S2, y)
    errors.append(error)


def cost_function(V, y):
    """
    Args:
        V (ndarray): neuron activation
        y (ndarray): vector of expected values

    Returns:
        error (float): total squared error
    """
    return np.mean(np.power(V - y, 2)) / 2
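As a quick sanity check on the cost function, we can evaluate it on a small hand-computed example:

```python
import numpy as np

def cost_function(V, y):
    return np.mean(np.power(V - y, 2)) / 2

V = np.array([0.9, 0.1, 0.8, 0.2])  # network activations
y = np.array([1, 0, 1, 0])          # expected values
# squared errors are [0.01, 0.01, 0.04, 0.04]; their mean is 0.025, halved -> 0.0125
print(cost_function(V, y))  # 0.0125
```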

 

  • Then, we calculate how much each output connection contributed to the error, and how much of that error contribution comes from each connection in the previous layers, applying the chain rule until we reach the input layer. This reverse pass measures the error gradient across all the connection weights in the network by propagating the error gradient backward through it, which is known as backpropagation. Finally, we update all the weights and biases in the network using the error gradients just computed.

 

        # update output weights
        d2 = (S2 - y) * S2 * (1 - S2)
        W2_gradients = np.dot(S1.T, d2)
        para["W2"] = para["W2"] - W2_gradients * eta

        # update output bias
        para["b2"] = para["b2"] - np.sum(d2, axis=0,
          keepdims=True) * eta

        # update hidden weights
        d1 = np.dot(d2, para["W2"].T) * S1 * (1 - S1)
        W1_gradients = np.dot(X.T, d1)
        para["W1"] = para["W1"] - W1_gradients * eta

        # update hidden bias
        para["b1"] = para["b1"] - np.sum(d1, axis=0,
          keepdims=True) * eta

 

That's the algorithm we follow while implementing an MLP. As stated above, I used the XOR problem as the example, since XOR was one of the main motivations for moving beyond the single-layer perceptron.

 

If that is too much to grasp at once, let us summarize the algorithm. First, we feed each training instance into the network and perform forward propagation. Then we measure the error and go back through each layer to compute the error contribution from each neuron (backward propagation). Finally, we update the weights and biases to reduce the error.

 

NOTE:

We should always initialize the connection weights randomly, or else training will fail. For example, if we initialize all the weights and biases to the same value, all the neurons in a given layer will behave identically: backpropagation affects them in exactly the same way, so they stay identical forever and the network learns nothing useful. Initializing the weights randomly breaks this symmetry and allows backpropagation to train a diverse set of neurons.
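A small illustrative computation shows why symmetric initialization fails: when hidden neurons start with identical weights, every neuron receives exactly the same gradient (the inputs, weights, and target below are made up for the demonstration):

```python
import numpy as np

X = np.array([[0., 1.], [1., 0.]])  # two toy inputs
W = np.ones((2, 3))                 # symmetric (all-equal) initialization

Z = X @ W
S = 1 / (1 + np.exp(-Z))            # sigmoid activations, identical per neuron
d = (S - 0.9) * S * (1 - S)         # delta term, same form as in the article
grad = X.T @ d                      # gradient for the hidden weights

# every hidden neuron gets an identical gradient column
print(np.allclose(grad[:, 0], grad[:, 1]))  # True
```

Since the gradients are identical, a gradient step keeps the weights equal, and the three neurons never differentiate.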

 

Let us look at the whole code at once:

import numpy as np

def init_params(n_features, n_neurons, n_output):
    np.random.seed(100)  # for reproducibility
    W1 = np.random.uniform(size=(n_features, n_neurons))
    b1 = np.random.uniform(size=(1, n_neurons))
    W2 = np.random.uniform(size=(n_neurons, n_output))
    b2 = np.random.uniform(size=(1, n_output))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters


def sigmoid_function(Z):
    return 1 / (1 + np.exp(-Z))

def linear_function(W, X, b):
    return np.dot(X, W) + b

def cost_function(V, y):
    return np.mean(np.power(V - y, 2)) / 2

def predict(X, W1, W2, b1, b2):
    Z1 = linear_function(W1, X, b1)
    S1 = sigmoid_function(Z1)
    Z2 = linear_function(W2, S1, b2)
    S2 = sigmoid_function(Z2)
    return np.where(S2 >= 0.5, 1, 0)

def fit(X, y, n_features=2, n_neurons=3, n_output=1, iterations=10, eta=0.001):
    para = init_params(n_features=n_features,
                       n_neurons=n_neurons,
                       n_output=n_output)

    ## store the error after each iteration
    errors = []
    for _ in range(iterations):
        # forward pass
        Z1 = linear_function(para['W1'], X, para['b1'])
        S1 = sigmoid_function(Z1)
        Z2 = linear_function(para['W2'], S1, para['b2'])
        S2 = sigmoid_function(Z2)
        error = cost_function(S2, y)
        errors.append(error)

        ## backpropagation

        # update output weights
        d2 = (S2 - y) * S2 * (1 - S2)
        W2_gradients = np.dot(S1.T, d2)
        para["W2"] = para["W2"] - W2_gradients * eta

        # update output bias
        para["b2"] = para["b2"] - np.sum(d2, axis=0,
          keepdims=True) * eta

        # update hidden weights
        d1 = np.dot(d2, para["W2"].T) * S1 * (1 - S1)
        W1_gradients = np.dot(X.T, d1)
        para["W1"] = para["W1"] - W1_gradients * eta

        # update hidden bias
        para["b1"] = para["b1"] - np.sum(d1, axis=0,
          keepdims=True) * eta

    return errors, para


# expected values
y = np.array([[0, 1, 1, 0]]).T

# features
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]]).T

errors, para = fit(X, y, iterations=5000, eta=0.1)
y_pred = predict(X, para["W1"], para["W2"], para["b1"],
                 para["b2"])

correct_predictions = (y_pred == y).sum()
accuracy = (correct_predictions / y.shape[0]) * 100
print('Multi-layer perceptron accuracy: %.2f%%' % accuracy)

 

Output:

Multi-layer perceptron accuracy: 100.00%

 

Implementation using Sklearn

Importing libraries

import numpy as np
from sklearn import metrics
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

 

Initializing Data

# expected values
y = np.array([0, 1, 1, 0])

# features
x = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]]).T

 

Splitting the Data

train_features, test_features, train_targets, test_targets = train_test_split(x, y, test_size=0.1, random_state=123)

 

Training the Model:

def MLPerceptron(train_features, test_features, train_targets, test_targets, num_neurons=50):
    classifier = MLPClassifier(hidden_layer_sizes=num_neurons, max_iter=1000,
                               activation='relu', solver='sgd', verbose=20,
                               random_state=124, learning_rate='invscaling')
    classifier.fit(train_features, train_targets)

    predictions = classifier.predict(test_features)
    score = np.round(metrics.accuracy_score(test_targets, predictions), 2)
    print("Mean accuracy: " + str(score))

 

Model:

MLPerceptron(train_features, test_features, train_targets, test_targets)

 

Output:

 

 

That’s all from the implementation part; here too we got 100% accuracy. You can vary the different parameters to see how the results change.

Frequently Asked Questions

  1. What are the limitations of the perceptron?
    Perceptron networks have several limitations. First, the output values are binary. Second, perceptrons can only classify linearly separable sets of vectors.
     
  2. Can MLP be used for regression?
    Yes. Multi-Layer Perceptrons, like artificial neural networks in general, support both regression and classification problems.
     
  3. What’s the use of bias in MLP?
    The bias can be thought of as a measure of how flexible the perceptron is. It is similar to the constant b of a linear function y = ax + b: it allows us to move the line up and down to better fit the prediction to the data.
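Following up on the regression question, here is a minimal sketch using scikit-learn's MLPRegressor; the toy linear dataset below is purely illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# toy regression data: learn y = 2x on a handful of points
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 2 * X.ravel()

reg = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
reg.fit(X, y)
print(reg.predict([[0.5]]))  # should be close to 1.0
```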

Key Takeaways

Let us briefly recap the article.

First, we saw the introduction to MLPs, then we learned why we need them, and lastly, we saw the implementation of an MLP in two different ways: first using plain NumPy, and second using sklearn.

That’s one of the fundamental algorithms of neural networks. With the Multi-Layer Perceptron, the horizons expand: a neural network can now have many layers of neurons and is ready to learn more complex patterns.


That’s the end of the article. I hope you liked it.

Keep learning, Ninjas!
