Code360 powered by Coding Ninjas
Last Updated: Mar 27, 2024


Author Arun Nawani


Artificial Intelligence has been generating buzz since the early 2000s, and the excitement has never really died down. Deep Learning, a subfield of Artificial Intelligence, has fascinated people in particular and is what most of them picture when they think of AI. Image-related learning tasks like face recognition are commonplace today and count among the basic features of a smartphone or a laptop. This is done by artificially simulating how a human brain functions. The model behind it is known as an Artificial Neural Network (ANN), and it is what lies underneath these intelligent programs. An ANN can consist of hundreds of neural layers or even more. However, the simplest form of an ANN is known as the Perceptron.

(Sounds too fancy for something so basic, doesn’t it?)


This blog assumes that readers are well versed in Artificial Neural Networks, their architecture, and the important terminology associated with them. You can check out our previous blogs on ANNs if you want to brush up on these concepts.


As discussed, the perceptron is the simplest form of an Artificial Neural Network. But what is meant by that? An ANN consists of three components:

  • The input layer: It takes the input, preprocesses it, and feeds it to the next layer.
  • The hidden layer: It receives the input, and this is where the majority of the computation is done. The weights are adjusted over multiple iterations of forward and backward propagation. This component can itself contain several layers. Its results are passed on to the next component of the architecture.
  • The output layer: The final component of the architecture. It receives the data from the hidden layers and produces the output. After every iteration of forward propagation, the error in this output is fed back, which is what initiates backpropagation.


If you want to learn more about forward propagation and backpropagation, check out our previous articles on those topics.

This is an overview of what an ANN looks like. The perceptron, being the simplest form of an ANN, consists of a single neuron that is fed the inputs and returns the output. A perceptron is used for binary classification and hence is also referred to as a binary classifier. Take a look at the illustration below:

[Illustration of a perceptron. Source: link]

Here, x denotes the inputs and w the associated weights.

Every input $x_i$ has an associated weight $w_i$. The neuron combines them as follows:

$$\text{Weighted sum} = x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n + B \quad (B = \text{bias term})$$

The weighted sum expression has a term ‘B’, known as the bias term. Like the weights, it can be adjusted based on the feedback received. The weighted sum is passed through the activation function. Let’s say the activation function employed is the Sigmoid function. Since the range of the sigmoid function is (0,1), the output generated by the neuron lies between 0 and 1. This output is finally passed through a step function, which takes only two values and is what makes the perceptron work as a binary classifier.


$$\text{Step function} = \begin{cases} 1, & y_{\text{sigmoid}} \ge 0.5 \\ 0, & \text{otherwise} \end{cases}$$

Here, 0.5 is our threshold value. x and w values can also be represented in matrix form. 
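The forward pass described above (weighted sum, then sigmoid, then the 0.5 threshold) can be sketched in a few lines of Python. This is only a minimal illustration: the function names and the sample inputs, weights, and bias are made up for the example and are not taken from any library.

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_predict(x, w, b, threshold=0.5):
    # Weighted sum = x1*w1 + x2*w2 + ... + xn*wn + B
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w)) + b
    y_sigmoid = sigmoid(weighted_sum)
    # Step function: 1 if the sigmoid output reaches the threshold, else 0.
    return 1 if y_sigmoid >= threshold else 0

# Hypothetical values, chosen only for illustration.
w = [0.5, -0.6, 0.2]
b = -0.1
print(perceptron_predict([1.0, 0.0, 1.0], w, b))  # weighted sum 0.6  -> 1
print(perceptron_predict([1.0, 1.0, 0.0], w, b))  # weighted sum -0.2 -> 0
```

Because sigmoid(0) = 0.5, the neuron outputs 1 exactly when the weighted sum is non-negative.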

[Illustration of the matrix form. Source: link]

Weighted sum = X.W + B, where X and W are matrices and B is the bias term.
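As a rough sketch of the matrix form, the snippet below computes X.W + B for several input rows at once using plain Python lists. The helper name `weighted_sums` and the numbers are assumptions made for the example; in practice a numerical library would usually do this product.

```python
def weighted_sums(X, W, B):
    # X: list of input rows, W: one weight per input column, B: scalar bias.
    # Returns one weighted sum per row, i.e. the entries of X.W + B.
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, W)) + B for row in X]

# Hypothetical data: four two-dimensional inputs.
X = [[0.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0],
     [1.0, 1.0]]
W = [2.0, 2.0]
B = -3.0

sums = weighted_sums(X, W, B)
print(sums)  # [-3.0, -1.0, -1.0, 1.0]
```

Only the last row gives a non-negative weighted sum, so with these example weights the neuron would fire only when both inputs are 1.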


Adjusting the weights

This is a supervised learning model. That means we have the actual outputs against which to check the generated outputs, so the weights can be adjusted to minimise the error between the predicted value and the actual value.

The update for each weight is given by:

$$w_{\text{new}} = w_{\text{old}} + \partial \cdot (y - y') \cdot x$$

where y is the actual value,

y’ is the predicted value,

x is the input associated with that weight, and

∂ is the learning rate.

The learning rate (∂) needs to be chosen well so that training can converge to the global minimum of the error. If ∂ is too high, the updates might overshoot and miss the minimum. If ∂ is too small, convergence would be extremely slow.
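To make the update concrete, here is a minimal training sketch on the logical AND function. It uses the standard perceptron rule, in which each weight's update is also scaled by its own input: w_new = w_old + rate * (y - y’) * x. The function names, the learning rate of 0.5, and the epoch count are arbitrary choices for this illustration.

```python
def predict(x, w, b):
    # Thresholding the weighted sum at 0 is equivalent to thresholding
    # its sigmoid at 0.5, because sigmoid(0) = 0.5.
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if z >= 0 else 0

def train_perceptron(samples, labels, learning_rate=0.5, epochs=20):
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(x, w, b)  # (y - y'), one of -1, 0, 1
            # Each weight moves in proportion to the error and its own input.
            w = [wi + learning_rate * error * xi for wi, xi in zip(w, x)]
            # The bias is adjusted like a weight whose input is always 1.
            b += learning_rate * error
    return w, b

# Logical AND: the output is 1 only when both inputs are 1.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 0, 1]

w, b = train_perceptron(samples, labels)
predictions = [predict(x, w, b) for x in samples]
print(predictions)  # [0, 0, 0, 1]
```

AND is linearly separable, so a single neuron can learn it. A problem such as XOR is not, and this loop would never settle on weights that classify every sample correctly.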

Activation Function

The activation function takes the weighted sum computed by a neuron and decides whether, and how strongly, the neuron is activated. Its role is to add non-linearity to the output of the neuron.

[Illustration of a two-layer network with activation function f. Source: link]

Let’s say we compute Z1 and don’t pass it through the activation function f. Instead, we directly send it to the next layer.

The result is still a linear function, since a composition of linear functions is also a linear function. Check out the A2 equation on the right. We have two weight matrices, W2 and W1, which multiply to form another matrix W of the same kind. So no matter how deep we go into the neural network, it would still result in a linear function. And most complex real-world problems can’t be handled with linear functions; they’re better suited to nonlinear functions.

Now that we know what purpose the activation function serves, let’s look at some popular activation functions.
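The collapse of stacked linear layers can be checked numerically. The sketch below assumes two small layers with made-up weight matrices and biases (W1, b1, W2, b2 are illustrative values, not taken from the article's figure) and shows that applying them in sequence without an activation gives the same result as a single layer with W = W2·W1 and b = W2·b1 + b2.

```python
def matvec(M, v):
    # Matrix-vector product for lists of lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def matmul(A, B):
    # Matrix-matrix product for lists of lists.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

# Hypothetical layer parameters, chosen only for illustration.
W1, b1 = [[1.0, 2.0], [0.0, -1.0]], [1.0, 0.0]
W2, b2 = [[2.0, 0.0], [1.0, 1.0]], [0.0, 3.0]

def two_layers(x):
    z1 = vec_add(matvec(W1, x), b1)      # first layer, no activation
    return vec_add(matvec(W2, z1), b2)   # second layer, no activation

# The same mapping expressed as a single linear layer.
W = matmul(W2, W1)
b = vec_add(matvec(W2, b1), b2)

def one_layer(x):
    return vec_add(matvec(W, x), b)

x = [3.0, -2.0]
print(two_layers(x))  # [0.0, 5.0]
print(one_layer(x))   # [0.0, 5.0]
```

Inserting a non-linear function between the two layers is what prevents this collapse.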

Sigmoid Function

The Sigmoid function is one of the most commonly used activation functions. It is usually used at the output layer since its value lies between 0 and 1, making it well suited to classification given some threshold value.

$$f(x) = \frac{1}{1 + e^{-x}}$$

Range = (0,1)
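A minimal Python version of the sigmoid, f(x) = 1 / (1 + e^(-x)), is shown below as an illustration. Note that this naive form can overflow for very large negative inputs, which the sketch does not guard against.

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); the output always lies in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

for value in (-5.0, 0.0, 5.0):
    print(value, round(sigmoid(value), 4))
```

The output is exactly 0.5 at x = 0 and approaches 0 and 1 as x moves towards negative and positive infinity, without ever reaching either.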

[Plot of the sigmoid function. Source: link]

Tanh function

It is similar to the sigmoid function but has a range of (-1,1). Tanh is better suited to the hidden layers since its output is centred around 0. And since we’re passing zero-centred data to the next layer, training becomes easier. Also, since we take the derivative of the cost function during backpropagation, we also need the derivative of the activation function. The maximum derivative of tanh is 1, four times the sigmoid’s maximum of 0.25, which tends to make training faster.

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Range = (-1,1)
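The comparison between the two derivatives can be checked with a short sketch. It assumes the standard identities tanh'(x) = 1 - tanh(x)^2 and sigmoid'(x) = s(x) * (1 - s(x)); the function names are illustrative.

```python
import math

def tanh(x):
    # f(x) = (e^x - e^(-x)) / (e^x + e^(-x)); the output lies in (-1, 1).
    e_pos, e_neg = math.exp(x), math.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)

def tanh_derivative(x):
    return 1.0 - tanh(x) ** 2

def sigmoid_derivative(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Both derivatives peak at x = 0.
print(tanh_derivative(0.0))     # 1.0
print(sigmoid_derivative(0.0))  # 0.25
```

Python's standard library also provides `math.tanh`, which is the better choice in real code; the manual version here only mirrors the formula.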

[Plot of the tanh function. Source: link]


ReLU (Rectified Linear Unit) Function

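ReLU is a piecewise linear function: it is zero for negative inputs and equal to the input for positive ones. Each piece is linear, but the function as a whole is non-linear, so it combines the simplicity of a linear function with the non-linearity a network needs.

f(x) = max(0,x)

Range = [0, infinity)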
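ReLU is simple enough to write in one line. The sketch below also includes a small check that it is not a linear function, since a linear f would satisfy f(a) + f(b) = f(a + b).

```python
def relu(x):
    # f(x) = max(0, x): negative inputs become 0, others pass through.
    return max(0.0, x)

outputs = [relu(v) for v in (-2.0, -0.5, 0.0, 1.5, 3.0)]
print(outputs)  # [0.0, 0.0, 0.0, 1.5, 3.0]

# Non-linearity check: relu(-1) + relu(1) is 1.0, but relu(-1 + 1) is 0.0.
print(relu(-1.0) + relu(1.0), relu(-1.0 + 1.0))  # 1.0 0.0
```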

[Plot of the ReLU function. Source: link]

There are other activation functions as well but these are the most popular ones. If you’re curious, you may go ahead and explore some of the other activation functions like Softmax as well. 

Frequently Asked Questions

Q1. What do you understand by perceptron? 

Ans. A perceptron is the simplest form of an ANN, consisting of inputs, one neuron, and an output. Perceptrons are generally used for binary classification.


Q2. Why do we need the Activation function? 

Ans. Activation functions add non-linearity to the model, making it suitable for real-world use cases where the problems cannot always be tackled by linear functions alone.


Q3. How is tanh different from sigmoid function? 

Ans. Sigmoid function has a range between (0,1) while tanh has a range between (-1,1). The sigmoid function is more suitable to use at the output layer, while tanh is better used in the hidden layer. 

Key takeaways

This blog goes through the Perceptron, its architecture, and its use cases. Perceptrons form the basics of Deep Learning and hence are essential for students aspiring to specialise in the domain. You are expected to be well versed in Perceptrons and how they work. You may want to check out our expert-curated courses in deep learning.

Check this out to know about Machine Learning.

Happy Learning
