Table of contents
1. Introduction
2. Why do we need activation functions?
3. Types of Activation Functions
   3.1. Binary Step Function
   3.2. Linear Function
   3.3. Sigmoid Function
   3.4. Tanh Function
   3.5. ReLU Function
   3.6. Softmax Function
4. Frequently Asked Questions
   4.1. Why does CNN use the activation function?
   4.2. What is the neural network's outcome if the binary step function is used with a threshold value of 1?
   4.3. Why does CNN use ReLU?
   4.4. Why is exponential used in softmax?
5. Conclusion
Last Updated: Mar 27, 2024

Neural Network Activation Functions

Author: Tashmit

Introduction

Every day the human brain processes millions of pieces of information and decides which of them are useful and which are not. A neural network works in a similar way: the activation function helps decide which signals each neuron should pass on, so the network can make use of vital information and suppress irrelevant data points.

Before we discuss activation functions, you should be familiar with the basics of a neural network.



Why do we need activation functions?

The activation function decides whether a neuron should be activated by looking at the weighted sum of its inputs plus a bias term. Its objective is to introduce non-linearity into the output of a neuron. Without an activation function, a neural network is just a linear regression model; it is the non-linear transformation of the input that makes the network capable of learning and performing more complex tasks.
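To see this concretely, here is a minimal NumPy sketch (the layer sizes and random weights are arbitrary, chosen only for illustration) showing that two layers with no activation function in between collapse into a single linear transformation:

```python
import numpy as np

# Two "layers" with no activation function in between collapse into a
# single linear transformation (hypothetical sizes, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # example input vector
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)

two_layer = W2 @ (W1 @ x + b1) + b2             # no activation in between
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)      # one equivalent linear layer

print(np.allclose(two_layer, collapsed))        # True
```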


Types of Activation Functions

Binary Step Function

The Binary Step Function is one of the simplest activation functions. We choose a threshold value; if the input is greater than or equal to the threshold, the neuron is activated, otherwise it is not.

Mathematically, with a threshold of 0, it is represented as,

f(x) = 1, if x >= 0

f(x) = 0, if x < 0
 


One limitation of the Binary Step Function is that it cannot be used when the target variable has multiple classes. Apart from that, its gradient is zero everywhere, which hinders the backpropagation process because there is no gradient to propagate.
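As a quick illustration, here is a minimal NumPy sketch of the binary step function; the `threshold` parameter is only an illustrative addition, and its default of 0 matches the formula above:

```python
import numpy as np

def binary_step(x, threshold=0.0):
    # Outputs 1 when the input reaches the threshold, 0 otherwise.
    return np.where(x >= threshold, 1, 0)

print(binary_step(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0 0 1 1]
```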

Linear Function

As we saw, the gradient of the Binary Step Function is zero because the function contains no component of x. This can be resolved by using the Linear Function, also known as the identity function, in which the activation is directly proportional to the input, i.e., f(x) = ax.


Although the gradient here does not become zero, backpropagation cannot really help, because the function's derivative is a constant with no relation to the input x. Apart from that, if a linear activation function were used, all the layers of the neural network would collapse: stacking linear layers is equivalent to a single linear layer, so the network effectively becomes just one layer.
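A minimal NumPy sketch of the linear (identity-style) function and its constant derivative; the slope `a` is an illustrative parameter:

```python
import numpy as np

def linear(x, a=1.0):
    # Activation is directly proportional to the input.
    return a * x

def linear_grad(x, a=1.0):
    # The derivative is the constant a, independent of the input x,
    # so backpropagation gets no information about x from it.
    return np.full_like(x, a)

x = np.array([-2.0, 0.0, 3.0])
print(linear(x, a=2.0))        # [-4.  0.  6.]
print(linear_grad(x, a=2.0))   # [2. 2. 2.]
```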

Sigmoid Function

The sigmoid function is one of the most important and widely used activation functions because it is non-linear. It transforms its input into a value in the range of 0 to 1. Mathematically it is represented as,

f(x) = 1 / (1 + e^(-x))

The sigmoid function is not symmetric about the origin, which means all output neurons have the same (positive) sign. This issue can be addressed by scaling and shifting the function, which leads to the tanh function described next.
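A minimal NumPy sketch of the sigmoid function defined above:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real-valued input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # approximately [0.0067 0.5 0.9933]
```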


Tanh Function

Tanh is the hyperbolic tangent activation function. It is similar to the sigmoid function, but it is symmetric about the origin, so the outputs passed from one layer as inputs to the next layer can have different signs. Mathematically it is represented as,

tanh(x) = 2 sigmoid(2x) - 1 


Tanh is preferred over the sigmoid function because its gradients are not restricted to a single sign and its output is zero-centered.
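A minimal NumPy sketch that builds tanh from the sigmoid relation given above and checks it against NumPy's built-in `np.tanh`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Same relation as in the text: tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

x = np.array([-2.0, 0.0, 2.0])
print(tanh(x))                            # zero-centered outputs in (-1, 1)
print(np.allclose(tanh(x), np.tanh(x)))   # True: matches NumPy's built-in
```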

ReLU Function

ReLU stands for rectified linear unit, and it is a non-linear activation function. The benefit of using ReLU is that not all neurons are activated at the same time: a neuron is deactivated whenever the output of the linear transformation is negative.

Mathematically it is represented as,

f(x) = max(x,0)


The result is zero for negative input values, so only some of the neurons are active at any given time. This sparsity, together with the simple max operation, makes ReLU computationally more efficient than the sigmoid and tanh functions.
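A minimal NumPy sketch of the ReLU function defined above:

```python
import numpy as np

def relu(x):
    # Passes positive inputs through unchanged and zeroes out negative ones.
    return np.maximum(x, 0)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))   # [0. 0. 0. 2.]
```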

Softmax Function

The Softmax function can be thought of as a combination of multiple sigmoid functions. It is used for multiclass classification problems, returning the probability of a data point belonging to each class.

Mathematically it can be represented as,

softmax(x_i) = e^(x_i) / Σ_j e^(x_j)

When building a network for a multiclass problem, the output layer has the same number of neurons as there are classes in the target.
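A minimal NumPy sketch of the softmax function; subtracting the maximum is a common numerical-stability trick, not something required by the formula itself:

```python
import numpy as np

def softmax(x):
    # Subtracting the maximum keeps the exponentials numerically stable
    # without changing the resulting probabilities.
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])   # one raw score per class
probs = softmax(scores)
print(probs, probs.sum())            # probabilities that sum to 1
```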


Frequently Asked Questions

Why does CNN use the activation function?

The activation function aims to introduce non-linearity into the output of a neuron.

What is the neural network's outcome if the binary step function is used with a threshold value of 1?

The network outputs 1 (true) when the input is greater than or equal to the threshold of 1, and 0 (false) otherwise. The step function is commonly used only in primitive neural networks without a hidden layer.

Why does CNN use ReLU?

Using ReLU helps prevent exponential growth in the computation required to operate the neural network; as the CNN scales in size, the computational cost of adding extra ReLUs increases only linearly.

Why is exponential used in softmax?

The goal of the softmax function is to take a vector of arbitrary real numbers and produce a probability distribution with the same number of elements, such that larger inputs receive higher probabilities and smaller inputs receive lower ones. The exponential function is monotonically increasing and gives strictly positive results for any real input, which makes it a natural candidate.

Conclusion

In this article, we have discussed the various activation functions used in neural networks. We hope this blog has helped you enhance your knowledge of activation functions; if you would like to learn more, check out our other articles. Do upvote our blog to help other ninjas grow. Happy Coding!
