Introduction
Every day, a human brain processes a million pieces of information and learns various things, but it decides which information is helpful and which is not. Similarly, a neural network relies on the activation function to decide which information is valid and which is not beneficial: activation functions help the network use vital information and suppress irrelevant data points.
Before we dive into activation functions, you should be familiar with neural networks; you can review them here.
The activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it. The objective of the activation function is to introduce non-linearity into the output of a neuron. A neural network without an activation function is just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
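As a minimal sketch (assuming NumPy; the weights, bias, and inputs below are made-up illustrative values), a single neuron computes the weighted sum plus bias and then passes the result through an activation function:

import numpy as np

def neuron_output(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias
    z = np.dot(weights, inputs) + bias
    # Non-linear transformation applied by the activation function
    return activation(z)

# Illustrative call with a sigmoid activation
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(neuron_output(np.array([0.5, -1.2]), np.array([0.8, 0.3]), 0.1, sigmoid))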
Binary Step Function
The Binary Step Function is one of the simplest and most straightforward activation functions. We assume a threshold value, and the neuron is activated only if the input is greater than that threshold.
One limitation of the Binary Step Function is that it cannot be used when the target variable has multiple classes. Apart from that, the gradient of the binary step function is zero, which hinders the backpropagation process.
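A minimal sketch of the binary step function in Python (the threshold of 0 is just an assumed default; any threshold behaves the same way):

def binary_step(x, threshold=0.0):
    # The neuron fires (outputs 1) only when the input exceeds the threshold
    return 1 if x > threshold else 0

print(binary_step(2.5))   # 1
print(binary_step(-0.7))  # 0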
Linear Function
As we saw, the gradient of the Binary Step Function is zero because the function contains no component of x. This can be resolved by using the Linear Function, also known as the identity function, where the activation is directly proportional to the input.
Although the gradient here does not become zero, backpropagation cannot be used effectively because the function's derivative is a constant with no relation to the input x. Apart from that, all neural network layers would collapse if a linear activation function were used: the linear activation function turns the neural network into the equivalent of a single layer.
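For completeness, a sketch of the linear (identity) activation and its derivative; the constant derivative is exactly why gradient-based learning gains nothing from it:

def linear(x, a=1.0):
    # Activation is directly proportional to the input
    return a * x

def linear_derivative(x, a=1.0):
    # The derivative is a constant with no dependence on x,
    # so backpropagation cannot use it to adjust weights meaningfully
    return a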
Sigmoid Function
The sigmoid function is one of the most important and widely used activation functions because it is non-linear. It transforms the value in the range of 0 to 1. Mathematically it is represented as
f(x) = 1 / (1 + e^(-x))
The sigmoid function is not symmetric about zero, which implies that the signs of all output neurons would be the same. The issue can be addressed by scaling the function.
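A minimal sketch of the sigmoid and one assumed way of scaling it to a zero-centered range (NumPy is assumed; the sample inputs are illustrative):

import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def scaled_sigmoid(x):
    # Rescaled to (-1, 1) so the outputs are centered around zero
    return 2.0 * sigmoid(x) - 1.0

print(sigmoid(np.array([-2.0, 0.0, 2.0])))        # roughly [0.12, 0.5, 0.88]
print(scaled_sigmoid(np.array([-2.0, 0.0, 2.0]))) # roughly [-0.76, 0.0, 0.76]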
Tanh Function
Tanh is the hyperbolic tangent activation function. It is like the sigmoid function, but it is symmetric about the origin, so the outputs from previous layers that are fed as input to the next layer can have distinct signs. Mathematically it is represented as
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Tanh is preferred over the sigmoid function because its gradients are not restricted to vary in a particular direction and because it is zero-centered.
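A minimal sketch of tanh using the formula above (NumPy assumed); note that the outputs are centered around zero, unlike the sigmoid:

import numpy as np

def tanh(x):
    # Zero-centered: outputs lie in the range (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

print(tanh(np.array([-1.0, 0.0, 1.0])))  # roughly [-0.76, 0.0, 0.76]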
ReLU Function
The ReLU function is a non-linear activation function; ReLU stands for Rectified Linear Unit. The benefit of using the ReLU function is that not all neurons are activated simultaneously: a neuron is deactivated only when the output of the linear transformation is less than zero.
The output is zero for negative input values, which means not all neurons are activated at the same time. This makes ReLU computationally more efficient than the sigmoid and tanh functions.
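A minimal sketch of ReLU (NumPy assumed; the sample inputs are illustrative). Negative inputs are clipped to zero, which is what deactivates those neurons:

import numpy as np

def relu(x):
    # Passes positive values through unchanged and zeroes out negative values
    return np.maximum(0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]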
Softmax Function
The Softmax function can be described as a combination of multiple sigmoid functions. It is used for multiclass classification problems and returns, for each data point, the probability of belonging to each class.
When building a network for a multiclass problem, the output layer has the same number of neurons as there are classes in the target.
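A minimal sketch of softmax applied to the raw scores of an output layer (NumPy assumed; the scores are made-up values). The result is a probability distribution over the classes:

import numpy as np

def softmax(scores):
    # Subtracting the maximum improves numerical stability without changing the result
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())  # class probabilities that sum to 1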
Frequently Asked Questions
What is the aim of an activation function?
The activation function aims to introduce non-linearity into the output of a neuron.
What is the neural network's outcome if the binary step function is used with a threshold value of 1?
The function outputs 1 (or true) when the input exceeds the threshold (here, 1) and 0 (or false) when it does not. The step function is commonly used in primitive neural networks without a hidden layer.
Why does CNN use ReLU?
Using ReLU helps prevent exponential growth in the computation required to operate the neural network. As the CNN scales in size, the computational cost of adding extra ReLUs grows only linearly.
Why is exponential used in softmax?
The goal of the softmax function is to take a vector of arbitrary real numbers and generate a probability distribution with the same number of elements, such that larger inputs get higher probabilities and smaller inputs get lower probabilities. The exponential function is monotonically increasing and gives non-negative results for any real input, which makes it a natural candidate.
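As a quick illustrative check (the scores are assumed values), exponentiation maps even negative scores to positive numbers while preserving their order, so normalizing the results gives a valid probability distribution:

import numpy as np

scores = np.array([-1.0, 0.0, 2.0])
exps = np.exp(scores)      # all positive: roughly [0.37, 1.0, 7.39]
probs = exps / exps.sum()  # larger scores get larger probabilities
print(probs)               # roughly [0.04, 0.11, 0.84]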
Conclusion
In this article, we have extensively discussed various activation functions applied in neural networks. We hope that this blog has helped you enhance your knowledge regarding activation functions and if you would like to learn more, check out our articles here. Do upvote our blog to help other ninjas grow. Happy Coding!