Activation Functions in Neural Networks: A Complete Guide

In deep learning, activation functions play a key role in how well a model learns from the training data and in determining what kind of predictions it can make.

Activation functions help a neural network separate useful signal from noise, so the choice of activation function should be made carefully.

In this article, we will define what an activation function is and how to choose the appropriate one for our model based on the problem type.

Let’s start.


Table of Contents

  • Activation function definition
  • Neural networks without activation functions
  • Activation function types
    • Activation functions for hidden layers
    • Activation functions for the output layer
  • When to use a specific activation function

Activation Function Definition

What does an activation function mean?

In a neural network, an activation function, or “transfer function”, transforms the weighted sum of a node’s inputs into that node’s output.

Typically, we need different activation functions in different parts of the model: one for the hidden layers and another for the output layer.

The same activation function is generally used by all hidden layers. 

The output layer will often have a different activation function than the hidden layers, and its choice is determined by the type of prediction the model is required to make.

Why do we need an activation function in a neural network?

A neural network model without an activation function is simply a linear regression model: it cannot learn the nonlinear relationships needed for more complex problems such as classification.

When the activation function of the output layer is replaced with a sigmoid function, the neural network performs logistic regression. 

When the activation function of the output layer is replaced with a softmax function and multiple output units are used, the neural network performs multiclass logistic regression.

When the cost function is replaced with the hinge loss, the neural network becomes an SVM in its most basic form [1].
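To make the first point concrete, here is a minimal NumPy sketch (the layer sizes and random weights are arbitrary, chosen only for illustration) showing that stacking layers without an activation function collapses into a single linear transformation:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))               # batch of 4 inputs with 3 features
W1 = rng.normal(size=(3, 5)); b1 = rng.normal(size=5)
W2 = rng.normal(size=(5, 2)); b2 = rng.normal(size=2)

h = x @ W1 + b1                           # first "layer", no activation
out = h @ W2 + b2                         # second "layer", no activation

W = W1 @ W2                               # the two layers collapse into one linear map
b = b1 @ W2 + b2
print(np.allclose(out, x @ W + b))        # True

No matter how many such layers we stack, the result stays linear, which is why a nonlinear activation is needed.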

The next sections go over the main types of activation functions, their mathematical formulas, and code examples.

Activation Function Types

Let’s look at the activation functions for each type of layer one by one.

Activation functions for hidden layers

Hidden layers in a neural network receive their input from the input layer or from another hidden layer, and pass their output to the output layer or to another hidden layer.

To learn complicated mathematical functions, the model needs nonlinearity, which is exactly what the activation function provides.

The most commonly used activation functions in hidden layers are:

  • Rectified Linear Activation function (ReLU)
  • Logistic function (Sigmoid)
  • Hyperbolic Tangent function (Tanh)

Activation functions for the output layer

Choosing the activation function for the output layer depends on the prediction problem we are working on: regression, binary classification, or multiclass classification.

For the output layer, we consider the following activation functions:

  • Linear
  • Logistic (Sigmoid)
  • Softmax

Let’s take a look at those different activation functions in detail and when to use each one.

Activation functions in detail

  • Rectified Linear Activation (ReLU)

Because the ReLU function is easy to implement and less susceptible to vanishing gradients than sigmoid or tanh, it is widely used as an activation function in hidden layers.

How is ReLU calculated?

The primary advantage of using the ReLU function over other activation functions is that it does not activate all neurons at once. The function returns 0 if it receives any negative input, but for any positive value x, it returns that value back. So it can be written as:

f(x) = max(0, x)

How to implement ReLU in TensorFlow:

tf.nn.relu(features, name=None)
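For example, a minimal usage sketch (the tensor values are arbitrary examples):

import tensorflow as tf

x = tf.constant([-2.0, 0.0, 3.0])
print(tf.nn.relu(x))   # tf.Tensor([0. 0. 3.], shape=(3,), dtype=float32)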

How to implement ReLU in PyTorch:

import torch
import torch.nn as nn

relu1 = nn.ReLU(inplace=False)
x = torch.tensor([-2.0, 0.0, 3.0])
print(relu1(x))   # tensor([0., 0., 3.])
  • Logistic (Sigmoid or squashing) function

It is a nonlinear, S-shaped function that outputs values in the range (0, 1). The mathematical function is the following:

f(x) = 1/(1+e^-x)
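As a quick check, a plain-NumPy sketch of this formula (the helper name sigmoid and the test value are just illustrative):

import numpy as np

def sigmoid(x):
    # logistic function: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))   # 0.5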

How to implement the sigmoid function in TensorFlow:

tf.math.sigmoid(x, name=None)

How to implement the sigmoid function in PyTorch:

import torch
import torch.nn as nn

m = nn.Sigmoid()
input = torch.randn(2)
output = m(input)
  • Hyperbolic Tangent (Tanh)

The Hyperbolic Tangent function, or tanh, is similar to the sigmoid function. It outputs values in the range (-1, 1).

The mathematical function of tanh is:

tanh(x) = (exp(x) − exp(−x)) / (exp(x) + exp(−x))
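As a quick sanity check, a small NumPy sketch comparing this formula with NumPy's built-in tanh (the input values are arbitrary):

import numpy as np

x = np.array([-1.0, 0.0, 1.0])
manual = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
print(np.allclose(manual, np.tanh(x)))   # True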

How to implement the Tanh function in TensorFlow:

a = tf.constant([-1.0, 0.0, 1.0])
tf.nn.tanh(a, name='tanh')

How to implement the Tanh function in PyTorch:

import torch
import torch.nn as nn

m = nn.Tanh()
input = torch.randn(2)
output = m(input)

How to choose a function for the hidden layer:

We choose the activation function based on the type of neural network and then verify experimentally what works best for our case.

For a simple feed-forward network such as an MLP or a CNN, we typically use ReLU; recurrent neural networks traditionally work best with sigmoid and tanh functions. A minimal example follows.
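Here is a minimal Keras sketch of an MLP that uses ReLU in its hidden layers (the layer sizes, input shape, and sigmoid output are arbitrary choices for illustration):

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),                      # 20 input features (arbitrary)
    keras.layers.Dense(64, activation='relu'),     # hidden layer 1: ReLU
    keras.layers.Dense(32, activation='relu'),     # hidden layer 2: ReLU
    keras.layers.Dense(1, activation='sigmoid'),   # output layer: binary classification
])
model.summary()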

Let’s move to other types of activation functions used in the output layer.

  • Linear: this “no activation” function does exactly what its name suggests: it applies no transformation and returns the weighted sum of the inputs directly.

The mathematical formula of the linear function:

z = w1 x1 + w2 x2 + … + wn xn + b

  • Softmax: Softmax is a mathematical function that transforms a vector of real numbers into a vector of probabilities, where each probability is proportional to the exponential of the corresponding input value.

The mathematical formula of the softmax function:

softmax(x_i) = e^(x_i) / sum(e^(x_j))
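A minimal NumPy sketch of softmax (subtracting the maximum before exponentiating is a common trick for numerical stability; the helper name and test values are illustrative):

import numpy as np

def softmax(x):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))   # probabilities that sum to 1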

When to use a specific activation function

  • Sigmoid: the sigmoid function can be used in both the hidden layers and the output layer of a neural network.

For the output layer, choose the activation function based on the prediction task (see the sketch after this list):

For regression: linear activation.

For binary classification: sigmoid activation.

For multiclass classification: softmax activation.

For multilabel classification: sigmoid activation.
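As a rough illustration, here is how these choices typically map onto the final Keras layer (the unit counts are arbitrary placeholders):

from tensorflow import keras

regression_out = keras.layers.Dense(1, activation='linear')    # regression: single linear unit
binary_out = keras.layers.Dense(1, activation='sigmoid')       # binary classification
multiclass_out = keras.layers.Dense(10, activation='softmax')  # multiclass: one unit per class
multilabel_out = keras.layers.Dense(5, activation='sigmoid')   # multilabel: one sigmoid per label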

End Note:

In this article, we explained what activation functions are, why they are important, and how to choose them for the hidden layers and the output layer.

Feel free to ask any questions or give suggestions in the comments section.

Resources:

[1] http://cs231n.github.io/linear-classify/

[2] https://www.kaggle.com/code/dansbecker/rectified-linear-units-relu-in-deep-learning/notebook

[3] https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/