Gradient Descent in Deep Learning

Gradient descent is one of the fundamental methods of machine learning and deep learning.

In this article, we will explain this concept and how it works, particularly in the context of a neural network architecture.

What is gradient descent?

Let’s recall some key elements in deep learning first.

A deep learning model tries to learn a function f(x) that takes an input x and predicts an output y.

To find the best parameters of this function f(x), we define a cost function that measures how wrong our model is in terms of its ability to estimate the relationship between x and y.

This cost function needs to be minimized in order to find the optimal parameters (weights and biases).
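As an example (using notation not from the original article), a common cost function for regression is the mean squared error over the n training examples:

$$J(w, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2$$

The smaller this value, the better the model's predictions match the training targets.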

To do this, we use an optimization algorithm such as gradient descent.

Gradient descent is an iterative optimization algorithm that finds the best weights and biases of the function f(x) (the model) by searching for a local minimum of a cost function. Gradient descent was invented by French mathematician Augustin-Louis Cauchy in 1847.


The gradient tells the model in which direction to adjust the weights and biases in order to minimize the cost function. The process begins by setting some initial parameter values, then iteratively changes them in a way that reduces the cost function (optimization).
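To make this concrete, here is a minimal sketch (not code from the article) of such a loop for a single-feature linear model with a mean squared error cost; the learning rate and step count are illustrative choices:

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.1, n_steps=2000):
    """Fit y ~ w * x + b by repeatedly stepping against the gradient of the MSE cost."""
    w, b = 0.0, 0.0                             # initial parameter values
    n = len(x)
    for _ in range(n_steps):
        y_pred = w * x + b                      # model prediction
        error = y_pred - y
        grad_w = (2.0 / n) * np.dot(error, x)   # dJ/dw for the MSE cost
        grad_b = (2.0 / n) * np.sum(error)      # dJ/db for the MSE cost
        w -= learning_rate * grad_w             # step opposite to the gradient
        b -= learning_rate * grad_b
    return w, b

# Toy usage: recover the line y = 3x + 2 from noisy samples.
x = np.linspace(0.0, 1.0, 100)
y = 3.0 * x + 2.0 + np.random.normal(scale=0.05, size=100)
w, b = gradient_descent(x, y)
print(w, b)   # should land close to 3 and 2
```

Each iteration nudges the parameters a small step in the direction that decreases the cost the fastest, which is exactly the behavior described above.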


Can we train a model without using an optimization algorithm?

We could randomly change the weights and biases until the cost function is minimized, but this is not a good option: it would take a very long time to stumble on an accurate combination. That is why we use an optimization algorithm; it makes finding this combination of weights and biases much more efficient.

Three types of gradient descent:

There are three popular variants of gradient descent, which differ in the amount of data they use to compute the gradient at each learning step (the sketch after the list illustrates the difference):

Batch gradient descent

Stochastic gradient descent

Mini-batch gradient descent
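As a rough sketch (the dataset, names, and batch size of 32 below are illustrative assumptions, not from the article), the only difference between the three variants is which slice of the data each gradient is computed on:

```python
import numpy as np

# Toy linear-regression data, only to illustrate how each variant picks data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # 1000 examples, 5 features
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

def mse_gradient(w, X_batch, y_batch):
    """Gradient of the mean squared error cost, computed on the given batch."""
    error = X_batch @ w - y_batch
    return (2.0 / len(X_batch)) * (X_batch.T @ error)

w = np.zeros(5)
learning_rate = 0.01

for step in range(1000):
    # Batch gradient descent: the gradient is computed on the whole dataset.
    grad = mse_gradient(w, X, y)

    # Stochastic gradient descent would instead use one random example:
    #   i = rng.integers(len(X))
    #   grad = mse_gradient(w, X[i:i+1], y[i:i+1])

    # Mini-batch gradient descent would use a small random subset, e.g. 32 examples:
    #   idx = rng.choice(len(X), size=32, replace=False)
    #   grad = mse_gradient(w, X[idx], y[idx])

    w -= learning_rate * grad

print(w)   # should end up close to true_w
```

Using less data per step makes each update cheaper and noisier; using more data makes each update slower but more stable.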

What is the formula of gradient descent?
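In general notation (writing θ for the parameters, i.e. the weights and biases, α for the learning rate, and J for the cost function), each iteration updates the parameters by stepping against the gradient of the cost:

$$\theta_{t+1} = \theta_t - \alpha \, \nabla_{\theta} J(\theta_t)$$

The learning rate α controls how large a step is taken at each iteration: too small and training is slow, too large and the updates can overshoot the minimum.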


Check out this video for a more visual understanding of gradient descent: