Deep Learning Track WiSe 24/25

The Gradient Descent Algorithm

In this section we will introduce the fundamental algorithm that makes neural networks learn anything: the gradient descent algorithm.


The gradient descent algorithm is an iterative optimization algorithm used to minimize a function. It is commonly employed in machine learning and optimization problems to find the minimum of a cost or objective function.

Here's a step-by-step explanation of the gradient descent algorithm:

  1. Initialization: Choose an initial point in the parameter space. This could be done randomly or based on prior knowledge.

  2. Compute the Gradient: Calculate the gradient (or partial derivatives) of the cost or objective function with respect to each parameter. The gradient represents the direction of the steepest ascent in the function space.

  3. Update the Parameters: Update the values of the parameters by taking a step proportional to the negative gradient. The updating equation is typically of the form: θ_new = θ_old - learning_rate * gradient, where θ_new and θ_old represent the new and old parameter values, respectively, and the learning_rate (often denoted as alpha) is a hyperparameter that determines the step size.

  4. Repeat Steps 2 and 3: Iterate until convergence or until a stopping criterion is met. The stopping criterion can be based on the number of iterations, on reaching a certain threshold for the cost function, or on other criteria specific to the problem (a code sketch of these steps follows below).
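
To make these four steps concrete, here is a minimal sketch in plain Python (not from the course material) that minimizes the simple cost function f(θ) = (θ - 3)², whose minimum we know is at θ = 3:

```python
def cost(theta):
    # Simple quadratic cost function with its minimum at theta = 3
    return (theta - 3) ** 2

def gradient(theta):
    # Analytical derivative of the cost function: d/dtheta (theta - 3)^2
    return 2 * (theta - 3)

theta = 0.0          # Step 1: initialization (here simply at 0)
learning_rate = 0.1  # the hyperparameter alpha

for step in range(1000):                  # Step 4: repeat ...
    grad = gradient(theta)                # Step 2: compute the gradient
    theta = theta - learning_rate * grad  # Step 3: theta_new = theta_old - alpha * gradient
    if abs(grad) < 1e-6:                  # ... until a stopping criterion is met
        break

print(f"Minimum found at theta = {theta:.4f}")  # prints approximately 3.0000
```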

The algorithm iteratively adjusts the parameter values in the direction of steepest descent, allowing it to gradually converge towards the minimum of the cost function. By following the negative gradient, the algorithm moves in the direction of decreasing function values, which corresponds to descending the slope of the function.

The learning rate is a crucial hyperparameter in gradient descent. It determines the step size taken in each iteration. A large learning rate may result in overshooting the minimum, causing the algorithm to diverge. On the other hand, a small learning rate may slow down convergence.
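
A quick experiment on the same quadratic cost function from above illustrates this behaviour; the concrete values are chosen only for illustration:

```python
def run_gradient_descent(learning_rate, steps=20, start=0.0):
    # Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
    theta = start
    for _ in range(steps):
        theta -= learning_rate * 2 * (theta - 3)
    return theta

print(run_gradient_descent(0.1))    # converges close to the minimum at 3
print(run_gradient_descent(0.001))  # too small: barely moved after 20 steps
print(run_gradient_descent(1.1))    # too large: overshoots the minimum and diverges
```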

There are variations of gradient descent, such as batch gradient descent, stochastic gradient descent, and mini-batch gradient descent, which differ in how they update the parameters using the gradient. These variations are often used to balance computational efficiency and convergence accuracy.
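
In PyTorch, the batch size of the data loader effectively selects the variant. The following hedged sketch uses a made-up toy dataset and hyperparameters purely for illustration:

```python
import torch

# Hypothetical toy dataset: 100 noisy samples of the linear relation y = 2x + 1
X = torch.randn(100, 1)
y = 2 * X + 1 + 0.1 * torch.randn(100, 1)

model = torch.nn.Linear(1, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The batch size chooses the variant:
#   batch_size=100 -> batch gradient descent (full dataset per update)
#   batch_size=1   -> stochastic gradient descent (one sample per update)
#   batch_size=16  -> mini-batch gradient descent (the usual compromise)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X, y), batch_size=16, shuffle=True
)

for epoch in range(20):
    for xb, yb in loader:
        optimizer.zero_grad()          # reset gradients from the previous step
        loss = loss_fn(model(xb), yb)  # compute the cost on the current batch
        loss.backward()                # compute gradients via autograd
        optimizer.step()               # update: theta_new = theta_old - lr * gradient
```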

Overall, the gradient descent algorithm optimizes a function by iteratively updating parameters in the direction of steepest descent. It is a fundamental optimization technique used in various machine learning algorithms, such as linear regression, logistic regression, and neural networks.

If you are interested in a very detailed explanation of the gradient descent algorithm and the closely related backpropagation algorithm (the method used to compute the gradients in a neural network), I would highly recommend this video on YouTube:

There is also a very good video on YouTube by Emergent Garden that shows several examples of how a neural network learns when fed with data and what difficulties can occur during the learning process. It also compares neural networks to other approximation techniques such as Taylor series.