Deep Learning Track WiSe 24/25

PyTorch Datasets and Data Loaders

In this section, you will learn how to load data using PyTorch's Dataset and DataLoader classes.

What are Data Loaders?

In PyTorch, data loaders are a utility that helps you load and preprocess data efficiently for training or inference. They are particularly useful when working with large datasets that cannot fit entirely into memory.

Data loaders are part of the torch.utils.data module in PyTorch. They provide an interface to iterate over a dataset and perform various operations such as shuffling, batching, and parallel data loading for improved performance.

How to build a Data Loader

To use data loaders, you typically follow these steps:

  1. Dataset Preparation: First, you need to create a dataset object that represents your data. PyTorch provides the torch.utils.data.Dataset class, which you can extend to define your custom dataset. This involves implementing the __len__ method to return the size of the dataset and the __getitem__ method to retrieve a sample from the dataset given an index.
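
A minimal sketch of such a custom dataset could look like the following; the class name MyDataset and the in-memory feature and label tensors are placeholders chosen for illustration:

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, features, labels):
        # features and labels must contain the same number of samples
        self.features = features
        self.labels = labels

    def __len__(self):
        # Report the total number of samples in the dataset
        return len(self.features)

    def __getitem__(self, index):
        # Return a single (input, label) pair for the given index
        return self.features[index], self.labels[index]

dataset = MyDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))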

  2. Data Transformation: If you need to apply any preprocessing or data transformation operations, such as normalization or data augmentation, you can use the torchvision.transforms module or create custom transformation functions.
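
As a sketch, a simple preprocessing pipeline for image data built with torchvision.transforms might look like this; the normalization values are placeholder examples and should be adapted to your data:

from torchvision import transforms

# Chain several preprocessing steps into a single callable
transform = transforms.Compose([
    transforms.ToTensor(),                        # convert a PIL image to a float tensor with values in [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # example values for a single-channel image
])

Such a transform is typically passed to the dataset and applied to each sample inside __getitem__.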

  3. Creating a Data Loader: Once you have a dataset, you can create a data loader using the torch.utils.data.DataLoader class. The data loader takes the dataset object and additional parameters such as batch size, shuffling, and parallel loading. For example:

from torch.utils.data import DataLoader

dataset = YourCustomDataset(...)
data_loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

Here, batch_size determines the number of samples per batch, shuffle=True shuffles the data at the beginning of each epoch, and num_workers specifies the number of subprocesses to use for data loading (which can speed up loading if you have multiple CPU cores).

  4. Iterating Over the Data Loader: Once the data loader is created, you can iterate over it in your training loop. Each iteration will provide a batch of data that you can use for training or inference. For example:

for batch in data_loader:
    inputs, labels = batch
    # Perform training/inference using the batch of data

In the above code, inputs and labels represent a batch of input data and corresponding labels, respectively.

Data loaders simplify the process of handling large datasets, batching, and parallel loading, allowing you to focus on developing your models and training routines. They provide an efficient way to load and preprocess data in PyTorch.
