Deep Learning Track WiSe 24/25
Section 2 - The Data

Working with Data Tables

In this section we will have a look at how you can load tabular data, such as CSV files, and transform it into a 2D tensor.

To load data from a CSV file and build a 2D tensor in PyTorch, you can use the pandas library to read the CSV file and then convert the resulting DataFrame into a PyTorch tensor. Here's an example:

import pandas as pd
import torch

# Load the CSV file using pandas
data_frame = pd.read_csv('mydata.csv')

# Convert the DataFrame to a PyTorch tensor
tensor_data = torch.tensor(data_frame.values)

# Iterate over the rows in the tensor
for row in tensor_data:
    print(row)

In the above code, we first import the necessary libraries: pandas for loading the CSV file and torch for creating the tensor.

Next, we use pd.read_csv('mydata.csv') to read the CSV file and store the data in a DataFrame called data_frame.

Then, we convert the DataFrame to a PyTorch tensor using torch.tensor(data_frame.values). The data_frame.values attribute returns the underlying NumPy array of the DataFrame, which torch.tensor() can convert directly into a tensor.

Finally, we can iterate over the rows in the tensor using a simple for loop and perform any desired operations on each row.

Note that the resulting tensor will have the same data type as the data in the CSV file, so this direct conversion only works if all columns are numeric. If you need the tensor to have a particular data type, you can pass the dtype argument to torch.tensor().
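For example, as a minimal sketch (assuming, as above, that mydata.csv contains only numeric columns), you can convert the values to float32, the data type most PyTorch models expect:

import pandas as pd
import torch

# Load the CSV file (assumed to contain only numeric columns)
data_frame = pd.read_csv('mydata.csv')

# Convert the values to a float32 tensor
tensor_data = torch.tensor(data_frame.values, dtype=torch.float32)

print(tensor_data.dtype)   # torch.float32
print(tensor_data.shape)   # (number of rows, number of columns)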

PyTorch Dataset classes

Usually we wrap the code that loads data in a Dataset class, which a DataLoader then uses to serve batches. Here is how they work:

To build a dataset class in PyTorch that loads data from a CSV file (or any other file), you can create a custom dataset class by subclassing torch.utils.data.Dataset and overriding the __len__ and __getitem__ methods. Here's an example:

import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    """
    This class is responsible for loading data and transforming 
    it to a tensor
    """
    def __init__(self, csv_file):
        """
        This is the constructor method of our dataset.
        Here we load the data and initialize anything else we need.
        You can write any setup code your data requires here.
        """
        self.data_frame = pd.read_csv(csv_file)
        self.tensor_data = torch.tensor(self.data_frame.values)

    def __len__(self):
        """
        This function returns the number of items in the dataset
        """
        return len(self.data_frame)

    def __getitem__(self, index):
        """
        This function returns a single item identified by its index
        """
        row = self.tensor_data[index]
        return row

# Create an instance of your dataset
dataset = MyDataset('mydata.csv')

In this example, we define a custom dataset class MyDataset that extends torch.utils.data.Dataset. In the constructor __init__, we read the CSV file using pd.read_csv(csv_file) and convert the data to a PyTorch tensor.

The __len__ method returns the length of the dataset, which is the number of rows in the CSV file. The __getitem__ method retrieves a single row from the tensor based on the provided index.
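Because the dataset above implements these two methods, Python's built-in len() function and the indexing syntax work on it directly, for example:

# len() calls __len__, indexing calls __getitem__
print(len(dataset))   # number of rows in the CSV file
print(dataset[0])     # the first row as a tensor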

We then create an instance of MyDataset by passing the CSV file path 'mydata.csv'. Next, we create a data loader using DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4), specifying the dataset instance, batch size, shuffling, and the number of workers for parallel data loading.

# Create a data loader for your dataset
data_loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

Finally, we can iterate over the data loader in a for loop, and each iteration will provide a batch of rows from the CSV file.

# Iterate over the data loader
for batch in data_loader:
    print(batch)

You can customize the dataset class and the data loader parameters according to your specific requirements, such as adding additional transformations, labels, or other data fields. One possible variation is sketched below.
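As a rough sketch (assuming, purely for illustration, that the last column of mydata.csv holds the label), the dataset can split each row into a features tensor and a label, and the data loader will then collate them into (feature batch, label batch) pairs automatically:

import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class MyLabeledDataset(Dataset):
    """
    Variant of the dataset above that splits each row into
    features and a label. We assume the label is stored in
    the last column of the CSV file.
    """
    def __init__(self, csv_file):
        data_frame = pd.read_csv(csv_file)
        values = torch.tensor(data_frame.values, dtype=torch.float32)
        self.features = values[:, :-1]  # all columns except the last
        self.labels = values[:, -1]     # the last column

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        # Returning a tuple makes the data loader produce
        # (feature batch, label batch) pairs
        return self.features[index], self.labels[index]

# Create the dataset and a data loader as before
dataset = MyLabeledDataset('mydata.csv')
data_loader = DataLoader(dataset, batch_size=32, shuffle=True)

for features, labels in data_loader:
    print(features.shape, labels.shape)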
