# PyTorch for DL & ML

## Resources
0. https://www.learnpytorch.io/00_pytorch_fundamentals/
1. https://www.learnpytorch.io/01_pytorch_workflow/
2. https://www.learnpytorch.io/02_pytorch_classification/

## Fundamentals

### Neural Network
![](https://i.imgur.com/GeNYIhZ.jpg)

### Tensors
![](https://i.imgur.com/5GRr2sI.jpg)

### See what hardware we've got
```py=
!nvidia-smi
```

### Tensor operations
1. **Create tensors**
    * `torch.tensor` default (float) datatype: `torch.float32`
    ```python=
    torch.tensor([3.0, 6.0, 9.0])
    torch.rand(size=(3, 4))
    torch.arange(start, end, step)

    # create a tensor of zeros with the same shape as the input tensor
    torch.zeros_like(tensor)

    # Float32 tensor
    float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                                   dtype=None,           # what datatype is the tensor (e.g. float32 or float16)
                                   device=None,          # what device is your tensor on
                                   requires_grad=False)  # whether or not to track gradients with this tensor's operations
    float_32_tensor
    ```
2. **Reproducible tensors**
    ```python=
    torch.manual_seed(RANDOM_SEED)
    ```
3. **Manipulating tensors**
    * Addition
    * Subtraction
    * Multiplication (element-wise)
    * Division
    * Matrix multiplication - using `torch.matmul`
4. **Tensor aggregation**
    * min - `torch.min(tensor)` or `tensor.min()`
    * max - `torch.max(tensor)` or `tensor.max()`
    * find the position that has the minimum value - `argmin()`
    * find the position that has the maximum value - `argmax()`
5. **Reshaping, stacking, squeezing and unsqueezing tensors**
    * Reshape - reshapes an input tensor to a defined shape
    * View - returns a view of an input tensor in a certain shape but shares the same memory as the original tensor
    * Stack - combines multiple tensors on top of each other (vstack) or side by side (hstack)
    * **Squeeze** - removes all `1` dimensions from a tensor
    * **Unsqueeze** - adds a `1` dimension to a target tensor
    * Permute - returns a view of the input with dimensions permuted (swapped) in a certain way

### PyTorch tensors <-> NumPy
* NumPy -> PyTorch tensor - `torch.from_numpy(ndarray)`
```python=
# NumPy array to tensor
import torch
import numpy as np

array = np.arange(1., 8.)
tensor = torch.from_numpy(array)
array, tensor
```
```
(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))
```
* Warning: when converting from NumPy -> PyTorch, the tensor reflects NumPy's default datatype of float64 unless specified otherwise
* PyTorch tensor -> NumPy - `tensor.numpy()`
```python=
# Tensor to NumPy array
tensor = torch.ones(7)
numpy_tensor = tensor.numpy()
tensor, numpy_tensor
```
```
(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
```

### Running tensors and PyTorch objects on the GPU (and making faster computations)
```python=
device = "cuda" if torch.cuda.is_available() else "cpu"
```
* Putting a tensor on the GPU
```python=
# Create a tensor (default on the CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)
```
```
tensor([1, 2, 3]) cpu
```
```python=
# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu
```
```
tensor([1, 2, 3], device='cuda:0')
```

## 3 Big Errors
1. Tensors not the right datatype
2. Tensors not the right shape
3. Tensors not on the right device

**Solution**
1. Tensors not the right datatype - to get the datatype of a tensor, use `tensor.dtype`
2. Tensors not the right shape - to get the shape of a tensor, use `tensor.shape`
3. Tensors not on the right device - to get the device of a tensor, use `tensor.device`
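All three checks fit in a few lines (a quick sketch; `some_tensor` is just an illustrative name):
```python=
# Create an example tensor and inspect the three attributes behind the "3 big errors"
some_tensor = torch.rand(3, 4)

print(f"Datatype of tensor: {some_tensor.dtype}")    # e.g. torch.float32
print(f"Shape of tensor: {some_tensor.shape}")       # e.g. torch.Size([3, 4])
print(f"Device tensor is on: {some_tensor.device}")  # e.g. cpu
```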
## PyTorch Workflow
![](https://i.imgur.com/MlpQG8p.jpg)
1. data (prepare and load)
2. build model
3. fitting the model to data (training)
4. making predictions and evaluating a model (inference)
5. saving and loading a model
6. putting it all together

* Basic imports
```python=
import torch
from torch import nn  # nn contains all of PyTorch's building blocks for neural networks
import matplotlib.pyplot as plt

# Check PyTorch version
torch.__version__
```

### Data (prepare and load)
* Turn data into tensors and create train and test splits
```python=
# Use train_test_split to create splits
# Split data into train and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,    # 20% test, 80% train
                                                    random_state=42)  # make the random split reproducible

len(X_train), len(X_test), len(y_train), len(y_test)
```

### Build model
What our model does:
* Start with random values (weight & bias)
* Look at training data and adjust the random values to better represent (or get closer to) the ideal values (the weight & bias values we used to create the data)

How does it do so? Through two main algorithms:
1. Gradient descent
2. Backpropagation

---

Model building essentials
* `torch.nn` - contains all of the building blocks for computational graphs (a neural network can be considered a computational graph)
* `torch.nn.Parameter` - what parameters our model should try to learn, often a PyTorch layer from `torch.nn` will set these for us
* `torch.nn.Module` - the base class for all neural network modules, if you subclass it, you should overwrite `forward()`
* `torch.optim` - this is where the optimizers in PyTorch live, they will help with gradient descent
* `def forward()` - all `nn.Module` subclasses require you to overwrite `forward()`, this method defines what happens in the forward computation
* `torch.utils.data.Dataset` - represents a map between key (label) and sample (features) pairs of your data, such as images and their associated labels
* `torch.utils.data.DataLoader` - creates a Python iterable over a torch `Dataset` (allows you to iterate over your data)

---

**Basic steps**
1. Setting up device-agnostic code (so our model can run on CPU or GPU if it's available).
2. Constructing a model by subclassing `nn.Module`.
3. Defining a loss function and optimizer.
4. Creating a training loop (this'll be in the next section).

### Checking the contents of the model
```python=
model.parameters()
model.state_dict()
```

### Making predictions
Using `torch.inference_mode()`

To check our model's predictive power, let's see how well it predicts `y_test` based on `X_test`.

When we pass data through our model, it runs through the `forward()` method.
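A minimal prediction sketch (assuming a `model_0` instance built as described above and the `X_test` split from earlier):
```python=
# Make predictions with the model (no gradient tracking needed for inference)
with torch.inference_mode():
    y_preds = model_0(X_test)

y_preds[:5]  # peek at the first few predictions
```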
### Train model
Things we need to train:
* **Loss function:** a function to measure how wrong your model's predictions are compared to the ideal outputs, lower is better.
* **Optimizer:** takes into account the loss of a model and adjusts the model's parameters (e.g. weight & bias in our case) to improve the loss.
    * https://pytorch.org/docs/stable/optim.html#module-torch.optim
* Inside the optimizer you'll often have to set two parameters:
    * `params` - the model parameters you'd like to optimize
    * `lr` (learning rate) - a hyperparameter that defines how big/small the changes the optimizer makes to the parameters with each step (a small `lr` results in small changes, a large `lr` results in large changes)

```python=
# Setup a loss function
loss_fn = nn.L1Loss()

# Setup an optimizer (stochastic gradient descent)
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.01)  # lr = learning rate = possibly the most important hyperparameter you can set
```

And specifically for PyTorch, we need:
* A training loop
* A testing loop

---

Building a training loop
```python=
# 1. Loop through the data
for epoch in range(epochs):
    # 2. Put the model in training mode
    model.train()

    # 3. Forward pass
    y_preds = model(X_train)

    # 4. Calculate the loss
    train_loss = loss_fn(y_preds, y_train)

    # 5. Zero the optimizer's gradients (they accumulate by default)
    optimizer.zero_grad()

    # 6. Backpropagation
    train_loss.backward()

    # 7. Step the optimizer (gradient descent)
    optimizer.step()
```

---

Building a testing loop
```python=
### Testing
model.eval()  # turns off settings in the model not needed for evaluation/testing (e.g. dropout/batch norm layers)
with torch.inference_mode():  # turns off gradient tracking & a couple more things behind the scenes
    # 1. Do the forward pass
    test_pred = model_0(X_test)

    # 2. Calculate the loss
    test_loss = loss_fn(test_pred, y_test)

# Print out what's happening
print(f"Epoch: {epoch} | Loss: {train_loss} | Test loss: {test_loss}")

print(model_0.state_dict())
```

### Improving a model
![](https://i.imgur.com/skLR88f.png)

### Saving the model
1. `torch.save()` - allows you to save a PyTorch object in Python's pickle format
2. `torch.load()` - allows you to load a saved PyTorch object
3. `torch.nn.Module.load_state_dict()` - allows you to load a model's saved state dictionary
```python=
# Saving our PyTorch model
from pathlib import Path

# 1. Create models directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create model save path
MODEL_NAME = "01_pytorch_workflow_model_0_linear.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# 3. Save the model state dict
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_0.state_dict(),
           f=MODEL_SAVE_PATH)
```

### Load the model
Since we saved our model's `state_dict()` rather than the entire model, we'll create a new instance of our model class and load the saved `state_dict()` into it.
```python=
# To load in a saved state_dict we have to instantiate a new instance of our model class
loaded_model_0 = LinearRegressionModel()

# Load the saved state_dict of model_0 (this will update the new instance with the trained parameters)
loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))
```

## Classification

### Architecture of a classification neural network
![](https://i.imgur.com/qyDdlJx.png)

**Note** For computer vision, the input layer shape is the number of color channels (e.g. 3 for RGB).

### Loss function and optimizer
![](https://i.imgur.com/dbr17Wx.png)

**Note**
```python=
# Create a loss function
# loss_fn = nn.BCELoss()  # BCELoss = no sigmoid built-in
loss_fn = nn.BCEWithLogitsLoss()  # BCEWithLogitsLoss = sigmoid built-in

# Create an optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.1)
```
* The documentation for `torch.nn.BCEWithLogitsLoss()` states that it's more numerically stable than using `torch.nn.BCELoss()` after an `nn.Sigmoid` layer.
* Logits: the unnormalised (or not-yet-normalised) predictions (outputs) of a model. They can give results, but we don't normally stop at logits, because interpreting their raw values isn't easy.
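The classification snippets in the next section (and the computer-vision loops later) call an `accuracy_fn` helper that these notes never define. A minimal sketch of what such a helper could look like (an assumption, based only on how it's called and that it reports a percentage):
```python=
# Minimal accuracy helper (assumption: compares predicted labels to true labels
# and returns accuracy as a percentage)
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()  # count matching predictions
    acc = (correct / len(y_pred)) * 100
    return acc
```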
### Going from raw model outputs to predicted labels (logits -> prediction probabilities -> prediction labels)
1. logits -> prediction probabilities: through activation functions
    * The sigmoid activation function is typically only used for binary classification logits. For multi-class classification, we'll be looking at the softmax activation function.
    * The sigmoid activation function is not required when passing our model's raw outputs to `nn.BCEWithLogitsLoss` (the "logits" in logits loss is because it works on the model's raw logits output), because it has a sigmoid function built in.
```python=
# Part of a training loop
# Binary classification
# 1. Forward pass (model outputs raw logits)
y_logits = model_0(X_train).squeeze()  # squeeze to remove extra `1` dimensions, this won't work unless model and data are on the same device
y_pred = torch.round(torch.sigmoid(y_logits))  # turn logits -> pred probs -> pred labels

# 2. Calculate loss/accuracy
# loss = loss_fn(torch.sigmoid(y_logits),  # Using nn.BCELoss you need torch.sigmoid()
#                y_train)
loss = loss_fn(y_logits,  # Using nn.BCEWithLogitsLoss works with raw logits
               y_train)
acc = accuracy_fn(y_true=y_train,
                  y_pred=y_pred)
```
2. prediction probabilities -> prediction labels: can just round them in this case (binary classification)

**For multiclass classification**
```python=
# 1. Forward pass
y_logits = model_4(X_blob_train)  # model outputs raw logits
y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)  # go from logits -> prediction probabilities -> prediction labels
# print(y_logits)

# 2. Calculate loss and accuracy
loss = loss_fn(y_logits, y_blob_train)
acc = accuracy_fn(y_true=y_blob_train,
                  y_pred=y_pred)
```

### Classification evaluation metrics
![](https://i.imgur.com/19KwQkH.png)

## Computer Vision

### Computer vision libraries
![](https://i.imgur.com/LxjCcAJ.png)
```python=
# Some commonly used libraries
# Import PyTorch
import torch
from torch import nn

# Import torchvision
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor

# Import matplotlib for visualization
import matplotlib.pyplot as plt

# Check versions
# Note: your PyTorch version shouldn't be lower than 1.10.0 and torchvision version shouldn't be lower than 0.11
print(f"PyTorch version: {torch.__version__}\ntorchvision version: {torchvision.__version__}")
```

### Getting a Dataset
* Using FashionMNIST here
```python=
# Setup training data
train_data = datasets.FashionMNIST(
    root="data",  # where to download data to?
    train=True,  # get training data
    download=True,  # download data if it doesn't exist on disk
    transform=ToTensor(),  # images come as PIL format, we want to turn them into torch tensors
    target_transform=None  # you can transform the labels as well
)

# Setup testing data
test_data = datasets.FashionMNIST(
    root="data",
    train=False,  # get test data
    download=True,
    transform=ToTensor()
)
```
To download it, we provide the following parameters:
* `root`: str - which folder do you want to download the data to?
* `train`: bool - do you want the training or test split?
* `download`: bool - should the data be downloaded?
* `transform`: torchvision.transforms - what transformations would you like to do on the data?
* `target_transform` - you can transform the targets (labels) if you like too.

Many other datasets in `torchvision` have these parameter options.
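To get a feel for what was downloaded, it helps to inspect a sample (a quick sketch using the `train_data`/`test_data` objects created above):
```python=
# Inspect the first training sample
image, label = train_data[0]
class_names = train_data.classes  # human-readable class names

print(image.shape)         # torch.Size([1, 28, 28]) -> [color_channels, height, width]
print(class_names[label])  # the label as a class name
print(len(train_data), len(test_data))  # 60000 training samples, 10000 test samples
```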
### Input shape of an image
* NCHW & NHWC
    * N: number of images (batch size)
    * C: color channels
    * H, W: height, width

For example, if you have `batch_size=32`, your tensor shape may be `[32, 1, 28, 28]`.

PyTorch generally accepts NCHW (channels first) as the default for many operators. However, **PyTorch also explains that NHWC (channels last) performs better and is considered best practice**.

### Prepare DataLoader
A `DataLoader` turns a large `Dataset` into a Python iterable of smaller chunks. These smaller chunks are called **batches** or **mini-batches** and their size is set by the `batch_size` parameter. This is more computationally efficient than passing the whole dataset through the model at once.
```python=
from torch.utils.data import DataLoader

# Setup the batch size hyperparameter
BATCH_SIZE = 32

# Turn datasets into iterables (batches)
train_dataloader = DataLoader(train_data,  # dataset to turn into iterable
                              batch_size=BATCH_SIZE,  # how many samples per batch?
                              shuffle=True  # shuffle data every epoch?
)

test_dataloader = DataLoader(test_data,
                             batch_size=BATCH_SIZE,
                             shuffle=False  # don't necessarily have to shuffle the testing data
)

# Let's check out what we've created
print(f"Dataloaders: {train_dataloader, test_dataloader}")
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")
```
```
Dataloaders: (<torch.utils.data.dataloader.DataLoader object at 0x7ff03be2c700>, <torch.utils.data.dataloader.DataLoader object at 0x7ff03be2c2e0>)
Length of train dataloader: 1875 batches of 32
Length of test dataloader: 313 batches of 32
```
```python=
# Check out what's inside the training dataloader
train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape
```
```
(torch.Size([32, 1, 28, 28]), torch.Size([32]))
```

### Creating a training loop and training a model on batches of data
1. Loop through epochs.
2. Loop through training batches, perform training steps, calculate the train loss per batch.
3. Loop through testing batches, perform testing steps, calculate the test loss per batch.
4. Print out what's happening.
5. Time it all (for fun).
```python=
# Import tqdm for progress bar
from tqdm.auto import tqdm

# Timer for step 5 (print_train_time is a small helper, assumed defined elsewhere,
# that prints and returns the elapsed training time)
from timeit import default_timer as timer

# Set the seed and start the timer
torch.manual_seed(42)
train_time_start_on_cpu = timer()

# Set the number of epochs (we'll keep this small for faster training times)
epochs = 3

# Create training and testing loop
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")
    ### Training
    train_loss = 0
    # Add a loop to loop through training batches
    for batch, (X, y) in enumerate(train_dataloader):
        model_0.train()
        # 1. Forward pass
        y_pred = model_0(X)

        # 2. Calculate loss (per batch)
        loss = loss_fn(y_pred, y)
        train_loss += loss  # accumulatively add up the loss per epoch

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Print out how many samples have been seen
        if batch % 400 == 0:
            print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} samples")

    # Divide total train loss by length of train dataloader (average loss per batch per epoch)
    train_loss /= len(train_dataloader)

    ### Testing
    # Setup variables for accumulatively adding up loss and accuracy
    test_loss, test_acc = 0, 0
    model_0.eval()
    with torch.inference_mode():
        for X, y in test_dataloader:
            # 1. Forward pass
            test_pred = model_0(X)

            # 2. Calculate loss (accumulatively)
            test_loss += loss_fn(test_pred, y)  # accumulatively add up the loss per epoch
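            # (Assumption: loss_fn here is nn.CrossEntropyLoss, which works on the raw
            #  logits in test_pred, so no softmax is needed before calculating the loss.)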
            # 3. Calculate accuracy (preds need to be prediction labels, same form as y_true)
            test_acc += accuracy_fn(y_true=y, y_pred=test_pred.argmax(dim=1))

        # Calculations on test metrics need to happen inside torch.inference_mode()
        # Divide total test loss by length of test dataloader (per batch)
        test_loss /= len(test_dataloader)

        # Divide total accuracy by length of test dataloader (per batch)
        test_acc /= len(test_dataloader)

    ## Print out what's happening
    print(f"\nTrain loss: {train_loss:.5f} | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%\n")

# Calculate training time
train_time_end_on_cpu = timer()
total_train_time_model_0 = print_train_time(start=train_time_start_on_cpu,
                                            end=train_time_end_on_cpu,
                                            device=str(next(model_0.parameters()).device))
```

### Make predictions
```python=
torch.manual_seed(42)
def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn):
    """Returns a dictionary containing the results of model predicting on data_loader.

    Args:
        model (torch.nn.Module): A PyTorch model capable of making predictions on data_loader.
        data_loader (torch.utils.data.DataLoader): The target dataset to predict on.
        loss_fn (torch.nn.Module): The loss function of model.
        accuracy_fn: An accuracy function to compare the model's predictions to the truth labels.

    Returns:
        (dict): Results of model making predictions on data_loader.
    """
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in data_loader:
            # Make predictions with the model
            y_pred = model(X)

            # Accumulate the loss and accuracy values per batch
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y,
                               y_pred=y_pred.argmax(dim=1))  # For accuracy, need the prediction labels (logits -> pred_prob -> pred_labels)

        # Scale loss and acc to find the average loss/acc per batch
        loss /= len(data_loader)
        acc /= len(data_loader)

    return {"model_name": model.__class__.__name__,  # only works when model was created with a class
            "model_loss": loss.item(),
            "model_acc": acc}

# Calculate model 0 results on test dataset
model_0_results = eval_model(model=model_0,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn)
model_0_results
```
```
{'model_name': 'FashionMNISTModelV0',
 'model_loss': 0.47663894295692444,
 'model_acc': 83.42651757188499}
```

### Functionizing training and test loops
```python=
def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    train_loss, train_acc = 0, 0
    model.to(device)
    for batch, (X, y) in enumerate(data_loader):
        # Send data to GPU
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)

        # 2. Calculate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss
        train_acc += accuracy_fn(y_true=y,
                                 y_pred=y_pred.argmax(dim=1))  # Go from logits -> pred labels

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()
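        # (loss.backward() computes the gradients via backpropagation; optimizer.step()
        #  then uses those gradients to update the parameters - the gradient descent step.)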
        # 5. Optimizer step
        optimizer.step()

    # Calculate loss and accuracy per epoch and print out what's happening
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")

def test_step(data_loader: torch.utils.data.DataLoader,
              model: torch.nn.Module,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
    test_loss, test_acc = 0, 0
    model.to(device)
    model.eval()  # put model in eval mode
    # Turn on inference context manager
    with torch.inference_mode():
        for X, y in data_loader:
            # Send data to GPU
            X, y = X.to(device), y.to(device)

            # 1. Forward pass
            test_pred = model(X)

            # 2. Calculate loss and accuracy
            test_loss += loss_fn(test_pred, y)
            test_acc += accuracy_fn(y_true=y,
                                    y_pred=test_pred.argmax(dim=1)  # Go from logits -> pred labels
            )

        # Adjust metrics and print out
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
        print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%\n")
```

### Making a confusion matrix for further prediction evaluation
```python=
# See if torchmetrics exists, if not, install it
try:
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
    assert int(mlxtend.__version__.split(".")[1]) >= 19, "mlxtend version should be 0.19.0 or higher"
except:
    !pip install -q torchmetrics -U mlxtend  # <- Note: If you're using Google Colab, this may require restarting the runtime
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
```
```python=
# Import the upgraded version of mlxtend
import mlxtend
print(mlxtend.__version__)
assert int(mlxtend.__version__.split(".")[1]) >= 19  # should be version 0.19.0 or higher
```
```python=
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix

# 2. Setup confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names), task='multiclass')
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=test_data.targets)

# 3. Plot the confusion matrix
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(),  # matplotlib likes working with NumPy
    class_names=class_names,  # turn the row and column labels into class names
    figsize=(10, 7)
);
```
![](https://i.imgur.com/48vGbit.png)
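The confusion-matrix code above starts at step 2: it relies on a `y_pred_tensor` of prediction labels that is never built in these notes. A minimal sketch of that missing step 1 (an assumption, reusing `model_0` and `test_dataloader` from earlier; swap in whichever trained model you want to evaluate):
```python=
# 1. Make predictions across the test DataLoader and collect them into y_pred_tensor
y_preds = []
model_0.eval()
with torch.inference_mode():
    for X, y in tqdm(test_dataloader, desc="Making predictions"):
        # Forward pass (raw logits) -> prediction probabilities -> prediction labels
        y_logit = model_0(X)
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)
        y_preds.append(y_pred.cpu())  # keep results on the CPU for plotting

# Concatenate the list of per-batch predictions into one tensor
y_pred_tensor = torch.cat(y_preds)
```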