# **PyTorch Masterclass: Part 2 – Deep Learning for Computer Vision with PyTorch**
**Duration: ~60 minutes**
**Hashtags:** #PyTorch #ComputerVision #CNN #DeepLearning #TransferLearning #CIFAR10 #ImageClassification #DataLoaders #Transforms #ResNet #EfficientNet #PyTorchVision #AI #MachineLearning #ConvolutionalNeuralNetworks #DataAugmentation #PretrainedModels
---
## **Table of Contents**
1. [Recap of Part 1: PyTorch Foundations](#recap-of-part-1-pytorch-foundations)
2. [Dataset and DataLoader: Efficient Data Handling](#dataset-and-dataloader-efficient-data-handling)
3. [Transforms: Image Preprocessing and Augmentation](#transforms-image-preprocessing-and-augmentation)
4. [Convolutional Neural Networks (CNNs): Theory and Architecture](#convolutional-neural-networks-cnns-theory-and-architecture)
5. [Building Your First CNN for Image Classification](#building-your-first-cnn-for-image-classification)
6. [Training a CNN on CIFAR-10 Dataset](#training-a-cnn-on-cifar-10-dataset)
7. [Transfer Learning: Leveraging Pretrained Models](#transfer-learning-leveraging-pretrained-models)
8. [Advanced Debugging and Profiling Techniques](#advanced-debugging-and-profiling-techniques)
9. [Quiz 2: Test Your Understanding of Computer Vision with PyTorch](#quiz-2-test-your-understanding-of-computer-vision-with-pytorch)
10. [Summary and What's Next in Part 3](#summary-and-whats-next-in-part-3)
---
## **Recap of Part 1: PyTorch Foundations**
Welcome back to **Part 2** of our PyTorch Masterclass! In **Part 1**, we built a solid foundation in PyTorch by covering:
- The fundamentals of deep learning and why PyTorch is the framework of choice for researchers
- Installation and setup of PyTorch with CUDA support
- Tensors as the core data structure in PyTorch
- Tensor operations, indexing, and GPU acceleration
- The autograd system for automatic differentiation
- Building and training your first neural network
- Loss functions, optimizers, and the training loop
- Debugging with TensorBoard
We built a simple neural network for regression and understood the basic workflow of deep learning in PyTorch. Now, it's time to dive into **computer vision**, one of the most successful applications of deep learning.
In this part, you'll learn how to:
- Efficiently load and preprocess image data
- Build convolutional neural networks (CNNs)
- Train image classifiers on real datasets
- Use transfer learning to boost performance
- Apply advanced debugging techniques
Let's get started!
---
## **Dataset and DataLoader: Efficient Data Handling**
One of the biggest challenges in deep learning is **efficient data handling**. Loading large datasets directly into memory can cause crashes, and processing data sequentially is slow. PyTorch solves this with two key components: **Dataset** and **DataLoader**.
### **Why Dataset and DataLoader?**
Without a data-loading abstraction, handling training data by hand tends to get messy:
- Manual batching
- Inefficient memory usage
- No built-in shuffling
- No parallel loading
PyTorch's `Dataset` and `DataLoader` provide:
- **Memory efficiency**: Load data on-demand
- **Parallelism**: Multi-process data loading
- **Batching**: Automatic batch creation
- **Shuffling**: Randomize data order
- **Collation**: Custom data assembly
### **The Dataset Class**
`torch.utils.data.Dataset` is an **abstract class** representing a dataset. To create your own dataset, inherit from `Dataset` and implement:
- `__len__`: Returns dataset size
- `__getitem__`: Returns a sample by index
```python
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]
```
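As a quick check of the two-method contract, the sketch below wraps a few random tensors in a dataset and indexes it. The class name `ToyDataset`, the tensor shapes, and the labels are illustrative assumptions, not part of the original example.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Same contract as above: a length plus index-based access."""

    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]


# Ten fake RGB images of size 32x32 with alternating 0/1 labels (illustrative values)
images = torch.randn(10, 3, 32, 32)
labels = torch.arange(10) % 2

toy = ToyDataset(images, labels)
sample, label = toy[3]  # calls __getitem__(3)
print(len(toy), tuple(sample.shape), label.item())
```

Because the dataset only promises `__len__` and `__getitem__`, the same class works whether the samples live in memory, as here, or are read from disk on demand.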
### **Built-in Datasets**
PyTorch provides many built-in datasets through **TorchVision**:
```python
from torchvision import datasets
# MNIST (handwritten digits)
mnist_train = datasets.MNIST(
root='./data',
train=True,
download=True,
transform=None
)
# CIFAR-10 (32x32 color images)
cifar_train = datasets.CIFAR10(
root='./data',
train=True,
download=True,
transform=None
)
# ImageNet (requires manual download)
imagenet_train = datasets.ImageNet(
root='./data/imagenet',
split='train',
transform=None
)
```
### **The DataLoader Class**
`DataLoader` wraps a `Dataset` and provides:
- Automatic batching
- Multi-process data loading
- Shuffling
- Custom collation
```python
from torch.utils.data import DataLoader
# Create DataLoader
train_loader = DataLoader(
dataset=cifar_train,
batch_size=64,
shuffle=True,
num_workers=4, # Parallel processes
pin_memory=True # Faster transfer to GPU
)
```
### **Key DataLoader Parameters**
| Parameter | Description |
|----------|-------------|
| `batch_size` | Number of samples per batch |
| `shuffle` | Randomize data order each epoch |
| `num_workers` | Number of subprocesses for data loading |
| `pin_memory` | Copies tensors to pinned memory for faster GPU transfer |
| `drop_last` | Drop last incomplete batch |
| `sampler` | Custom sampling strategy |
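
To make `batch_size` and `drop_last` concrete, here is a small sketch on synthetic data. The dataset size of 1,000 and the feature dimension are arbitrary assumptions chosen so that the last batch is incomplete.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 1,000 fake samples with 8 features each (illustrative data)
dataset = TensorDataset(torch.randn(1000, 8), torch.randint(0, 10, (1000,)))

keep_loader = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=False)
drop_loader = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=True)

# 1000 / 64 = 15.625: fifteen full batches plus one partial batch of 40 samples
batch_sizes = [x.shape[0] for x, _ in keep_loader]

print(len(keep_loader))   # ceil(1000 / 64)
print(len(drop_loader))   # floor(1000 / 64)
print(batch_sizes[-1])    # size of the final, incomplete batch
```

Dropping the last batch can help when a layer such as batch normalization behaves poorly on very small batches.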
### **Iterating Through DataLoader**
```python
# Training loop
for epoch in range(10):
    for batch_idx, (data, target) in enumerate(train_loader):
        # data: [batch_size, channels, height, width]
        # target: [batch_size]
        # Move to GPU if available
        data, target = data.to(device), target.to(device)
        # Forward pass, backward pass, etc.
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
```
### **Custom Collate Function**
Sometimes you need custom batch assembly:
```python
import torch
from torch.utils.data import DataLoader

def custom_collate(batch):
    """Custom collate function for variable-length sequences"""
    data = [item[0] for item in batch]
    targets = [item[1] for item in batch]
    # Pad sequences to same length
    data = torch.nn.utils.rnn.pad_sequence(data, batch_first=True)
    targets = torch.tensor(targets)
    return data, targets

# `dataset` is any Dataset whose samples are (sequence_tensor, label) pairs
loader = DataLoader(dataset, batch_size=32, collate_fn=custom_collate)
```
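The following is a self-contained sketch of the same padding idea on a toy dataset. `VariableLengthDataset` and `pad_collate` are hypothetical names introduced here for illustration; sample `i` is simply a vector of ones with length `i + 1`.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.nn.utils.rnn import pad_sequence


class VariableLengthDataset(Dataset):
    """Hypothetical dataset: sample i is a 1-D tensor of length i + 1."""

    def __len__(self):
        return 6

    def __getitem__(self, idx):
        return torch.ones(idx + 1), idx % 2


def pad_collate(batch):
    sequences = [item[0] for item in batch]
    targets = torch.tensor([item[1] for item in batch])
    # Zero-pad every sequence to the length of the longest one in this batch
    padded = pad_sequence(sequences, batch_first=True)
    return padded, targets


loader = DataLoader(VariableLengthDataset(), batch_size=3, collate_fn=pad_collate)
shapes = [tuple(data.shape) for data, _ in loader]
print(shapes)
```

Each batch is padded only to its own longest sequence, so the two batches end up with different widths.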
### **Practical Example: CIFAR-10 Data Loading**
Let's load CIFAR-10 with proper configuration:
```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# Define transformations (will cover in next section)
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# Load training set
train_dataset = datasets.CIFAR10(
root='./data',
train=True,
download=True,
transform=transform
)
# Load test set
test_dataset = datasets.CIFAR10(
root='./data',
train=False,
download=True,
transform=transform
)
# Create data loaders
train_loader = DataLoader(
train_dataset,
batch_size=128,
shuffle=True,
num_workers=4,
pin_memory=True
)
test_loader = DataLoader(
test_dataset,
batch_size=100,
shuffle=False,
num_workers=4,
pin_memory=True
)
# Verify
print(f"Training batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")
print(f"Batch shape: {next(iter(train_loader))[0].shape}")
# Should output: torch.Size([128, 3, 32, 32])
```
> 💡 **Pro Tip**: For large datasets, a common starting point is to set `num_workers` close to the number of available CPU cores and tune from there. Don't set it too high, because each worker is a separate process that consumes memory.
---
## **Transforms: Image Preprocessing and Augmentation**
Raw image data isn't ready for deep learning models. We need to **preprocess** it and often **augment** it to improve model robustness.
### **Why Transforms?**
Images come in various formats, sizes, and color spaces. To feed them into a neural network, we need to:
- Convert to tensors
- Normalize pixel values
- Resize to consistent dimensions
- Augment to increase dataset size
PyTorch's `torchvision.transforms` provides these capabilities.
### **Basic Image Transforms**
Let's start with essential transforms:
```python
from torchvision import transforms
# Convert PIL Image or numpy array to tensor
transform = transforms.ToTensor()
# Example usage
from PIL import Image
img = Image.open('example.jpg')
tensor_img = transform(img) # Shape: [C, H, W]
```
### **Normalization**
Deep learning models generally train better on normalized inputs. For images, we typically standardize each channel so that its values have roughly zero mean and unit standard deviation.
```python
# Normalize: (x - mean) / std
transform = transforms.Normalize(
mean=[0.485, 0.456, 0.406], # ImageNet stats
std=[0.229, 0.224, 0.225]
)
```
> 💡 **Note**: For CIFAR-10, a common simple choice is the following, which maps pixel values from [0, 1] to [-1, 1]:
> ```python
> transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
> ```
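
The arithmetic behind `Normalize` is easy to reproduce by hand. The sketch below applies `(x - mean) / std` per channel to a random tensor standing in for an image (an assumption made to keep the example free of file I/O) and then inverts the transform.

```python
import torch

# A fake 3-channel image with pixel values already scaled to [0, 1]
img = torch.rand(3, 32, 32)

# Per-channel statistics reshaped to [C, 1, 1] so they broadcast over H and W
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# Same arithmetic as transforms.Normalize: (x - mean) / std
normalized = (img - mean) / std
print(normalized.min().item(), normalized.max().item())  # both within [-1, 1]

# Inverting the transform recovers the original pixels (handy before plotting)
restored = normalized * std + mean
```

With a mean and standard deviation of 0.5, the result lies in [-1, 1] rather than having exactly zero mean and unit variance; dataset-specific statistics get closer to that.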
### **Composing Multiple Transforms**
Use `transforms.Compose` to chain multiple operations:
```python
transform = transforms.Compose([
transforms.Resize((224, 224)), # Resize to 224x224
transforms.CenterCrop(200), # Crop to 200x200
transforms.ToTensor(), # Convert to tensor
transforms.Normalize( # Normalize
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
```
### **Data Augmentation**
Data augmentation artificially increases dataset size by applying random transformations. This prevents overfitting and improves generalization.
#### **Common Augmentation Techniques**
```python
train_transform = transforms.Compose([
transforms.RandomResizedCrop(224), # Random crop and resize
transforms.RandomHorizontalFlip(), # 50% chance of flip
transforms.ColorJitter( # Random color changes
brightness=0.2,
contrast=0.2,
saturation=0.2,
hue=0.1
),
transforms.RandomRotation(10), # Random rotation up to 10°
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
# No augmentation for validation/test
val_transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```
### **Advanced Augmentation with Albumentations**
For more complex augmentations, use the **Albumentations** library with PyTorch:
```python
import cv2
import albumentations as A
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset

# Define augmentation pipeline
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2()
])

# Custom dataset using albumentations
class AlbumentationsDataset(Dataset):
    def __init__(self, file_paths, labels, transform=None):
        self.file_paths = file_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # OpenCV loads images as BGR; convert to RGB before augmenting
        image = cv2.imread(self.file_paths[idx])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if self.transform:
            augmented = self.transform(image=image)
            image = augmented['image']
        return image, self.labels[idx]
```
### **Visualizing Augmentations**
Let's visualize how augmentations transform an image:
```python
import matplotlib.pyplot as plt
from torchvision.utils import make_grid
# Get a batch of augmented images
data, _ = next(iter(train_loader))
# Create grid of images; normalize=True rescales values to [0, 1] for display
img_grid = make_grid(data, nrow=8, normalize=True)
img_grid = img_grid.permute(1, 2, 0)  # CHW to HWC
# Display
plt.figure(figsize=(12, 6))
plt.imshow(img_grid)
plt.title("Augmented CIFAR-10 Images")
plt.axis('off')
plt.show()
```
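This shows a grid of training images. The augmentations are visible only if `train_loader` was built with an augmenting transform (for example, random crops and flips); with a plain `ToTensor` + `Normalize` pipeline you will see the original images.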
### **Custom Transforms**
Sometimes you need a transform that doesn't exist. Create your own!
```python
import torch
from torchvision import transforms

class RandomCutout:
    """Randomly zero out square patches of a [C, H, W] tensor image"""
    def __init__(self, size=8, n_holes=1):
        self.size = size
        self.n_holes = n_holes

    def __call__(self, img):
        # Expects a tensor, so place this transform after ToTensor()
        h, w = img.shape[1], img.shape[2]
        for _ in range(self.n_holes):
            # Pick a random patch center, then clamp the patch to the image
            y = torch.randint(0, h, (1,)).item()
            x = torch.randint(0, w, (1,)).item()
            y1 = max(0, y - self.size // 2)
            y2 = min(h, y + self.size // 2)
            x1 = max(0, x - self.size // 2)
            x2 = min(w, x + self.size // 2)
            img[:, y1:y2, x1:x2] = 0
        return img

# Usage
transform = transforms.Compose([
    transforms.ToTensor(),
    RandomCutout(size=8, n_holes=2),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
```
### **Best Practices for Transforms**
1. **Train vs. Validation**: Apply augmentation only to training data
2. **Normalization**: Use dataset-specific mean/std values
3. **Order Matters**: Resize before cropping, normalize last
4. **GPU vs. CPU**: Do heavy augmentation on CPU, simple transforms on GPU
5. **Consistency**: For segmentation, use transforms that affect image and mask together
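> 💡 **Pro Tip**: Instead of hand-tuning an augmentation pipeline, consider `torchvision.transforms.AutoAugment`, which applies policies learned on datasets such as ImageNet and CIFAR-10, or `RandAugment`, which applies randomly chosen operations at a configurable magnitude.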
---
## **Convolutional Neural Networks (CNNs): Theory and Architecture**
Now that we know how to handle image data, let's dive into **Convolutional Neural Networks (CNNs)**, the workhorse of computer vision.
### **Why CNNs for Images?**
Traditional neural networks (like the one we built in Part 1) have limitations for images:
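- **Parameter explosion**: A 1000x1000 grayscale image has 1M input values, so a single fully connected layer with 1,000 hidden units would already need about 1 billion weights
- **Spatial information loss**: Fully connected layers treat the image as a flat vector and ignore which pixels are neighbors
- **No translation invariance**: The same object at a different location has to be learned separately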
CNNs solve these problems with:
- **Local connectivity**: Neurons connect to local regions
- **Weight sharing**: Same filter applied across image
- **Hierarchical feature learning**: Simple to complex features
### **Core Components of CNNs**
#### 1. **Convolutional Layer**
The heart of a CNN. Applies filters (kernels) to detect features.
- **Input**: Image tensor [C, H, W]
- **Filter**: Small kernel (e.g., 3x3) with weights
- **Operation**: Slide filter across image, compute dot product
- **Output**: Feature map highlighting where filter matches
Mathematically:
```
(feature_map)_i,j = sum_{m,n} input_{i+m,j+n} * filter_{m,n}
```
In PyTorch:
```python
conv = nn.Conv2d(
in_channels=3, # Input channels (RGB=3)
out_channels=16, # Number of filters
kernel_size=3, # 3x3 filter
stride=1, # Step size
padding=1 # Add padding to keep size
)
output = conv(input_tensor)
```
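The spatial size of a convolution (or pooling) output follows the standard formula `floor((size + 2 * padding - kernel_size) / stride) + 1`, assuming a dilation of 1. The helper below is a minimal sketch of that formula; `conv_output_size` is a name introduced here, not a PyTorch function.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer, assuming dilation = 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# 3x3 conv with padding=1 keeps a 32-pixel side at 32
same = conv_output_size(32, kernel_size=3, stride=1, padding=1)
# Without padding, the same conv shrinks each side by 2
valid = conv_output_size(32, kernel_size=3, stride=1, padding=0)
# A 2x2 pool with stride 2 halves the side
pooled = conv_output_size(32, kernel_size=2, stride=2)

print(same, valid, pooled)
```

This is why `padding=1` is the usual companion of a 3x3 kernel: the feature map keeps its height and width, and only pooling changes the resolution.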
#### 2. **Activation Function**
Adds non-linearity. **ReLU** is most common:
```python
relu = nn.ReLU()
output = relu(conv_output)
```
Why ReLU?
- Simple: max(0, x)
- Mitigates vanishing gradients (the gradient is 1 for positive inputs)
- Computationally efficient
#### 3. **Pooling Layer**
Reduces spatial dimensions while keeping important features.
**Max Pooling** (most common):
- Takes maximum value in each window
- Preserves strongest activations
- Provides translation invariance
```python
pool = nn.MaxPool2d(kernel_size=2, stride=2)
output = pool(relu_output)
```
**Average Pooling**:
- Takes average value in each window
- Smoother feature maps
- Less common than max pooling
#### 4. **Batch Normalization**
Stabilizes and accelerates training:
```python
bn = nn.BatchNorm2d(num_features=16)
output = bn(conv_output)
```
Benefits:
- Reduces internal covariate shift
- Allows higher learning rates
- Acts as regularizer
#### 5. **Dropout**
Regularization technique to prevent overfitting:
```python
dropout = nn.Dropout(p=0.5)
output = dropout(feature_map)
```
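Dropout behaves differently in training and evaluation mode, which is why `model.train()` and `model.eval()` matter. The sketch below uses an input of all ones (an arbitrary choice) so that the effect is easy to read off.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p, and the
# survivors are scaled by 1 / (1 - p) = 2 to keep the expected value unchanged
dropout.train()
train_out = dropout(x)
kept = train_out[train_out != 0]

# Evaluation mode: dropout is the identity function
dropout.eval()
eval_out = dropout(x)

print(train_out.unique())          # only zeros and twos
print(torch.equal(eval_out, x))    # True
```

Forgetting to call `model.eval()` before validation leaves dropout active and typically makes validation metrics look worse and noisier than they are.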
### **CNN Architecture Patterns**
Most CNNs follow this pattern:
```
INPUT -> [[CONV -> RELU] * N -> POOL] * M -> [FC -> RELU] * K -> FC
```
Where:
- N = number of conv layers before pooling (usually 1-3)
- M = number of convolutional blocks (usually 3-5)
- K = number of fully connected layers (usually 0-2)
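The pattern can be written down almost literally with `nn.Sequential`. The builder below is a sketch under several assumptions: `make_cnn` and its default widths are hypothetical, every convolution is 3x3 with padding 1, and the filter count doubles after each pooling step.

```python
import torch
import torch.nn as nn


def make_cnn(n=2, m=3, k=1, in_channels=3, base_width=32,
             image_size=32, hidden=256, num_classes=10):
    """Build [[CONV -> RELU] * n -> POOL] * m -> [FC -> RELU] * k -> FC."""
    layers = []
    channels, width, size = in_channels, base_width, image_size
    for _ in range(m):
        for _ in range(n):
            layers += [nn.Conv2d(channels, width, kernel_size=3, padding=1), nn.ReLU()]
            channels = width
        layers.append(nn.MaxPool2d(2))
        width *= 2   # double the number of filters after each pooling step
        size //= 2   # each 2x2 pool halves height and width
    layers.append(nn.Flatten())
    features = channels * size * size
    for _ in range(k):
        layers += [nn.Linear(features, hidden), nn.ReLU()]
        features = hidden
    layers.append(nn.Linear(features, num_classes))
    return nn.Sequential(*layers)


model = make_cnn(n=2, m=3, k=1)
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)
```

With `n=2, m=3, k=1` on 32x32 inputs, the feature maps shrink 32 → 16 → 8 → 4 while the channels grow 32 → 64 → 128 before the classifier.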
### **Classic CNN Architectures**
#### LeNet-5 (1998)
One of the first successful CNNs (for handwritten digits):
```
INPUT(32x32) -> C1(6@28x28) -> S2 -> C3(16@10x10) -> S4 -> C5 -> F6 -> OUTPUT
```
- C = Convolutional layer
- S = Subsampling (pooling)
- Numbers indicate feature map count
#### AlexNet (2012)
Revolutionized computer vision (won ImageNet challenge):
```
INPUT(227x227) -> CONV(96,11x11,s=4) -> RELU -> POOL ->
CONV(256,5x5) -> RELU -> POOL ->
CONV(384,3x3) -> RELU ->
CONV(384,3x3) -> RELU ->
CONV(256,3x3) -> RELU -> POOL ->
FC(4096) -> RELU -> DROPOUT ->
FC(4096) -> RELU -> DROPOUT ->
FC(1000) -> SOFTMAX
```
Key innovations:
- ReLU activation
- Overlapping pooling
- Local response normalization
- Data augmentation
- Dropout regularization
- GPU implementation
#### VGGNet (2014)
Simplified AlexNet with consistent 3x3 convolutions:
```
INPUT -> [CONV(64) -> CONV(64) -> POOL] ->
[CONV(128) -> CONV(128) -> POOL] ->
[CONV(256) -> CONV(256) -> CONV(256) -> POOL] ->
[CONV(512) -> CONV(512) -> CONV(512) -> POOL] ->
[CONV(512) -> CONV(512) -> CONV(512) -> POOL] ->
FC(4096) -> FC(4096) -> FC(1000) -> SOFTMAX
```
Two main variants:
- VGG-16 (16 weight layers)
- VGG-19 (19 weight layers)
#### ResNet (2015)
Introduced **residual connections** to make very deep networks trainable by easing the degradation and vanishing-gradient problems:
```
INPUT -> CONV -> BN -> RELU ->
[CONV -> BN -> RELU -> CONV -> BN] + INPUT -> RELU ->
[CONV -> BN -> RELU -> CONV -> BN] + INPUT -> RELU -> ...
```
Key innovation: **Skip connections** that allow gradients to flow directly through the network.
ResNet-50 (50 layers) became the new standard.
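A residual block can be sketched in a few lines. The version below is a simplified illustration, not torchvision's implementation: `BasicResidualBlock` is a name introduced here, and it assumes the input and output have the same number of channels and the same spatial size, so the input can be added back without a projection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicResidualBlock(nn.Module):
    """[CONV -> BN -> RELU -> CONV -> BN] + input -> RELU"""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                       # the skip connection carries x unchanged
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + identity)      # add the block input, then activate


block = BasicResidualBlock(16)
x = torch.randn(2, 16, 8, 8)
y = block(x)
print(y.shape)
```

Because the addition passes gradients straight through to `x`, stacking many such blocks does not force every gradient through every convolution.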
#### EfficientNet (2019)
Balanced network depth, width, and resolution:
```
INPUT -> STAGE1 -> STAGE2 -> ... -> STAGE8 -> POOL -> FC
```
Used **compound scaling** to uniformly scale all dimensions.
### **Visualizing CNN Features**
Let's see what CNNs learn:
- **Early layers**: Detect edges, colors, textures
- **Middle layers**: Detect shapes, patterns
- **Late layers**: Detect complex objects, parts

*Visualization of features learned at different CNN layers (Source: CS231n)*
### **How Convolution Works: Step-by-Step**
Let's walk through a simple 2D convolution:
**Input** (3x3 image):
```
1 2 3
4 5 6
7 8 9
```
**Filter** (2x2):
```
1 0
0 -1
```
**Stride = 1, No Padding**
Step 1: Position filter at top-left
```
[1 2] 3
[4 5] 6
7 8 9
```
Dot product: (1*1) + (2*0) + (4*0) + (5*-1) = 1 + 0 + 0 - 5 = -4
Step 2: Slide right
```
1 [2 3]
4 [5 6]
7 8 9
```
Dot product: (2*1) + (3*0) + (5*0) + (6*-1) = 2 + 0 + 0 - 6 = -4
Step 3: Move down, reset left
```
1 2 3
[4 5] 6
[7 8] 9
```
Dot product: (4*1) + (5*0) + (7*0) + (8*-1) = 4 + 0 + 0 - 8 = -4
Step 4: Slide right
```
1 2 3
4 [5 6]
7 [8 9]
```
Dot product: (5*1) + (6*0) + (8*0) + (9*-1) = 5 + 0 + 0 - 9 = -4
**Output Feature Map** (2x2):
```
-4 -4
-4 -4
```
This simple example shows how a filter can detect specific patterns (in this case, a diagonal edge detector).
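The walk-through above can be checked with `torch.nn.functional.conv2d`, which slides the kernel without flipping it (strictly speaking a cross-correlation), exactly as in the steps shown.

```python
import torch
import torch.nn.functional as F

image = torch.tensor([[1., 2., 3.],
                      [4., 5., 6.],
                      [7., 8., 9.]])
kernel = torch.tensor([[1., 0.],
                       [0., -1.]])

# conv2d expects inputs shaped [batch, channels, H, W]
# and weights shaped [out_channels, in_channels, kH, kW]
feature_map = F.conv2d(image.view(1, 1, 3, 3), kernel.view(1, 1, 2, 2))
print(feature_map.squeeze())
```

The output is constant because the input increases by the same amount along every diagonal; an image with an actual diagonal edge would produce a varying response.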
### **Practical Considerations for CNN Design**
1. **Filter Size**: 3x3 is standard (good balance of coverage and parameters)
2. **Number of Filters**: Typically doubles after each pooling (64→128→256→512)
3. **Padding**: Use "same" padding (padding=1 for 3x3) to maintain spatial dimensions
4. **Stride**: Usually 1 for conv, 2 for pooling
5. **Depth**: Deeper networks learn more complex features but harder to train
6. **Normalization**: Batch norm after conv layers is a strong default, especially in deeper networks
7. **Activation**: ReLU is standard; leaky ReLU for GANs
---
## **Building Your First CNN for Image Classification**
Now that we understand CNN theory, let's build one in PyTorch!
### **Step 1: Define the CNN Architecture**
We'll create a simple CNN for CIFAR-10:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        # First convolutional block
        self.conv1 = nn.Conv2d(
            in_channels=3,
            out_channels=32,
            kernel_size=3,
            padding=1
        )
        self.bn1 = nn.BatchNorm2d(32)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Second convolutional block
        self.conv2 = nn.Conv2d(
            in_channels=32,
            out_channels=64,
            kernel_size=3,
            padding=1
        )
        self.bn2 = nn.BatchNorm2d(64)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Third convolutional block
        self.conv3 = nn.Conv2d(
            in_channels=64,
            out_channels=128,
            kernel_size=3,
            padding=1
        )
        self.bn3 = nn.BatchNorm2d(128)
        # Fully connected layers
        self.fc1 = nn.Linear(128 * 8 * 8, 512)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        # Block 1
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.pool1(x)
        # Block 2
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.pool2(x)
        # Block 3
        x = self.conv3(x)
        x = self.bn3(x)
        x = F.relu(x)
        # Flatten
        x = x.view(-1, 128 * 8 * 8)
        # Fully connected
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        return x
```
### **Architecture Explanation**
Let's trace the shape through the network (for CIFAR-10's 32x32 images):
1. **Input**: [batch, 3, 32, 32]
2. **Conv1 + ReLU + Pool1**:
- Conv1: [batch, 32, 32, 32] (3x3 conv, padding=1)
- Pool1: [batch, 32, 16, 16] (2x2 max pool, stride=2)
3. **Conv2 + ReLU + Pool2**:
- Conv2: [batch, 64, 16, 16]
- Pool2: [batch, 64, 8, 8]
4. **Conv3 + ReLU**:
- Conv3: [batch, 128, 8, 8]
5. **Flatten**: [batch, 128*8*8] = [batch, 8192]
6. **FC1**: [batch, 512]
7. **FC2**: [batch, 10] (10 classes for CIFAR-10)
### **Step 2: Initialize and Move to Device**
```python
# Create model
model = SimpleCNN(num_classes=10)
# Move to GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
# Print model summary (requires torchsummary)
from torchsummary import summary
summary(model, input_size=(3, 32, 32))
```
The summary will show:
- Each layer's output shape
- Number of parameters
- Total parameters (~4.3M for this model, almost all of them in the first fully connected layer)
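A hand count of the learnable parameters is a useful cross-check on whatever the summary prints. The sketch below redoes the arithmetic for `SimpleCNN` in plain Python; the helper names are introduced here for illustration, and the layer sizes are read off the model definition above.

```python
def conv_params(in_ch, out_ch, k):
    # One k x k kernel per (input, output) channel pair, plus one bias per output channel
    return in_ch * out_ch * k * k + out_ch


def bn_params(channels):
    # A learnable scale and shift per channel
    return 2 * channels


def linear_params(in_features, out_features):
    return in_features * out_features + out_features


counts = {
    'conv1': conv_params(3, 32, 3),
    'bn1': bn_params(32),
    'conv2': conv_params(32, 64, 3),
    'bn2': bn_params(64),
    'conv3': conv_params(64, 128, 3),
    'bn3': bn_params(128),
    'fc1': linear_params(128 * 8 * 8, 512),
    'fc2': linear_params(512, 10),
}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name:>5}: {count:>9,}")
print(f"total: {total:>9,}")
```

The first fully connected layer alone accounts for about 4.2 million parameters, which is why many later architectures replace the flatten step with global average pooling.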
### **Step 3: Define Loss Function and Optimizer**
```python
# Cross-entropy loss for classification
criterion = nn.CrossEntropyLoss()
# Adam optimizer with weight decay
optimizer = torch.optim.Adam(
model.parameters(),
lr=0.001,
weight_decay=1e-5 # L2 regularization
)
# Learning rate scheduler
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode='max',   # we step on validation accuracy, which should increase
    factor=0.5,
    patience=5
)
```
### **Step 4: Training and Validation Functions**
```python
def train_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, targets in dataloader:
        inputs, targets = inputs.to(device), targets.to(device)
        # Forward pass
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        # Backward pass
        loss.backward()
        optimizer.step()
        # Statistics
        running_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

def validate(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in dataloader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            running_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc
```
### **Step 5: Full Training Loop**
```python
# Training configuration
num_epochs = 50
best_acc = 0.0
history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
for epoch in range(num_epochs):
    # Train
    train_loss, train_acc = train_epoch(
        model, train_loader, criterion, optimizer, device
    )
    # Validate
    val_loss, val_acc = validate(model, test_loader, criterion, device)
    # Learning rate scheduling
    scheduler.step(val_acc)
    # Save best model
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), 'best_model.pth')
    # Record history
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    # Print progress
    print(f"Epoch {epoch+1}/{num_epochs}")
    print(f"Train Loss: {train_loss:.4f} Acc: {train_acc:.4f}")
    print(f"Val Loss: {val_loss:.4f} Acc: {val_acc:.4f} (Best: {best_acc:.4f})")
    print("-" * 60)
```
### **Step 6: Visualize Training History**
```python
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 5))
# Plot loss
plt.subplot(1, 2, 1)
plt.plot(history['train_loss'], label='Train Loss')
plt.plot(history['val_loss'], label='Val Loss')
plt.title('Loss')
plt.xlabel('Epoch')
plt.legend()
# Plot accuracy
plt.subplot(1, 2, 2)
plt.plot(history['train_acc'], label='Train Acc')
plt.plot(history['val_acc'], label='Val Acc')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.tight_layout()
plt.show()
```
### **Step 7: Evaluate on Test Set**
```python
# Load best model (map_location lets a GPU-trained checkpoint load on CPU too)
model.load_state_dict(torch.load('best_model.pth', map_location=device))
# Final evaluation
test_loss, test_acc = validate(model, test_loader, criterion, device)
print(f"Test Accuracy: {test_acc:.4f}")
# Confusion matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns
model.eval()
all_preds = []
all_targets = []
with torch.no_grad():
    for inputs, targets in test_loader:
        inputs = inputs.to(device)
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        all_preds.extend(preds.cpu().numpy())
        all_targets.extend(targets.numpy())
cm = confusion_matrix(all_targets, all_preds)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
```
### **Step 8: Making Predictions on New Images**
```python
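from PIL import Image
from torchvision import transforms

def predict_image(image_path, model, transform, device, class_names):
    """Predict class for a single image"""
    # Load and transform image (force 3 channels for grayscale/RGBA files)
    img = Image.open(image_path).convert('RGB')
    img_tensor = transform(img).unsqueeze(0)  # Add batch dimension
    img_tensor = img_tensor.to(device)
    # Predict
    model.eval()
    with torch.no_grad():
        output = model(img_tensor)
        _, pred = torch.max(output, 1)
    return class_names[pred.item()]

# Example usage
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
# SimpleCNN expects 32x32 inputs normalized like its training data,
# so resize here instead of reusing the 224x224 ImageNet-style val_transform
predict_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# Assuming you have a test image
prediction = predict_image('test_image.jpg', model, predict_transform, device, class_names)
print(f"Predicted class: {prediction}")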
```
### **Common CNN Pitfalls and How to Avoid Them**
1. **Overfitting**
- *Symptoms*: High training accuracy, low validation accuracy
- *Solutions*:
- Add dropout layers
- Increase data augmentation
- Use weight decay
- Reduce model complexity
2. **Underfitting**
- *Symptoms*: Low training and validation accuracy
- *Solutions*:
- Increase model capacity (more layers/filters)
- Train longer
- Reduce regularization
- Check data preprocessing
3. **Vanishing Gradients**
- *Symptoms*: Very slow training, gradients approaching zero
- *Solutions*:
- Use batch normalization
- Use skip connections (ResNet-style)
- Use ReLU or leaky ReLU
- Reduce network depth
4. **Exploding Gradients**
- *Symptoms*: NaN losses, sudden performance drops
- *Solutions*:
- Gradient clipping
- Lower learning rate
- Proper weight initialization
5. **Class Imbalance**
- *Symptoms*: Model biased toward majority classes
- *Solutions*:
- Class weights in loss function
- Oversampling minority classes
- Undersampling majority classes
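As one concrete remedy from the list, gradient clipping is a single call between `loss.backward()` and `optimizer.step()`. The sketch below uses a tiny linear model and deliberately oversized inputs (both illustrative assumptions) to provoke large gradients; `max_norm=1.0` is an arbitrary threshold, not a recommended value.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Deliberately large inputs to produce large gradients
inputs = torch.randn(32, 10) * 1000
targets = torch.randn(32, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

max_norm = 1.0
# Rescales all gradients in place; returns the total norm measured before clipping
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

optimizer.step()
print(f"before: {norm_before.item():.1f}, after: {norm_after.item():.4f}")
```

Logging the pre-clipping norm over time is also a cheap way to spot exploding gradients before the loss turns into NaN.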
---
## **Training a CNN on CIFAR-10 Dataset**
Let's put everything together and train a CNN on the **CIFAR-10 dataset**.
### **What is CIFAR-10?**
CIFAR-10 is a benchmark dataset in computer vision containing:
- 60,000 32x32 color images
- 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- 50,000 training images, 10,000 test images
- Classes are balanced (6,000 images per class)

### **Complete CIFAR-10 Training Script**
Here's a complete script to train a CNN on CIFAR-10:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import time
import os
# Configuration
config = {
'batch_size': 128,
'num_epochs': 50,
'learning_rate': 0.001,
'weight_decay': 1e-5,
'device': torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
'model_save_path': 'cifar10_model.pth'
}
print(f"Using device: {config['device']}")
# Data augmentation and normalization
train_transform = transforms.Compose([
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
test_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
# Load datasets
train_dataset = datasets.CIFAR10(
root='./data',
train=True,
download=True,
transform=train_transform
)
test_dataset = datasets.CIFAR10(
root='./data',
train=False,
download=True,
transform=test_transform
)
# Create data loaders
train_loader = DataLoader(
train_dataset,
batch_size=config['batch_size'],
shuffle=True,
num_workers=4,
pin_memory=True
)
test_loader = DataLoader(
test_dataset,
batch_size=config['batch_size'],
shuffle=False,
num_workers=4,
pin_memory=True
)
# Define the CNN model
class CIFAR10CNN(nn.Module):
    def __init__(self, num_classes=10):
        super(CIFAR10CNN, self).__init__()
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.25),
            # Block 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.25),
            # Block 3
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 4 * 4, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
# Initialize model
model = CIFAR10CNN(num_classes=10).to(config['device'])
# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(
model.parameters(),
lr=config['learning_rate'],
weight_decay=config['weight_decay']
)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode='max',   # we step on validation accuracy, which should increase
    factor=0.5,
    patience=5
)
# Training function
def train(model, train_loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

# Validation function
def validate(model, test_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            running_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc
# Training loop
best_acc = 0.0
train_losses, train_accs = [], []
val_losses, val_accs = [], []
start_time = time.time()
for epoch in range(config['num_epochs']):
    # Train
    train_loss, train_acc = train(model, train_loader, criterion, optimizer, config['device'])
    # Validate
    val_loss, val_acc = validate(model, test_loader, criterion, config['device'])
    # Update learning rate
    scheduler.step(val_acc)
    # Save best model
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), config['model_save_path'])
    # Record metrics
    train_losses.append(train_loss)
    train_accs.append(train_acc)
    val_losses.append(val_loss)
    val_accs.append(val_acc)
    # Print progress
    print(f"Epoch {epoch+1:2d}/{config['num_epochs']} | "
          f"Time: {time.time()-start_time:.1f}s | "
          f"Train Loss: {train_loss:.4f} Acc: {train_acc:.4f} | "
          f"Val Loss: {val_loss:.4f} Acc: {val_acc:.4f} (Best: {best_acc:.4f})")
# Save final metrics
metrics = {
    'train_loss': train_losses,
    'train_acc': train_accs,
    'val_loss': val_losses,
    'val_acc': val_accs
}
torch.save(metrics, 'training_metrics.pth')
# Plot results
plt.figure(figsize=(14, 5))
# Loss plot
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Validation Loss')
plt.title('Loss Curve')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
# Accuracy plot
plt.subplot(1, 2, 2)
plt.plot(train_accs, label='Train Accuracy')
plt.plot(val_accs, label='Validation Accuracy')
plt.title('Accuracy Curve')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.tight_layout()
plt.savefig('training_curves.png')
plt.show()
# Final evaluation
print(f"\nTraining completed in {time.time()-start_time:.0f} seconds")
print(f"Best validation accuracy: {best_acc:.4f}")
# Load best model for final evaluation
model.load_state_dict(torch.load(config['model_save_path']))
final_loss, final_acc = validate(model, test_loader, criterion, config['device'])
print(f"Final test accuracy: {final_acc:.4f}")
```
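The loop above calls `scheduler.step(val_acc)`, which only makes sense for a metric-driven scheduler. The sketch below assumes the scheduler is `ReduceLROnPlateau` (its definition is not shown in this section) and uses a throwaway optimizer with made-up accuracy values. Because accuracy should increase, the scheduler is created with `mode='max'`; the `factor` and `patience` values are illustrative only.

```python
import torch
from torch import nn, optim

# Throwaway parameter and optimizer, used only to watch the learning rate
dummy_param = nn.Parameter(torch.zeros(1))
demo_optimizer = optim.SGD([dummy_param], lr=0.1)

# mode='max' because the monitored metric (accuracy) should go up
demo_scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    demo_optimizer, mode='max', factor=0.5, patience=2
)

# Hypothetical validation accuracies that stop improving after the second epoch
fake_val_accs = [0.50, 0.60, 0.60, 0.60, 0.60]
lr_history = []
for acc in fake_val_accs:
    demo_scheduler.step(acc)
    lr_history.append(demo_optimizer.param_groups[0]['lr'])

print(lr_history)
```

With the default `mode='min'`, a rising accuracy would look like a metric that never improves, and the learning rate would be cut too early.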
### **Expected Results**
With this setup, you should achieve:
- **~85-88% test accuracy** after 50 epochs
- Training time: ~30-45 minutes on a modern GPU
- Clear convergence with no severe overfitting
### **Improving Performance**
To get even better results:
1. **More complex architecture**:
   - Add more convolutional blocks
   - Use residual connections
   - Try different activation functions
2. **Advanced augmentation**:
```python
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    # RandomErasing operates on tensors, so it must come after ToTensor
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.2), ratio=(0.3, 3.3))
])
```
3. **Learning rate scheduling**:
```python
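# OneCycleLR is stepped after every batch, not every epoch:
# call scheduler.step() right after optimizer.step() inside train(),
# and drop the per-epoch scheduler.step(val_acc) call from the training loop.
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.01,
    steps_per_epoch=len(train_loader),
    epochs=config['num_epochs']
)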
```
4. **Mixup augmentation**:
```python
import numpy as np

def mixup_data(x, y, alpha=1.0):
    """Return mixed inputs, both sets of targets, and the mixing weight lam."""
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1
    batch_size = x.size(0)
    # Random pairing of each sample with another sample from the same batch
    index = torch.randperm(batch_size, device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

# In training loop
inputs, targets = inputs.to(device), targets.to(device)
inputs, targets_a, targets_b, lam = mixup_data(inputs, targets)
outputs = model(inputs)
# The loss is the same convex combination that was applied to the inputs
loss = lam * criterion(outputs, targets_a) + (1 - lam) * criterion(outputs, targets_b)
```
5. **Label smoothing**:
```python
class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.1):
        super().__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.cls = classes

    def forward(self, pred, target):
        pred = pred.log_softmax(dim=-1)
        with torch.no_grad():
            # Spread the smoothing mass over the non-target classes
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=-1))

criterion = LabelSmoothingLoss(classes=10, smoothing=0.1)
# PyTorch >= 1.10 also offers nn.CrossEntropyLoss(label_smoothing=0.1),
# which spreads the smoothing mass over all classes, including the target
```
With these enhancements, you can reach **~92-93% test accuracy** on CIFAR-10, which is quite good for a model trained from scratch.
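Item 1 mentions residual connections without showing one. As a rough sketch (the `ResidualBlock` class is a hypothetical helper, not part of the model defined earlier), a basic block adds its input back to the output of two convolutions, so input and output shapes must match:

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Two 3x3 conv layers with an identity skip connection (hypothetical helper)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the input back before the final activation
        return self.relu(out + identity)

block = ResidualBlock(channels=16)
features = torch.randn(2, 16, 8, 8)
block_output = block(features)
print(block_output.shape)
```

Because the block preserves the tensor shape, it can be dropped between existing convolutional stages that use the same channel count.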
---
## **Transfer Learning: Leveraging Pretrained Models**
Training CNNs from scratch to high accuracy often requires large labeled datasets and substantial compute. **Transfer learning** reduces both requirements by starting from a model pretrained on a large dataset such as ImageNet.
### **What is Transfer Learning?**
Transfer learning takes a model trained on one task and adapts it to a new task. In computer vision, this means:
1. Take a model pretrained on ImageNet (1.2M images, 1000 classes)
2. Replace the final classification layer
3. Fine-tune on your target dataset
Benefits:
- **Faster training**: Start with good feature extractors
- **Better performance**: Especially with small datasets
- **Less data needed**: Works with hundreds instead of thousands of images
### **How Transfer Learning Works**
1. **Feature extraction**: Use pretrained model as fixed feature extractor
   - Freeze all layers except the classifier
   - Train only the new classifier head
2. **Fine-tuning**: Update some pretrained layers
   - Unfreeze some layers
   - Train with lower learning rate

*Transfer learning approaches (Source: PyTorch documentation)*
### **PyTorch Model Zoo**
PyTorch provides many pretrained models through **torchvision.models**:
```python
from torchvision import models
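# List available models
print(dir(models))
# Load ResNet-18 with ImageNet weights
# (`weights=` replaces the deprecated `pretrained=True` in torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Load pretrained EfficientNet-B0
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
# Load pretrained Vision Transformer
model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)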
```
Available models include:
- **ResNet**: resnet18, resnet34, resnet50, resnet101, resnet152
- **EfficientNet**: efficientnet_b0 to efficientnet_b7
- **Vision Transformers**: vit_b_16, vit_b_32, vit_l_16
- **MobileNet**: mobilenet_v2, mobilenet_v3_small, mobilenet_v3_large
- **DenseNet**: densenet121, densenet161, densenet169, densenet201
- **AlexNet, VGG, GoogLeNet, Inception**
### **Step-by-Step: Transfer Learning on CIFAR-10**
Let's use ResNet-18 for CIFAR-10:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, datasets, transforms
# Configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
num_classes = 10
batch_size = 128
num_epochs = 20
feature_extract = True # True for feature extraction, False for fine-tuning
# Data transforms
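# CIFAR-10 images are 32x32; they are upscaled to the 224x224 input size
# that the ImageNet-pretrained ResNet expects
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # ImageNet mean and std, matching the statistics used during pretraining
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])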
# Load datasets
train_dataset = datasets.CIFAR10('./data', train=True, download=True, transform=train_transform)
test_dataset = datasets.CIFAR10('./data', train=False, download=True, transform=test_transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
# Load pretrained model (`weights=` replaces the deprecated `pretrained=True`)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Freeze all parameters (feature extraction mode)
if feature_extract:
    for param in model.parameters():
        param.requires_grad = False
# Replace the final fully connected layer
# (a newly created layer has requires_grad=True by default)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)
# Send model to device
model = model.to(device)
# Gather parameters to optimize
params_to_update = model.parameters()
if feature_extract:
    params_to_update = []
    for name, param in model.named_parameters():
        if param.requires_grad:
            params_to_update.append(param)
            print(f"Updating {name}")
# Setup optimizer and loss function
optimizer = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# Training loop (same as before)
# ... [same training/validation functions as in previous section] ...
```
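A quick sanity check after freezing is to count how many parameters are still trainable. The sketch below uses a small stand-in model rather than ResNet-18 so it runs without downloading weights; `count_parameters`, `backbone`, `head`, and `toy_model` are hypothetical names introduced only for this illustration.

```python
import torch
from torch import nn

def count_parameters(model):
    """Return (trainable, total) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total

# Hypothetical stand-in for a pretrained backbone plus a new classifier head
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 10)
toy_model = nn.Sequential(backbone, head)

# Freeze the backbone, as in feature-extraction mode
for param in backbone.parameters():
    param.requires_grad = False

trainable, total = count_parameters(toy_model)
print(f"Trainable: {trainable} / {total}")
```

Applied to the ResNet-18 setup above in feature-extraction mode, the same helper should report only the new `fc` layer as trainable, that is 512 * 10 + 10 = 5,130 parameters.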
### **Feature Extraction vs. Fine-tuning**
#### **Feature Extraction (feature_extract=True)**
- **How it works**: Freeze all pretrained layers, train only new classifier
- **When to use**: Small dataset, similar to ImageNet
- **Advantages**:
  - Very fast training
  - No risk of overwriting good features
- **Disadvantages**:
  - May not adapt well to very different tasks
#### **Fine-tuning (feature_extract=False)**
- **How it works**: Unfreeze some or all pretrained layers (the script above leaves every layer trainable), train with low learning rate
- **When to use**: Larger dataset, somewhat different from ImageNet
- **Advantages**:
  - Better adaptation to target task
  - Higher potential accuracy
- **Disadvantages**:
  - Slower training
  - Risk of overfitting
### **Advanced Fine-tuning Strategies**
#### **Layer-wise Learning Rates**
Different layers may need different learning rates:
```python
# Group parameters by layer; earlier layers get smaller learning rates
optimizer_grouped_parameters = [
    # Stem: include bn1, otherwise its parameters are never passed to the optimizer
    {'params': list(model.conv1.parameters()) + list(model.bn1.parameters()), 'lr': 1e-5},
    {'params': model.layer1.parameters(), 'lr': 5e-5},
    {'params': model.layer2.parameters(), 'lr': 1e-4},
    {'params': model.layer3.parameters(), 'lr': 5e-4},
    {'params': model.layer4.parameters(), 'lr': 1e-3},
    {'params': model.fc.parameters(), 'lr': 1e-2},
]
optimizer = optim.Adam(optimizer_grouped_parameters)
```
#### **Gradual Unfreezing**
Unfreeze layers progressively during training:
```python
def unfreeze_layers(model, num_layers):
    """Unfreeze the last num_layers of the model"""
    # ResNet-specific; adapt for other architectures
    layers = [model.layer4, model.layer3, model.layer2, model.layer1]
    for i in range(min(num_layers, len(layers))):
        for param in layers[i].parameters():
            param.requires_grad = True

# Start with only classifier trainable
for epoch in range(num_epochs):
    if epoch == 5:
        unfreeze_layers(model, 1)  # Unfreeze last block
        # Rebuild the optimizer over the currently trainable parameters
        optimizer = optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-4)
    elif epoch == 10:
        unfreeze_layers(model, 2)  # Unfreeze the last two blocks
        optimizer = optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-5)
    # ... run the usual train/validate step for this epoch ...
```
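Recreating the optimizer discards its internal state, such as momentum buffers. One alternative, sketched here on a hypothetical two-layer stand-in (`block`, `head`, and `unfreeze_into_optimizer` are illustrative names, not part of the ResNet script), is to keep the optimizer and register the newly unfrozen parameters with `optimizer.add_param_group`:

```python
import torch
from torch import nn, optim

# Hypothetical two-stage model: a frozen "block" and a trainable head
block = nn.Linear(16, 16)
head = nn.Linear(16, 10)
for param in block.parameters():
    param.requires_grad = False

# Start by optimizing only the head
demo_optimizer = optim.SGD(head.parameters(), lr=1e-3, momentum=0.9)

def unfreeze_into_optimizer(module, optimizer, lr):
    """Unfreeze `module` and register its parameters as a new param group."""
    for param in module.parameters():
        param.requires_grad = True
    optimizer.add_param_group({'params': list(module.parameters()), 'lr': lr})

# Later in training: unfreeze the block with a smaller learning rate
unfreeze_into_optimizer(block, demo_optimizer, lr=1e-4)

for i, group in enumerate(demo_optimizer.param_groups):
    print(f"group {i}: lr={group['lr']}, tensors={len(group['params'])}")
```

The new group inherits unspecified options (here `momentum`) from the optimizer defaults, and the head keeps its existing optimizer state.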
#### **Discriminative Fine-tuning**
Introduced with ULMFiT (Universal Language Model Fine-tuning) for NLP, this idea also works for vision:
```python
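# Learning rates are halved for each step back towards the input
base_lr = 1e-3
# Ordered from the head (highest LR) back to the stem (lowest LR)
layer_groups = [
    [model.fc],
    [model.layer4],
    [model.layer3],
    [model.layer2],
    [model.layer1],
    [model.conv1, model.bn1],  # stem: bn1 included so it is not left out
]
params = []
for i, group in enumerate(layer_groups):
    group_params = [p for module in group for p in module.parameters()]
    params.append({'params': group_params, 'lr': base_lr / (2 ** i)})
optimizer = optim.Adam(params)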
```
### **Expected Results with Transfer Learning**
With ResNet-18 on CIFAR-10:
- **Feature extraction**: ~88-90% test accuracy
- **Fine-tuning**: ~92-94% test accuracy
Both results improve on the baseline CNN trained from scratch (~85-88%) while training for fewer epochs (20 instead of 50).
### **When NOT to Use Transfer Learning**
Transfer learning isn't always the best approach:
1. **Very different domains**:
   - Medical images vs. natural images
   - Satellite imagery vs. everyday photos
2. **Extremely large target dataset**:
   - If you have millions of task-specific images
   - Training from scratch may yield better results
3. **Specialized architectures**:
   - Some tasks need custom architectures
   - Example: Object detection, segmentation
In these cases, consider:
- **Domain-specific pretraining**: Pretrain on similar domain
- **Multi-task learning**: Train on multiple related tasks
- **Self-supervised learning**: Train without labels (e.g., SimCLR, MoCo)
---
## **Advanced Debugging and Profiling Techniques**
Even with PyTorch's great debugging tools, deep learning models can be tricky to debug. Here are advanced techniques to identify and fix issues.
### **1. Verifying Data Pipeline**
Many training bugs originate in the data pipeline rather than in the model, so verify the data first.
#### **Check Data Distribution**
```python
# Check class distribution
from collections import Counter
# CIFAR10 exposes its labels as `.targets`; reading them directly avoids
# loading and transforming every image just to collect the labels
train_labels = train_dataset.targets
test_labels = test_dataset.targets
print("Train distribution:", Counter(train_labels))
print("Test distribution:", Counter(test_labels))
# Should be balanced (5000 per class for CIFAR-10 train)
```
#### **Visualize Raw and Transformed Images**
```python
import matplotlib.pyplot as plt
from torchvision.utils import make_grid

def show_batch(sample_batched, title=None):
    """Show image batch"""
    images, labels = sample_batched
    # normalize=True rescales the normalized tensors into [0, 1] for display
    grid = make_grid(images, normalize=True)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))
    if title is not None:
        plt.title(title)
    plt.axis('off')
    plt.show()

# Get a batch of training data
sample_batch = next(iter(train_loader))
show_batch(sample_batch, 'Training Batch')
# Get a batch of test data
sample_batch = next(iter(test_loader))
show_batch(sample_batch, 'Test Batch')
```
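Batches that went through `transforms.Normalize` no longer lie in the [0, 1] range that `plt.imshow` expects. A minimal sketch of undoing the normalization exactly, assuming the ImageNet statistics from the transfer-learning transforms (`denormalize`, `MEAN`, and `STD` are names introduced here for illustration):

```python
import torch

# ImageNet statistics, reshaped to broadcast over [N, 3, H, W] batches
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def denormalize(images, mean=MEAN, std=STD):
    """Invert transforms.Normalize for a batch shaped [N, 3, H, W]."""
    return (images * std + mean).clamp(0.0, 1.0)

# Round-trip check on random "images" in [0, 1]
original = torch.rand(4, 3, 32, 32)
normalized = (original - MEAN) / STD
restored = denormalize(normalized)
print(torch.allclose(restored, original, atol=1e-5))
```

For the from-scratch CIFAR-10 pipeline, pass the CIFAR-10 mean and std used in its `Normalize` call instead.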
#### **Check for Data Leakage**
```python
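# Are training and test sets properly separated?
# CIFAR10 has no file paths (`.imgs` exists only on ImageFolder datasets),
# so compare hashes of the raw image arrays instead
import hashlib
train_hashes = {hashlib.md5(img.tobytes()).hexdigest() for img in train_dataset.data}
test_hashes = {hashlib.md5(img.tobytes()).hexdigest() for img in test_dataset.data}
common = train_hashes & test_hashes
print(f"Identical images in train and test: {len(common)}")
# Expect zero, or very few, exact duplicates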
```
### **2. Gradient Checking**
Verify your gradients are correct:
```python
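import copy
from torch.autograd import gradcheck
# gradcheck compares analytic gradients with finite differences, so it needs
# float64 weights and a tiny input; run it on a CPU copy in eval mode
model_double = copy.deepcopy(model).cpu().double().eval()
inputs = (torch.randn(1, 3, 32, 32, dtype=torch.double, requires_grad=True),)
# fast_mode avoids building the full Jacobian; a failed check raises an error
test = gradcheck(model_double, inputs, eps=1e-6, atol=1e-4, fast_mode=True)
print(f"Gradient check passed: {test}")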
```
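Gradient checking is easiest to trust on a small, smooth function. The following sketch runs `gradcheck` on a hypothetical two-layer module in double precision (`tiny_model` and `sample` are illustrative names), which finishes in well under a second:

```python
import torch
from torch import nn
from torch.autograd import gradcheck

torch.manual_seed(0)
# A tiny smooth module in float64; gradcheck needs double precision
tiny_model = nn.Sequential(nn.Linear(4, 3), nn.Tanh()).double()
sample = torch.randn(2, 4, dtype=torch.double, requires_grad=True)

# Compares autograd's analytic gradients against finite differences
passed = gradcheck(tiny_model, (sample,), eps=1e-6, atol=1e-4)
print(f"Gradient check passed: {passed}")
```

In practice this kind of check is most useful for custom `autograd.Function` implementations; built-in layers are already tested by PyTorch itself.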
### **3. Numerical Stability Checks**
Watch for NaNs and infinities:
```python
def check_tensor(tensor, name="tensor"):
    """Check for NaNs and Infs"""
    if torch.isnan(tensor).any():
        print(f"Warning: {name} contains NaNs")
    if torch.isinf(tensor).any():
        print(f"Warning: {name} contains Infs")

# In training loop
output = model(data)
check_tensor(output, "model output")
loss = criterion(output, target)
check_tensor(loss, "loss")
```
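Forward-pass checks do not catch NaNs that first appear during backpropagation. As a small sketch, autograd's anomaly mode can locate them: the toy expression below has a finite forward value but a NaN gradient, and `torch.autograd.detect_anomaly()` turns that into an error naming the offending backward function.

```python
import torch

x = torch.zeros(1, requires_grad=True)
caught = False
try:
    # Anomaly mode checks every backward op and raises on the first NaN
    with torch.autograd.detect_anomaly():
        y = (torch.sqrt(x) * 0).sum()  # forward is finite: sqrt(0) * 0 = 0
        y.backward()                   # backward computes 0 / (2 * sqrt(0)) = NaN
except RuntimeError as err:
    caught = True
    print(f"Anomaly detected: {err}")
```

Anomaly mode slows training noticeably, so it is best enabled only while hunting a specific NaN.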
### **4. Learning Rate Scheduling Analysis**
Visualize how learning rate changes:
```python
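# Note: this advances the scheduler, so run it on a throwaway
# optimizer/scheduler pair rather than the ones used for training.
# It does not apply to ReduceLROnPlateau, which needs a metric in step().
lrs = []
for epoch in range(50):
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
plt.plot(lrs)
plt.title('Learning Rate Schedule')
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.show()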
```
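To preview a schedule without touching the real training state, a throwaway optimizer can stand in for the model's optimizer. This sketch assumes a `StepLR` schedule purely for illustration; substitute whichever epoch-based scheduler the training script uses.

```python
import torch
from torch import nn, optim

# Throwaway parameter/optimizer so the real training state is untouched
dummy = nn.Parameter(torch.zeros(1))
preview_optimizer = optim.SGD([dummy], lr=0.1)
# StepLR is an arbitrary choice for illustration
preview_scheduler = optim.lr_scheduler.StepLR(preview_optimizer, step_size=10, gamma=0.1)

preview_lrs = []
for epoch in range(30):
    preview_optimizer.step()  # no gradients exist, so this changes nothing
    preview_lrs.append(preview_scheduler.get_last_lr()[0])
    preview_scheduler.step()

print(preview_lrs[0], preview_lrs[10], preview_lrs[20])
```

The resulting `preview_lrs` list can be passed to `plt.plot` exactly as in the block above.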
### **5. Weight and Activation Monitoring**
Track distributions of weights and activations:
```python
def add_hist_hooks(model):
    """Add hooks to record weight and activation histograms"""
    histograms = {}

    def hook_fn(name):
        def hook(module, input, output):
            histograms[f"{name}.output"] = output.detach()
            # clone() takes a snapshot; without it every record would alias
            # the live parameter and show only the latest values
            if hasattr(module, 'weight') and module.weight is not None:
                histograms[f"{name}.weight"] = module.weight.detach().clone()
            if hasattr(module, 'bias') and module.bias is not None:
                histograms[f"{name}.bias"] = module.bias.detach().clone()
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(hook_fn(name))
    return histograms

# Usage
histograms = add_hist_hooks(model)
# During training, collect histograms periodically
all_histograms = []
for epoch in range(0, num_epochs, 5):
    # Train for some steps
    # ...
    # Record histograms
    all_histograms.append({
        k: v.cpu().numpy() for k, v in histograms.items()
    })
# Plot weight distribution over time (first six snapshots fit the 2x3 grid)
plt.figure(figsize=(12, 8))
for i, hist in enumerate(all_histograms[:6]):
    plt.subplot(2, 3, i+1)
    # 'features.0.weight' assumes the model has a `features` Sequential;
    # replace it with a layer name from your own model
    plt.hist(hist['features.0.weight'].flatten(), bins=50)
    plt.title(f'Epoch {i*5}')
plt.tight_layout()
plt.show()
```
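Forward hooks stay attached until they are removed, which adds overhead to every forward pass. `register_forward_hook` returns a handle for exactly this purpose. The sketch below, on a hypothetical toy model, records lightweight activation statistics and then detaches the hooks (`toy_model`, `make_stats_hook`, and `activation_stats` are illustrative names):

```python
import torch
from torch import nn

toy_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
activation_stats = {}

def make_stats_hook(name):
    def hook(module, inputs, output):
        # Store scalars rather than full tensors to keep memory use small
        activation_stats[name] = (output.mean().item(), output.std().item())
    return hook

# register_forward_hook returns a handle; keep it so the hook can be removed
handles = [
    module.register_forward_hook(make_stats_hook(name))
    for name, module in toy_model.named_modules()
    if isinstance(module, nn.Linear)
]

toy_model(torch.randn(16, 8))
recorded_layers = sorted(activation_stats)
print(recorded_layers)

# Detach the hooks once monitoring is done
for handle in handles:
    handle.remove()
activation_stats.clear()
toy_model(torch.randn(16, 8))
print(len(activation_stats))
```

After removal, the second forward pass leaves `activation_stats` empty, confirming the hooks no longer run.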
### **6. TensorBoard for Advanced Monitoring**
Beyond basic loss/accuracy, track:
```python
from torch.utils.tensorboard import SummaryWriter
from torchvision.utils import make_grid
writer = SummaryWriter('runs/cifar10_experiment')
# The snippets below run inside the epoch loop, after the training step
# Track weight histograms
for name, param in model.named_parameters():
    writer.add_histogram(name, param, epoch)
# Track gradient norms
total_norm = 0
for p in model.parameters():
    if p.grad is not None:
        param_norm = p.grad.data.norm(2)
        total_norm += param_norm.item() ** 2
total_norm = total_norm ** (1. / 2)
writer.add_scalar('grad_norm', total_norm, epoch)
# Track learning rate
writer.add_scalar('learning_rate', optimizer.param_groups[0]['lr'], epoch)
# Track misclassified examples
if epoch % 10 == 0:
    # Get some misclassified examples
    misclassified = []
    model.eval()  # switch back with model.train() before the next epoch
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1, keepdim=True)
            # squeeze(1) keeps a 1-D mask even when the batch holds one sample
            wrong = pred.ne(target.view_as(pred)).squeeze(1)
            if wrong.any():
                misclassified.append((
                    data[wrong],
                    target[wrong],
                    pred[wrong]
                ))
            if len(misclassified) > 5:
                break
    # Add to TensorBoard
    if misclassified:
        images, labels, preds = zip(*misclassified)
        images = torch.cat(images)[:16]
        labels = torch.cat(labels)[:16]
        preds = torch.cat(preds)[:16]
        # normalize=True rescales the normalized inputs for display
        grid = make_grid(images, nrow=4, normalize=True)
        writer.add_image(
            'Misclassified Examples',
            grid,
            epoch,
            dataformats='CHW'
        )
        # Add text descriptions
        class_names = train_dataset.classes
        text = "\n".join([
            f"True: {class_names[labels[i]]}, Pred: {class_names[preds[i][0]]}"
            for i in range(len(labels))
        ])
        writer.add_text('Misclassified Labels', text, epoch)
```
### **7. Profiling with PyTorch Profiler**
Identify bottlenecks in your code:
```python
from torch.profiler import profile, record_function, ProfilerActivity
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(
        wait=1,
        warmup=1,
        active=3,
        repeat=2
    ),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./log'),
    record_shapes=True,
    profile_memory=True,
    with_stack=True
) as prof:
    for step, (inputs, targets) in enumerate(train_loader):
        # Two cycles of (wait + warmup + active) steps
        if step >= (1 + 1 + 3) * 2:
            break
        inputs = inputs.to(device)
        targets = targets.to(device)
        with record_function("forward"):
            outputs = model(inputs)
            loss = criterion(outputs, targets)
        with record_function("backward"):
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Tell the profiler that one step has finished
        prof.step()
# Print profiling results
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```
This will show:
- Time spent in each operation
- CPU vs. GPU time
- Memory usage
- Call stack for bottlenecks
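For a first look, the profiler also works without a schedule, a trace handler, or a GPU. A minimal CPU-only sketch on a hypothetical toy model (`toy_model` and `batch` are illustrative names) looks like this:

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
batch = torch.randn(4, 3, 16, 16)

# CPU-only profiling of a single forward pass
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    toy_model(batch)

# Aggregate by operator name and sort by total CPU time
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```

The printed table lists the most expensive operators first, which is usually enough to tell whether convolutions, linear layers, or data movement dominate.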
### **8. Common Issues and Solutions**
#### **Issue: Training loss doesn't decrease**
- **Check**: Learning rate too low (or so high that the loss oscillates instead of falling)
- **Fix**: Adjust the learning rate, use a learning rate finder
#### **Issue: Validation loss much higher than training loss**
- **Check**: Overfitting (a small gap is normal; a gap that keeps widening is the warning sign)
- **Fix**: Add dropout, reduce model size, increase augmentation
#### **Issue: NaN loss**
- **Check**: Exploding gradients, invalid inputs
- **Fix**: Gradient clipping, check data preprocessing
#### **Issue: Slow training**
- **Check**: Data loading bottleneck
- **Fix**: Increase `num_workers`, use pinned memory
#### **Issue: Poor generalization**
- **Check**: Data leakage, insufficient augmentation
- **Fix**: Verify data split, add more augmentation
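Gradient clipping, suggested above for NaN losses, is a single call between `loss.backward()` and `optimizer.step()`. The sketch below provokes large gradients on a hypothetical toy model and clips them with `torch.nn.utils.clip_grad_norm_`; the `max_norm=1.0` threshold is an illustrative value, not a recommendation.

```python
import torch
from torch import nn

torch.manual_seed(0)
toy_model = nn.Linear(10, 1)
inputs = torch.randn(32, 10) * 1000.0  # deliberately huge inputs give huge gradients
targets = torch.randn(32, 1)

loss = nn.functional.mse_loss(toy_model(inputs), targets)
loss.backward()

# clip_grad_norm_ rescales gradients in place and returns the norm before clipping
norm_before = torch.nn.utils.clip_grad_norm_(toy_model.parameters(), max_norm=1.0)
norm_after = torch.sqrt(sum(p.grad.norm(2) ** 2 for p in toy_model.parameters()))
print(f"before: {norm_before.item():.1f}, after: {norm_after.item():.4f}")
```

Logging the returned pre-clipping norm over time is also a cheap way to spot exploding gradients before they turn into NaNs.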
---
## **Quiz 2: Test Your Understanding of Computer Vision with PyTorch**
**1. What is the primary purpose of the DataLoader class in PyTorch?**
A) To define the neural network architecture
B) To efficiently load and preprocess data in batches
C) To calculate loss functions
D) To implement backpropagation
**2. Which transform is typically NOT applied during validation/testing?**
A) Resize
B) CenterCrop
C) RandomHorizontalFlip
D) Normalize
**3. In a convolutional layer with in_channels=3, out_channels=16, and kernel_size=3, how many parameters does the layer have (ignoring bias)?**
A) 432
B) 144
C) 160
D) 192
**4. What is the main benefit of using batch normalization in CNNs?**
A) Reduces the number of parameters
B) Makes training more stable and allows higher learning rates
C) Replaces the need for activation functions
D) Automatically selects the best kernel size
**5. In transfer learning, what does "feature extraction" mode typically involve?**
A) Training all layers with a high learning rate
B) Freezing all pretrained layers and only training the new classifier
C) Randomly initializing all weights
D) Using only the first convolutional layer
**6. Which of the following is NOT a common data augmentation technique for images?**
A) RandomHorizontalFlip
B) ColorJitter
C) RandomErasing
D) BatchNormalization
**7. What does the "padding" parameter in a convolutional layer control?**
A) The stride of the filter
B) The number of output channels
C) Whether to add zeros around the input to maintain spatial dimensions
D) The activation function used after convolution
**8. In the CIFAR-10 dataset, what do the dimensions of a single image tensor represent?**
A) [height, width, channels]
B) [channels, height, width]
C) [batch, height, width, channels]
D) [height, channels, width]
**9. What is the primary purpose of the torchsummary library?**
A) To visualize training curves
B) To provide a summary of model architecture and parameters
C) To implement data augmentation
D) To convert models to ONNX format
**10. When using transfer learning with ResNet on CIFAR-10, why do we need to replace the final fully connected layer?**
A) To match the number of output classes (10 instead of 1000)
B) To reduce the model size
C) To enable data augmentation
D) To improve numerical stability
**11. Which technique would most directly address overfitting in a CNN?**
A) Increasing the learning rate
B) Adding dropout layers
C) Removing batch normalization
D) Reducing data augmentation
**12. What is the main advantage of using mixed precision training (float16)?**
A) Higher model accuracy
B) Reduced memory usage and faster training on compatible GPUs
C) Better handling of small gradients
D) Automatic learning rate scheduling
**13. In PyTorch, how do you move a model to GPU if available?**
A) model.cuda()
B) model.to('cuda')
C) model.to(device) where device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
D) All of the above
**14. What does the "num_workers" parameter in DataLoader control?**
A) The batch size
B) The number of subprocesses for data loading
C) The learning rate
D) The number of output classes
**15. Which visualization technique would help identify if your model is overfitting?**
A) Plotting training and validation loss curves
B) Visualizing the model architecture
C) Checking the number of parameters
D) Monitoring GPU memory usage
---
**Answers:**
1. B - DataLoader efficiently loads and preprocesses data in batches
2. C - RandomHorizontalFlip is augmentation, not used in validation
3. A - 3 * 3 * 3 * 16 = 432 weights (kernel height x kernel width x in_channels x out_channels); with the 16 biases the total would be 448
4. B - Batch norm stabilizes training and allows higher learning rates
5. B - Feature extraction freezes pretrained layers and trains only the new classifier
6. D - BatchNormalization is a layer, not an augmentation technique
7. C - Padding adds zeros around input to maintain spatial dimensions
8. B - PyTorch uses [channels, height, width] format
9. B - torchsummary shows model architecture and parameter count
10. A - CIFAR-10 has 10 classes vs. ImageNet's 1000
11. B - Dropout is a direct regularization technique for overfitting
12. B - Mixed precision reduces memory and speeds up training on compatible GPUs
13. D - All methods move model to GPU (though C is most portable)
14. B - num_workers sets the number of subprocesses for data loading
15. A - Training/validation loss curves clearly show overfitting
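The arithmetic behind answer 3 can be checked directly in PyTorch by counting the elements of a convolution layer's parameters:

```python
from torch import nn

# Same configuration as question 3, without bias
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, bias=False)
weight_count = sum(p.numel() for p in conv.parameters())
print(weight_count, 3 * 3 * 3 * 16)  # 432 432

# With the default bias=True, 16 bias terms are added
conv_with_bias = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
total_with_bias = sum(p.numel() for p in conv_with_bias.parameters())
print(total_with_bias)  # 448
```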
---
## **Summary and What's Next in Part 3**
In this **comprehensive Part 2** of our PyTorch Masterclass, we've covered:
- **Dataset and DataLoader**: Efficient data handling for deep learning
- **Transforms**: Image preprocessing and augmentation techniques
- **CNN Architecture**: Theory behind convolutional neural networks
- **Building CNNs**: Creating and training convolutional networks from scratch
- **CIFAR-10 Training**: Complete workflow for image classification
- **Transfer Learning**: Leveraging pretrained models for better performance
- **Advanced Debugging**: Profiling, monitoring, and troubleshooting techniques
You now have the skills to:
- Build efficient data pipelines for image data
- Design and train convolutional neural networks
- Apply transfer learning to boost performance
- Diagnose and fix common deep learning issues
### **What's Coming in Part 3?**
In **Part 3**, we'll dive into **sequence modeling with Recurrent Neural Networks (RNNs)** and **Long Short-Term Memory (LSTM)** networks:
- **Text data processing**: Tokenization, embeddings, and vocabulary creation
- **RNN architecture**: Understanding hidden states and sequence processing
- **LSTM and GRU**: Advanced recurrent units for long-term dependencies
- **Building text classifiers**: Sentiment analysis with RNNs
- **Sequence-to-sequence models**: Introduction to machine translation
- **Attention mechanisms**: The foundation of modern NLP
- **Transformer architecture**: Self-attention and positional encoding
We'll build a **sentiment analysis model** on real text data and explore the architecture that powers models like BERT and GPT.
👉 **Stay tuned for Part 3: Deep Learning for Natural Language Processing with PyTorch**