# **PyTorch Masterclass: Part 2 – Deep Learning for Computer Vision with PyTorch**

**Duration: ~60 minutes**

**Hashtags:** #PyTorch #ComputerVision #CNN #DeepLearning #TransferLearning #CIFAR10 #ImageClassification #DataLoaders #Transforms #ResNet #EfficientNet #PyTorchVision #AI #MachineLearning #ConvolutionalNeuralNetworks #DataAugmentation #PretrainedModels

---

## **Table of Contents**

1. [Recap of Part 1: PyTorch Foundations](#recap-of-part-1-pytorch-foundations)
2. [Dataset and DataLoader: Efficient Data Handling](#dataset-and-dataloader-efficient-data-handling)
3. [Transforms: Image Preprocessing and Augmentation](#transforms-image-preprocessing-and-augmentation)
4. [Convolutional Neural Networks (CNNs): Theory and Architecture](#convolutional-neural-networks-cnns-theory-and-architecture)
5. [Building Your First CNN for Image Classification](#building-your-first-cnn-for-image-classification)
6. [Training a CNN on CIFAR-10 Dataset](#training-a-cnn-on-cifar-10-dataset)
7. [Transfer Learning: Leveraging Pretrained Models](#transfer-learning-leveraging-pretrained-models)
8. [Advanced Debugging and Profiling Techniques](#advanced-debugging-and-profiling-techniques)
9. [Quiz 2: Test Your Understanding of Computer Vision with PyTorch](#quiz-2-test-your-understanding-of-computer-vision-with-pytorch)
10. [Summary and What's Next in Part 3](#summary-and-whats-next-in-part-3)

---

## **Recap of Part 1: PyTorch Foundations**

Welcome back to **Part 2** of our PyTorch Masterclass!
In **Part 1**, we built a solid foundation in PyTorch by covering:

- The fundamentals of deep learning and why PyTorch is the framework of choice for researchers
- Installation and setup of PyTorch with CUDA support
- Tensors as the core data structure in PyTorch
- Tensor operations, indexing, and GPU acceleration
- The autograd system for automatic differentiation
- Building and training your first neural network
- Loss functions, optimizers, and the training loop
- Debugging with TensorBoard

We built a simple neural network for regression and understood the basic workflow of deep learning in PyTorch.

Now, it's time to dive into **computer vision**, one of the most successful applications of deep learning. In this part, you'll learn how to:

- Efficiently load and preprocess image data
- Build convolutional neural networks (CNNs)
- Train image classifiers on real datasets
- Use transfer learning to boost performance
- Apply advanced debugging techniques

Let's get started!

---

## **Dataset and DataLoader: Efficient Data Handling**

One of the biggest challenges in deep learning is **efficient data handling**. Loading large datasets directly into memory can cause crashes, and processing data sequentially is slow. PyTorch solves this with two key components: **Dataset** and **DataLoader**.

### **Why Dataset and DataLoader?**

Without dedicated utilities, handling data for deep learning gets messy:

- Manual batching
- Inefficient memory usage
- No built-in shuffling
- No parallel loading

PyTorch's `Dataset` and `DataLoader` provide:

- **Memory efficiency**: Load data on-demand
- **Parallelism**: Multi-process data loading
- **Batching**: Automatic batch creation
- **Shuffling**: Randomize data order
- **Collation**: Custom data assembly

### **The Dataset Class**

`torch.utils.data.Dataset` is an **abstract class** representing a dataset.
To create your own dataset, inherit from `Dataset` and implement:

- `__len__`: Returns dataset size
- `__getitem__`: Returns a sample by index

```python
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]
```

### **Built-in Datasets**

PyTorch provides many built-in datasets through **TorchVision**:

```python
from torchvision import datasets

# MNIST (handwritten digits)
mnist_train = datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=None
)

# CIFAR-10 (32x32 color images)
cifar_train = datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=None
)

# ImageNet (requires manual download)
imagenet_train = datasets.ImageNet(
    root='./data/imagenet',
    split='train',
    transform=None
)
```

### **The DataLoader Class**

`DataLoader` wraps a `Dataset` and provides:

- Automatic batching
- Multi-process data loading
- Shuffling
- Custom collation

```python
from torch.utils.data import DataLoader

# Create DataLoader
train_loader = DataLoader(
    dataset=cifar_train,
    batch_size=64,
    shuffle=True,
    num_workers=4,    # Parallel processes
    pin_memory=True   # Faster transfer to GPU
)
```

### **Key DataLoader Parameters**

| Parameter | Description |
|-----------|-------------|
| `batch_size` | Number of samples per batch |
| `shuffle` | Randomize data order each epoch |
| `num_workers` | Number of subprocesses for data loading |
| `pin_memory` | Copies tensors to pinned memory for faster GPU transfer |
| `drop_last` | Drop last incomplete batch |
| `sampler` | Custom sampling strategy |

### **Iterating Through DataLoader**

```python
# Assumes model, criterion, optimizer, and device are already defined (see Part 1)

# Training loop
for epoch in range(10):
    for batch_idx, (data, target) in enumerate(train_loader):
        # data: [batch_size, channels, height, width]
        # target: [batch_size]

        # Move to GPU if available
        data, target = data.to(device), target.to(device)

        # Forward pass, backward pass, etc.
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
```

### **Custom Collate Function**

Sometimes you need custom batch assembly:

```python
import torch
from torch.utils.data import DataLoader

def custom_collate(batch):
    """Custom collate function for variable-length sequences"""
    data = [item[0] for item in batch]
    targets = [item[1] for item in batch]

    # Pad sequences to same length
    data = torch.nn.utils.rnn.pad_sequence(data, batch_first=True)
    targets = torch.tensor(targets)

    return data, targets

loader = DataLoader(dataset, batch_size=32, collate_fn=custom_collate)
```

### **Practical Example: CIFAR-10 Data Loading**

Let's load CIFAR-10 with proper configuration:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Define transformations (will cover in next section)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load training set
train_dataset = datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

# Load test set
test_dataset = datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

# Create data loaders
train_loader = DataLoader(
    train_dataset,
    batch_size=128,
    shuffle=True,
    num_workers=4,
    pin_memory=True
)

test_loader = DataLoader(
    test_dataset,
    batch_size=100,
    shuffle=False,
    num_workers=4,
    pin_memory=True
)

# Verify
print(f"Training batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")
print(f"Batch shape: {next(iter(train_loader))[0].shape}")
# Should output: torch.Size([128, 3, 32, 32])
```

> 💡 **Pro Tip**: For large datasets, a common starting point for `num_workers` is the number of available CPU cores. Don't set it too high, though: each worker consumes memory.

---

## **Transforms: Image Preprocessing and Augmentation**

Raw image data isn't ready for deep learning models. We need to **preprocess** it and often **augment** it to improve model robustness.
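As a rough illustration of what preprocessing means numerically, the sketch below mimics in plain Python the arithmetic commonly applied to each pixel: scale an 8-bit value to [0, 1], then standardize it with a per-channel mean and standard deviation. The helper name `normalize_pixel` is hypothetical, and the 0.5/0.5 statistics are simply the values used in the CIFAR-10 loading example above, not a recommendation.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Scale an 8-bit pixel to [0, 1], then standardize it as (x - mean) / std."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit pixel value in [0, 255]")
    scaled = value / 255.0        # scaling step (what ToTensor does for 8-bit images)
    return (scaled - mean) / std  # standardization step (what Normalize does)


# One RGB pixel, processed channel by channel (hypothetical values)
pixel = (0, 128, 255)
normalized = tuple(normalize_pixel(v) for v in pixel)
print(normalized)
```

With a mean and standard deviation of 0.5, the darkest pixel maps to -1 and the brightest to +1, so inputs end up centered around zero.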
### **Why Transforms?**

Images come in various formats, sizes, and color spaces. To feed them into a neural network, we need to:

- Convert to tensors
- Normalize pixel values
- Resize to consistent dimensions
- Augment to increase the variety of training data

PyTorch's `torchvision.transforms` provides these capabilities.

### **Basic Image Transforms**

Let's start with essential transforms:

```python
from torchvision import transforms

# Convert PIL Image or numpy array to tensor
transform = transforms.ToTensor()

# Example usage
from PIL import Image
img = Image.open('example.jpg')
tensor_img = transform(img)  # Shape: [C, H, W]
```

### **Normalization**

Deep learning models work best with normalized inputs. For images, we typically standardize each channel so that it has roughly zero mean and unit standard deviation.

```python
# Normalize: (x - mean) / std
transform = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],  # ImageNet stats
    std=[0.229, 0.224, 0.225]
)
```

> 💡 **Note**: For CIFAR-10, common normalization is:
> ```python
> transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
> ```

### **Composing Multiple Transforms**

Use `transforms.Compose` to chain multiple operations:

```python
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize to 224x224
    transforms.CenterCrop(200),     # Crop to 200x200
    transforms.ToTensor(),          # Convert to tensor
    transforms.Normalize(           # Normalize
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
```

### **Data Augmentation**

Data augmentation artificially increases the effective size of a dataset by applying random transformations to each sample as it is loaded. This helps reduce overfitting and improves generalization.
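To make the idea concrete, here is a minimal plain-Python sketch of one random transformation, a horizontal flip applied with probability `p`. It is not how torchvision implements `RandomHorizontalFlip`; the function names and the toy 2x3 "image" are assumptions made for illustration. Note that the stored dataset never grows: the model simply sees a different variant of an image whenever the transform fires.

```python
import random


def hflip(image):
    """Mirror an image stored as a list of rows (each row is a list of pixels)."""
    return [row[::-1] for row in image]


def random_hflip(image, p=0.5, rng=random):
    """Flip the image with probability p; otherwise return it unchanged."""
    return hflip(image) if rng.random() < p else image


image = [[1, 2, 3],
         [4, 5, 6]]

rng = random.Random(0)  # seeded only to make the demo repeatable
for epoch in range(3):
    # The same stored image can come out differently in each epoch
    print(epoch, random_hflip(image, p=0.5, rng=rng))
```

Passing the random generator in as an argument keeps the sketch easy to test; real pipelines usually rely on the library's own random state.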
#### **Common Augmentation Techniques**

```python
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),     # Random crop and resize
    transforms.RandomHorizontalFlip(),     # 50% chance of flip
    transforms.ColorJitter(                # Random color changes
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        hue=0.1
    ),
    transforms.RandomRotation(10),         # Random rotation up to 10°
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# No augmentation for validation/test
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```

### **Advanced Augmentation with Albumentations**

For more complex augmentations, use the **Albumentations** library with PyTorch:

```python
import cv2
import albumentations as A
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset

# Define augmentation pipeline
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2()
])

# Custom dataset using albumentations
class AlbumentationsDataset(Dataset):
    def __init__(self, file_paths, labels, transform=None):
        self.file_paths = file_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # OpenCV loads images as BGR, so convert to RGB
        image = cv2.imread(self.file_paths[idx])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        if self.transform:
            augmented = self.transform(image=image)
            image = augmented['image']

        return image, self.labels[idx]
```

### **Visualizing Augmentations**

Let's visualize how augmentations transform an image:

```python
import matplotlib.pyplot as plt
from torchvision.utils import make_grid

# Get a batch of augmented images
data, _ = next(iter(train_loader))

# Create grid of images
img_grid = make_grid(data, nrow=8)
img_grid = img_grid * 0.5 + 0.5                   # Undo Normalize((0.5, ...), (0.5, ...))
img_grid = img_grid.clamp(0, 1).permute(1, 2, 0)  # CHW to HWC

# Display
plt.figure(figsize=(12, 6))
plt.imshow(img_grid)
plt.title("Augmented CIFAR-10 Images")
plt.axis('off')
plt.show()
```

If `train_loader` was built with an augmenting transform, this shows a grid of images with different augmentations applied, demonstrating the diversity created by the augmentation pipeline.

### **Custom Transforms**

Sometimes you need a transform that doesn't exist. Create your own!

```python
import torch
from torchvision import transforms

class RandomCutout:
    """Randomly cut out a portion of the image (expects a [C, H, W] tensor)"""
    def __init__(self, size=8, n_holes=1):
        self.size = size
        self.n_holes = n_holes

    def __call__(self, img):
        h, w = img.shape[1], img.shape[2]

        for _ in range(self.n_holes):
            # Pick a random center, then clip the square to the image bounds
            y = torch.randint(0, h, (1,)).item()
            x = torch.randint(0, w, (1,)).item()

            y1 = max(0, y - self.size // 2)
            y2 = min(h, y + self.size // 2)
            x1 = max(0, x - self.size // 2)
            x2 = min(w, x + self.size // 2)

            img[:, y1:y2, x1:x2] = 0

        return img

# Usage (after ToTensor, because RandomCutout works on tensors)
transform = transforms.Compose([
    transforms.ToTensor(),
    RandomCutout(size=8, n_holes=2),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
```

### **Best Practices for Transforms**

1. **Train vs. Validation**: Apply augmentation only to training data
2. **Normalization**: Use dataset-specific mean/std values
3. **Order Matters**: Resize before cropping, normalize last
4. **CPU vs. GPU**: Transforms inside a `Dataset` run on the CPU in the DataLoader workers; cheap tensor operations can instead be applied on the GPU after batching
5. **Consistency**: For segmentation, use transforms that affect image and mask together

> 💡 **Pro Tip**: For production, consider using `torchvision.transforms.AutoAugment` or `RandAugment`, which apply predefined or randomly sampled augmentation policies instead of hand-tuned ones.

---

## **Convolutional Neural Networks (CNNs): Theory and Architecture**

Now that we know how to handle image data, let's dive into **Convolutional Neural Networks (CNNs)**, the workhorse of computer vision.
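Before the theory, a quick back-of-the-envelope count hints at why convolutions suit images. The sketch below compares the number of learnable parameters in a fully connected layer with the number in a small convolutional layer. The helper names and layer sizes are illustrative assumptions; the formulas are the usual weight-plus-bias counts.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# A 32x32 RGB image flattened and fed to 512 hidden units
dense = linear_params(3 * 32 * 32, 512)

# 16 filters of size 3x3 over the same RGB image; the count does not
# depend on image height or width because each filter is shared
conv = conv2d_params(3, 16, 3)

print(f"dense: {dense:,} parameters")
print(f"conv:  {conv:,} parameters")
```

The dense layer's count grows with the number of pixels, while the convolution's count depends only on the channel counts and kernel size.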
### **Why CNNs for Images?**

Traditional neural networks (like the one we built in Part 1) have limitations for images:

- **Parameter explosion**: A 1000x1000 grayscale image has 1M input values, so a single fully connected layer with 1,000 units would already need about 1 billion weights
- **Spatial information loss**: Fully connected layers ignore pixel relationships
- **No translation invariance**: The same object in a different location has to be learned again

CNNs solve these problems with:

- **Local connectivity**: Neurons connect to local regions
- **Weight sharing**: Same filter applied across image
- **Hierarchical feature learning**: Simple to complex features

### **Core Components of CNNs**

#### 1. **Convolutional Layer**

The heart of a CNN. Applies filters (kernels) to detect features.

- **Input**: Image tensor [C, H, W]
- **Filter**: Small kernel (e.g., 3x3) with weights
- **Operation**: Slide filter across image, compute dot product
- **Output**: Feature map highlighting where filter matches

Mathematically:

```
(feature_map)_i,j = sum_{m,n} input_{i+m,j+n} * filter_{m,n}
```

In PyTorch:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(
    in_channels=3,    # Input channels (RGB=3)
    out_channels=16,  # Number of filters
    kernel_size=3,    # 3x3 filter
    stride=1,         # Step size
    padding=1         # Add padding to keep size
)

input_tensor = torch.randn(1, 3, 32, 32)  # Dummy batch: [N, C, H, W]
conv_output = conv(input_tensor)
```

#### 2. **Activation Function**

Adds non-linearity. **ReLU** is most common:

```python
relu = nn.ReLU()
relu_output = relu(conv_output)
```

Why ReLU?

- Simple: max(0, x)
- Mitigates vanishing gradients (the gradient is 1 for positive inputs)
- Computationally efficient

#### 3. **Pooling Layer**

Reduces spatial dimensions while keeping important features.

**Max Pooling** (most common):

- Takes maximum value in each window
- Preserves strongest activations
- Provides a degree of translation invariance

```python
pool = nn.MaxPool2d(kernel_size=2, stride=2)
pool_output = pool(relu_output)
```

**Average Pooling**:

- Takes average value in each window
- Smoother feature maps
- Less common than max pooling

#### 4. **Batch Normalization**

Stabilizes and accelerates training:

```python
bn = nn.BatchNorm2d(num_features=16)
bn_output = bn(conv_output)
```

Benefits:

- Reduces internal covariate shift
- Allows higher learning rates
- Acts as regularizer

#### 5. **Dropout**

Regularization technique to prevent overfitting:

```python
dropout = nn.Dropout(p=0.5)
dropout_output = dropout(pool_output)
```

### **CNN Architecture Patterns**

Most CNNs follow this pattern:

```
INPUT -> [[CONV -> RELU] * N -> POOL] * M -> [FC -> RELU] * K -> FC
```

Where:

- N = number of conv layers before pooling (usually 1-3)
- M = number of convolutional blocks (usually 3-5)
- K = number of fully connected layers (usually 0-2)

### **Classic CNN Architectures**

#### LeNet-5 (1998)

One of the first successful CNNs (for handwritten digits):

```
INPUT(32x32) -> C1(6@28x28) -> S2 -> C3(16@10x10) -> S4 -> C5 -> F6 -> OUTPUT
```

- C = Convolutional layer
- S = Subsampling (pooling)
- Numbers indicate feature map count and size (e.g., 6@28x28 means 6 maps of 28x28)

#### AlexNet (2012)

Revolutionized computer vision (won ImageNet challenge):

```
INPUT(227x227) -> CONV(96,11x11,s=4) -> RELU -> POOL
  -> CONV(256,5x5) -> RELU -> POOL
  -> CONV(384,3x3) -> RELU
  -> CONV(384,3x3) -> RELU
  -> CONV(256,3x3) -> RELU -> POOL
  -> FC(4096) -> RELU -> DROPOUT
  -> FC(4096) -> RELU -> DROPOUT
  -> FC(1000) -> SOFTMAX
```

Key innovations:

- ReLU activation
- Overlapping pooling
- Local response normalization
- Data augmentation
- Dropout regularization
- GPU implementation

#### VGGNet (2014)

Simplified AlexNet with consistent 3x3 convolutions:

```
INPUT -> [CONV(64) -> CONV(64) -> POOL]
  -> [CONV(128) -> CONV(128) -> POOL]
  -> [CONV(256) -> CONV(256) -> CONV(256) -> POOL]
  -> [CONV(512) -> CONV(512) -> CONV(512) -> POOL]
  -> [CONV(512) -> CONV(512) -> CONV(512) -> POOL]
  -> FC(4096) -> FC(4096) -> FC(1000) -> SOFTMAX
```

Two main variants:

- VGG-16 (16 weight layers)
- VGG-19 (19 weight layers)

#### ResNet (2015)

Introduced **residual connections** to make very deep networks trainable:

```
INPUT -> CONV -> BN -> RELU
  -> [CONV -> BN -> RELU -> CONV -> BN] + INPUT -> RELU
  -> [CONV -> BN -> RELU -> CONV -> BN] + INPUT -> RELU
  -> ...
```

Key innovation: **Skip connections** that allow gradients to flow directly through the network. In each block, `+ INPUT` means the block's own input is added to its output. ResNet-50 (50 layers) became a widely used baseline.

#### EfficientNet (2019)

Balanced network depth, width, and resolution:

```
INPUT -> STAGE1 -> STAGE2 -> ... -> STAGE8 -> POOL -> FC
```

Used **compound scaling** to uniformly scale all dimensions.

### **Visualizing CNN Features**

Let's see what CNNs learn:

- **Early layers**: Detect edges, colors, textures
- **Middle layers**: Detect shapes, patterns
- **Late layers**: Detect complex objects, parts

![CNN Feature Visualization](https://cdn-images-1.medium.com/max/1600/1*V_EUgNXIl4Xitpu5LcwbSA.png)

*Visualization of features learned at different CNN layers (Source: CS231n)*

### **How Convolution Works: Step-by-Step**

Let's walk through a simple 2D convolution:

**Input** (3x3 image):

```
1  2  3
4  5  6
7  8  9
```

**Filter** (2x2):

```
1  0
0 -1
```

**Stride = 1, No Padding**

Step 1: Position filter at top-left

```
[1  2]  3
[4  5]  6
 7  8   9
```

Dot product: (1*1) + (2*0) + (4*0) + (5*-1) = 1 + 0 + 0 - 5 = -4

Step 2: Slide right

```
 1 [2  3]
 4 [5  6]
 7  8  9
```

Dot product: (2*1) + (3*0) + (5*0) + (6*-1) = 2 + 0 + 0 - 6 = -4

Step 3: Move down, reset left

```
 1  2   3
[4  5]  6
[7  8]  9
```

Dot product: (4*1) + (5*0) + (7*0) + (8*-1) = 4 + 0 + 0 - 8 = -4

Step 4: Slide right

```
 1  2  3
 4 [5  6]
 7 [8  9]
```

Dot product: (5*1) + (6*0) + (8*0) + (9*-1) = 5 + 0 + 0 - 9 = -4

**Output Feature Map** (2x2):

```
-4 -4
-4 -4
```

This simple example shows how a filter responds to a specific pattern. Here the filter measures the difference along the diagonal, and the constant output reflects the uniform diagonal gradient of this input.

### **Practical Considerations for CNN Design**

1. **Filter Size**: 3x3 is standard (good balance of coverage and parameters)
2. **Number of Filters**: Typically doubles after each pooling (64→128→256→512)
3.
   **Padding**: Use "same" padding (padding=1 for 3x3) to maintain spatial dimensions
4. **Stride**: Usually 1 for conv, 2 for pooling
5. **Depth**: Deeper networks learn more complex features but are harder to train
6. **Normalization**: Batch norm after conv layers is a common default
7. **Activation**: ReLU is standard; leaky ReLU is common in GANs

---

## **Building Your First CNN for Image Classification**

Now that we understand CNN theory, let's build one in PyTorch!

### **Step 1: Define the CNN Architecture**

We'll create a simple CNN for CIFAR-10:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        # First convolutional block
        self.conv1 = nn.Conv2d(
            in_channels=3,
            out_channels=32,
            kernel_size=3,
            padding=1
        )
        self.bn1 = nn.BatchNorm2d(32)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Second convolutional block
        self.conv2 = nn.Conv2d(
            in_channels=32,
            out_channels=64,
            kernel_size=3,
            padding=1
        )
        self.bn2 = nn.BatchNorm2d(64)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Third convolutional block
        self.conv3 = nn.Conv2d(
            in_channels=64,
            out_channels=128,
            kernel_size=3,
            padding=1
        )
        self.bn3 = nn.BatchNorm2d(128)

        # Fully connected layers
        self.fc1 = nn.Linear(128 * 8 * 8, 512)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        # Block 1
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.pool1(x)

        # Block 2
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.pool2(x)

        # Block 3
        x = self.conv3(x)
        x = self.bn3(x)
        x = F.relu(x)

        # Flatten, keeping the batch dimension
        x = x.view(x.size(0), -1)

        # Fully connected
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)

        return x
```

### **Architecture Explanation**

Let's trace the shape through the network (for CIFAR-10's 32x32 images):

1. **Input**: [batch, 3, 32, 32]
2.
   **Conv1 + BN + ReLU + Pool1**:
   - Conv1: [batch, 32, 32, 32] (3x3 conv, padding=1)
   - Pool1: [batch, 32, 16, 16] (2x2 max pool, stride=2)
3. **Conv2 + BN + ReLU + Pool2**:
   - Conv2: [batch, 64, 16, 16]
   - Pool2: [batch, 64, 8, 8]
4. **Conv3 + BN + ReLU**:
   - Conv3: [batch, 128, 8, 8]
5. **Flatten**: [batch, 128*8*8] = [batch, 8192]
6. **FC1**: [batch, 512]
7. **FC2**: [batch, 10] (10 classes for CIFAR-10)

### **Step 2: Initialize and Move to Device**

```python
# Create model
model = SimpleCNN(num_classes=10)

# Move to GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Print model summary (requires torchsummary)
from torchsummary import summary
summary(model, input_size=(3, 32, 32), device=device.type)
```

The summary will show:

- Each layer's output shape
- Number of parameters
- Total parameters (about 4.3M for this model, almost all of them in `fc1`: 8192 x 512 ≈ 4.2M)

### **Step 3: Define Loss Function and Optimizer**

```python
# Cross-entropy loss for classification
criterion = nn.CrossEntropyLoss()

# Adam optimizer with weight decay
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    weight_decay=1e-5  # L2 regularization
)

# Learning rate scheduler (mode='max' because we step on validation accuracy)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode='max',
    factor=0.5,
    patience=5
)
```

### **Step 4: Training and Validation Functions**

```python
def train_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in dataloader:
        inputs, targets = inputs.to(device), targets.to(device)

        # Forward pass
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # Backward pass
        loss.backward()
        optimizer.step()

        # Statistics
        running_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

def validate(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, targets in dataloader:
            inputs, targets = inputs.to(device), targets.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, targets)

            running_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc
```

### **Step 5: Full Training Loop**

```python
# Training configuration
num_epochs = 50
best_acc = 0.0
history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

for epoch in range(num_epochs):
    # Train
    train_loss, train_acc = train_epoch(
        model, train_loader, criterion, optimizer, device
    )

    # Validate
    # Note: the test set doubles as the validation set here; for unbiased
    # results, hold out part of the training data for validation instead
    val_loss, val_acc = validate(model, test_loader, criterion, device)

    # Learning rate scheduling
    scheduler.step(val_acc)

    # Save best model
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), 'best_model.pth')

    # Record history
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)

    # Print progress
    print(f"Epoch {epoch+1}/{num_epochs}")
    print(f"Train Loss: {train_loss:.4f} Acc: {train_acc:.4f}")
    print(f"Val Loss: {val_loss:.4f} Acc: {val_acc:.4f} (Best: {best_acc:.4f})")
    print("-" * 60)
```

### **Step 6: Visualize Training History**

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 5))

# Plot loss
plt.subplot(1, 2, 1)
plt.plot(history['train_loss'], label='Train Loss')
plt.plot(history['val_loss'], label='Val Loss')
plt.title('Loss')
plt.xlabel('Epoch')
plt.legend()

# Plot accuracy
plt.subplot(1, 2, 2)
plt.plot(history['train_acc'], label='Train Acc')
plt.plot(history['val_acc'], label='Val Acc')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.legend()

plt.tight_layout()
plt.show()
```

### **Step 7: Evaluate on Test Set**

```python
# Load best model
model.load_state_dict(torch.load('best_model.pth', map_location=device))

# Final evaluation
test_loss, test_acc = validate(model, test_loader, criterion, device)
print(f"Test Accuracy: {test_acc:.4f}")

# Confusion matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns

model.eval()
all_preds = []
all_targets = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs = inputs.to(device)
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)

        all_preds.extend(preds.cpu().numpy())
        all_targets.extend(targets.numpy())

cm = confusion_matrix(all_targets, all_preds)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
```

### **Step 8: Making Predictions on New Images**

```python
from PIL import Image
from torchvision import transforms

def predict_image(image_path, model, transform, device, class_names):
    """Predict class for a single image"""
    # Load and transform image (force 3 channels)
    img = Image.open(image_path).convert('RGB')
    img_tensor = transform(img).unsqueeze(0)  # Add batch dimension
    img_tensor = img_tensor.to(device)

    # Predict
    model.eval()
    with torch.no_grad():
        output = model(img_tensor)
        _, pred = torch.max(output, 1)

    return class_names[pred.item()]

# Example usage
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# SimpleCNN expects 32x32 inputs normalized like the training data
predict_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Assuming you have a test image
prediction = predict_image('test_image.jpg', model, predict_transform, device, class_names)
print(f"Predicted class: {prediction}")
```

### **Common CNN Pitfalls and How to Avoid Them**

1. **Overfitting**
   - *Symptoms*: High training accuracy, low validation accuracy
   - *Solutions*:
     - Add dropout layers
     - Increase data augmentation
     - Use weight decay
     - Reduce model complexity
2. **Underfitting**
   - *Symptoms*: Low training and validation accuracy
   - *Solutions*:
     - Increase model capacity (more layers/filters)
     - Train longer
     - Reduce regularization
     - Check data preprocessing
3.
   **Vanishing Gradients**
   - *Symptoms*: Very slow training, gradients approaching zero
   - *Solutions*:
     - Use batch normalization
     - Use skip connections (ResNet-style)
     - Use ReLU or leaky ReLU
     - Reduce network depth
4. **Exploding Gradients**
   - *Symptoms*: NaN losses, sudden performance drops
   - *Solutions*:
     - Gradient clipping
     - Lower learning rate
     - Proper weight initialization
5. **Class Imbalance**
   - *Symptoms*: Model biased toward majority classes
   - *Solutions*:
     - Class weights in loss function
     - Oversampling minority classes
     - Undersampling majority classes

---

## **Training a CNN on CIFAR-10 Dataset**

Let's put everything together and train a CNN on the **CIFAR-10 dataset**.

### **What is CIFAR-10?**

CIFAR-10 is a benchmark dataset in computer vision containing:

- 60,000 32x32 color images
- 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- 50,000 training images, 10,000 test images
- Classes are balanced (6,000 images per class)

![CIFAR-10 Examples](https://www.cs.toronto.edu/~kriz/cifar-10.png)

### **Complete CIFAR-10 Training Script**

Here's a complete script to train a CNN on CIFAR-10:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import time

# Configuration
config = {
    'batch_size': 128,
    'num_epochs': 50,
    'learning_rate': 0.001,
    'weight_decay': 1e-5,
    'device': torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
    'model_save_path': 'cifar10_model.pth'
}

print(f"Using device: {config['device']}")

# Data augmentation and normalization
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# Load datasets
train_dataset = datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=train_transform
)

test_dataset = datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=test_transform
)

# Create data loaders
train_loader = DataLoader(
    train_dataset,
    batch_size=config['batch_size'],
    shuffle=True,
    num_workers=4,
    pin_memory=True
)

test_loader = DataLoader(
    test_dataset,
    batch_size=config['batch_size'],
    shuffle=False,
    num_workers=4,
    pin_memory=True
)

# Define the CNN model
class CIFAR10CNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.25),

            # Block 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.25),

            # Block 3
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.25),
        )

        # Three 2x2 poolings reduce 32x32 inputs to 4x4 feature maps
        self.classifier = nn.Sequential(
            nn.Linear(256 * 4 * 4, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

# Initialize model
model = CIFAR10CNN(num_classes=10).to(config['device'])

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(
    model.parameters(),
    lr=config['learning_rate'],
    weight_decay=config['weight_decay']
)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode='max',
    factor=0.5,
    patience=5
)

# Training function
def train(model, train_loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

# Validation function
def validate(model, test_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, targets)

            running_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

# Training loop
best_acc = 0.0
train_losses, train_accs = [], []
val_losses, val_accs = [], []

start_time = time.time()

for epoch in range(config['num_epochs']):
    # Train
    train_loss, train_acc = train(model, train_loader, criterion, optimizer, config['device'])

    # Validate
    val_loss, val_acc = validate(model, test_loader, criterion, config['device'])

    # Update learning rate
    scheduler.step(val_acc)

    # Save best model
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), config['model_save_path'])

    # Record metrics
    train_losses.append(train_loss)
    train_accs.append(train_acc)
    val_losses.append(val_loss)
    val_accs.append(val_acc)

    # Print progress
    print(f"Epoch {epoch+1:2d}/{config['num_epochs']} | "
          f"Time: {time.time()-start_time:.1f}s | "
          f"Train Loss: {train_loss:.4f} Acc: {train_acc:.4f} | "
          f"Val Loss: {val_loss:.4f} Acc: {val_acc:.4f} (Best: {best_acc:.4f})")

# Save final metrics
metrics = {
    'train_loss': train_losses,
    'train_acc': train_accs,
    'val_loss': val_losses,
    'val_acc': val_accs
}
torch.save(metrics, 'training_metrics.pth')

# Plot results
plt.figure(figsize=(14, 5))

# Loss plot
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Validation Loss')
plt.title('Loss Curve')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Accuracy plot
plt.subplot(1, 2, 2)
plt.plot(train_accs, label='Train Accuracy')
plt.plot(val_accs, label='Validation Accuracy')
plt.title('Accuracy Curve')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()
plt.savefig('training_curves.png')
plt.show()

# Final evaluation
print(f"\nTraining completed in {time.time()-start_time:.0f} seconds")
print(f"Best validation accuracy: {best_acc:.4f}")

# Load best model for final evaluation
model.load_state_dict(torch.load(config['model_save_path']))
final_loss, final_acc = validate(model, test_loader, criterion, config['device'])
print(f"Final test accuracy: {final_acc:.4f}")
```

### **Expected Results**

With this setup, you can expect roughly:

- **~85-88% test accuracy** after 50 epochs
- Training time: ~30-45 minutes on a modern GPU (this varies with hardware)
- Clear convergence with no severe overfitting

### **Improving Performance**

To get even better results:

1. **More complex architecture**:
   - Add more convolutional blocks
   - Use residual connections
   - Try different activation functions
2. **Advanced augmentation**:
   ```python
   train_transform = transforms.Compose([
       transforms.RandomCrop(32, padding=4),
       transforms.RandomHorizontalFlip(),
       transforms.RandomRotation(15),
       transforms.ColorJitter(brightness=0.2, contrast=0.2),
       transforms.ToTensor(),
       transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
       transforms.RandomErasing(p=0.5, scale=(0.02, 0.2), ratio=(0.3, 3.3))
   ])
   ```
3.
**Learning rate scheduling**: ```python scheduler = optim.lr_scheduler.OneCycleLR( optimizer, max_lr=0.01, steps_per_epoch=len(train_loader), epochs=config['num_epochs'] ) ``` 4. **Mixup augmentation**: ```python def mixup_data(x, y, alpha=1.0): if alpha > 0: lam = np.random.beta(alpha, alpha) else: lam = 1 batch_size = x.size()[0] index = torch.randperm(batch_size).to(x.device) mixed_x = lam * x + (1 - lam) * x[index, :] y_a, y_b = y, y[index] return mixed_x, y_a, y_b, lam # In training loop inputs, targets = inputs.to(device), targets.to(device) inputs, targets_a, targets_b, lam = mixup_data(inputs, targets) outputs = model(inputs) loss = lam * criterion(outputs, targets_a) + (1 - lam) * criterion(outputs, targets_b) ``` 5. **Label smoothing**: ```python class LabelSmoothingLoss(nn.Module): def __init__(self, classes, smoothing=0.1): super(LabelSmoothingLoss, self).__init__() self.confidence = 1.0 - smoothing self.smoothing = smoothing self.cls = classes def forward(self, pred, target): pred = pred.log_softmax(dim=-1) with torch.no_grad(): true_dist = torch.zeros_like(pred) true_dist.fill_(self.smoothing / (self.cls - 1)) true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence) return torch.mean(torch.sum(-true_dist * pred, dim=-1)) criterion = LabelSmoothingLoss(classes=10, smoothing=0.1) ``` With these enhancements, you can reach **~92-93% test accuracy** on CIFAR-10, which is quite good for a model trained from scratch. --- ## **Transfer Learning: Leveraging Pretrained Models** Training CNNs from scratch requires massive datasets and compute resources. **Transfer learning** solves this by using models pretrained on large datasets like ImageNet. ### **What is Transfer Learning?** Transfer learning takes a model trained on one task and adapts it to a new task. In computer vision, this means: 1. Take a model pretrained on ImageNet (1.2M images, 1000 classes) 2. Replace the final classification layer 3. 
   Fine-tune on your target dataset

Benefits:

- **Faster training**: Start with good feature extractors
- **Better performance**: Especially with small datasets
- **Less data needed**: Works with hundreds instead of thousands of images

### **How Transfer Learning Works**

1. **Feature extraction**: Use pretrained model as fixed feature extractor
   - Freeze all layers except the classifier
   - Train only the new classifier head
2. **Fine-tuning**: Update some pretrained layers
   - Unfreeze some layers
   - Train with lower learning rate

![Transfer Learning](https://miro.medium.com/max/1400/1*VEv1VzBAYySECIBh3pWm9A.png)
*Transfer learning approaches (Source: PyTorch documentation)*

### **PyTorch Models Zoo**

PyTorch provides many pretrained models through **torchvision.models**:

```python
from torchvision import models

# List available models
print(dir(models))

# torchvision >= 0.13 selects pretrained weights with `weights=`;
# older releases use the deprecated `pretrained=True` flag instead.

# Load pretrained ResNet-18
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Load pretrained EfficientNet-B0
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)

# Load pretrained Vision Transformer
model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
```

Available models include:

- **ResNet**: resnet18, resnet34, resnet50, resnet101, resnet152
- **EfficientNet**: efficientnet_b0 to efficientnet_b7
- **Vision Transformers**: vit_b_16, vit_b_32, vit_l_16
- **MobileNet**: mobilenet_v2, mobilenet_v3_small, mobilenet_v3_large
- **DenseNet**: densenet121, densenet161, densenet169, densenet201
- **AlexNet, VGG, GoogLeNet, Inception**

### **Step-by-Step: Transfer Learning on CIFAR-10**

Let's use ResNet-18 for CIFAR-10:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, datasets, transforms

# Configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
num_classes = 10
batch_size = 128
num_epochs = 20
feature_extract = True  # True for feature extraction, False for fine-tuning

# Data transforms
# CIFAR-10 images are 32x32; they are upscaled to the 224x224 input size
# used for ImageNet pretraining and normalized with ImageNet statistics.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load datasets
train_dataset = datasets.CIFAR10('./data', train=True, download=True, transform=train_transform)
test_dataset = datasets.CIFAR10('./data', train=False, download=True, transform=test_transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4)

# Load pretrained model (torchvision >= 0.13; older releases use pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all parameters (feature extraction mode)
if feature_extract:
    for param in model.parameters():
        param.requires_grad = False

# Replace the final fully connected layer
# (a newly created layer has requires_grad=True by default)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)

# Send model to device
model = model.to(device)

# Gather parameters to optimize
params_to_update = model.parameters()
if feature_extract:
    params_to_update = []
    for name, param in model.named_parameters():
        if param.requires_grad:
            params_to_update.append(param)
            print(f"Updating {name}")

# Setup optimizer and loss function
optimizer = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Training loop (same as before)
# ... [same training/validation functions as in previous section] ...
```

### **Feature Extraction vs. Fine-tuning**

#### **Feature Extraction (feature_extract=True)**

- **How it works**: Freeze all pretrained layers, train only new classifier
- **When to use**: Small dataset, similar to ImageNet
- **Advantages**:
  - Very fast training
  - No risk of overwriting good features
- **Disadvantages**:
  - May not adapt well to very different tasks

#### **Fine-tuning (feature_extract=False)**

- **How it works**: Leave some or all pretrained layers trainable (the code above leaves all of them trainable) and train with a low learning rate
- **When to use**: Larger dataset, somewhat different from ImageNet
- **Advantages**:
  - Better adaptation to target task
  - Higher potential accuracy
- **Disadvantages**:
  - Slower training
  - Risk of overfitting

### **Advanced Fine-tuning Strategies**

#### **Layer-wise Learning Rates**

Different layers may need different learning rates:

```python
# Group parameters by layer: small steps for early, generic features,
# larger steps for later, task-specific layers.
# Note: parameters not listed in any group (e.g. model.bn1) are not updated.
optimizer_grouped_parameters = [
    {'params': model.conv1.parameters(), 'lr': 1e-5},
    {'params': model.layer1.parameters(), 'lr': 5e-5},
    {'params': model.layer2.parameters(), 'lr': 1e-4},
    {'params': model.layer3.parameters(), 'lr': 5e-4},
    {'params': model.layer4.parameters(), 'lr': 1e-3},
    {'params': model.fc.parameters(), 'lr': 1e-2},
]

optimizer = optim.Adam(optimizer_grouped_parameters)
```

#### **Gradual Unfreezing**

Unfreeze layers progressively during training:

```python
def unfreeze_layers(model, num_layers):
    """Unfreeze the last num_layers of the model"""
    # ResNet-specific; adapt for other architectures
    layers = [model.layer4, model.layer3, model.layer2, model.layer1]
    for i in range(min(num_layers, len(layers))):
        for param in layers[i].parameters():
            param.requires_grad = True

# Start with only classifier trainable
for epoch in range(num_epochs):
    if epoch == 5:
        unfreeze_layers(model, 1)  # Unfreeze last block
        optimizer = optim.SGD(model.parameters(), lr=1e-4)
    elif epoch == 10:
        unfreeze_layers(model, 2)  # Unfreeze the last two blocks
        optimizer = optim.SGD(model.parameters(), lr=1e-5)
    # ... run one epoch of training and validation here ...
```

#### **Discriminative Fine-tuning**

Introduced in Universal Language Model Fine-tuning (ULMFiT), this idea also works for vision:

```python
# Learning rates decrease by a factor of 2 for each earlier layer
base_lr = 1e-3
layers = [
    model.fc,
    model.layer4,
    model.layer3,
    model.layer2,
    model.layer1,
    model.conv1
]

params = []
for i, layer in enumerate(layers):
    params.append({'params': layer.parameters(), 'lr': base_lr / (2 ** i)})

optimizer = optim.Adam(params)
```

### **Expected Results with Transfer Learning**

With ResNet-18 on CIFAR-10:

- **Feature extraction**: ~88-90% test accuracy
- **Fine-tuning**: ~92-94% test accuracy

Both beat the baseline CNN trained from scratch (~85-88%) while needing fewer epochs (20 here versus 50).

### **When NOT to Use Transfer Learning**

Transfer learning isn't always the best approach:

1. **Very different domains**:
   - Medical images vs. natural images
   - Satellite imagery vs. everyday photos
2. **Extremely large target dataset**:
   - If you have millions of task-specific images
   - Training from scratch may yield better results
3. **Specialized architectures**:
   - Some tasks need custom architectures
   - Example: Object detection, segmentation

In these cases, consider:

- **Domain-specific pretraining**: Pretrain on similar domain
- **Multi-task learning**: Train on multiple related tasks
- **Self-supervised learning**: Train without labels (e.g., SimCLR, MoCo)

---

## **Advanced Debugging and Profiling Techniques**

Even with PyTorch's great debugging tools, deep learning models can be tricky to debug. Here are advanced techniques to identify and fix issues.

### **1. Verifying Data Pipeline**

Many training bugs originate in the data pipeline rather than in the model, so check the data first.
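
A quick first check is to pull a single batch and confirm a few basic invariants: image shape, dtype, finite values, and label range. The sketch below is only an illustration under stated assumptions: `check_batch` is a hypothetical helper (not part of PyTorch), and a small synthetic `TensorDataset` stands in for the real `train_loader` so the snippet runs without downloading CIFAR-10.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def check_batch(images, labels, num_classes, expected_shape=(3, 32, 32)):
    """Return a list of problems found in one batch (empty list = looks fine).

    Hypothetical helper for illustration; not part of PyTorch.
    """
    problems = []
    # Per-image shape should be (channels, height, width)
    if tuple(images.shape[1:]) != tuple(expected_shape):
        problems.append(f"unexpected image shape {tuple(images.shape[1:])}")
    # ToTensor/Normalize should produce finite floating-point values
    if not torch.is_floating_point(images):
        problems.append(f"images should be floating point, got {images.dtype}")
    elif not torch.isfinite(images).all():
        problems.append("images contain NaN or Inf")
    # CrossEntropyLoss expects int64 class indices in [0, num_classes)
    if labels.dtype != torch.long:
        problems.append(f"labels should be torch.long, got {labels.dtype}")
    elif labels.min() < 0 or labels.max() >= num_classes:
        problems.append("labels fall outside [0, num_classes)")
    if images.size(0) != labels.size(0):
        problems.append("image and label counts differ")
    return problems


# Synthetic stand-in for a CIFAR-10 loader: 64 random "images" with valid labels
fake_images = torch.randn(64, 3, 32, 32)
fake_labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=16)

images, labels = next(iter(loader))
issues = check_batch(images, labels, num_classes=10)
print("Batch OK" if not issues else issues)
```

With a real loader, the same call would be `check_batch(*next(iter(train_loader)), num_classes=10)`; for the ResNet-18 transfer learning pipeline, pass `expected_shape=(3, 224, 224)`.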
#### **Check Data Distribution**

```python
# Check class distribution
from collections import Counter

# CIFAR10 stores its labels in .targets; reading them directly avoids
# decoding and transforming every image just to count classes.
train_labels = train_dataset.targets
test_labels = test_dataset.targets

print("Train distribution:", Counter(train_labels))
print("Test distribution:", Counter(test_labels))

# Should be balanced (5000 per class for CIFAR-10 train)
```

#### **Visualize Raw and Transformed Images**

```python
import matplotlib.pyplot as plt
from torchvision.utils import make_grid

def show_batch(sample_batched, title=None):
    """Show image batch"""
    images, labels = sample_batched
    # normalize=True rescales the normalized tensors to [0, 1] for display
    grid = make_grid(images, normalize=True)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))
    if title is not None:
        plt.title(title)
    plt.axis('off')
    plt.show()

# Get a batch of training data
sample_batch = next(iter(train_loader))
show_batch(sample_batch, 'Training Batch')

# Get a batch of test data
sample_batch = next(iter(test_loader))
show_batch(sample_batch, 'Test Batch')
```

#### **Check for Data Leakage**

```python
import hashlib

# Are training and test sets properly separated?
# ImageFolder datasets expose file paths via .imgs; CIFAR10 does not,
# so compare hashes of the raw image arrays stored in .data instead.
def image_hashes(dataset):
    return {hashlib.md5(img.tobytes()).hexdigest() for img in dataset.data}

common = image_hashes(train_dataset) & image_hashes(test_dataset)
print(f"Identical images shared by train and test: {len(common)}")  # Ideally zero
```

### **2. Gradient Checking**

Verify your gradients are correct:

```python
import copy
import torch
from torch.autograd import gradcheck

# gradcheck compares analytical and numerical gradients, so it needs
# double precision and deterministic behavior: test an eval-mode
# float64 copy on the CPU (dropout off, batch norm using running stats).
check_model = copy.deepcopy(model).cpu().double().eval()

# Test with small input (numerical gradients are slow for large inputs)
inputs = (torch.randn(2, 3, 32, 32, dtype=torch.double, requires_grad=True),)
test = gradcheck(check_model, inputs, eps=1e-6, atol=1e-4)
print(f"Gradient check passed: {test}")
```

### **3. Numerical Stability Checks**

Watch for NaNs and infinities:

```python
def check_tensor(tensor, name="tensor"):
    """Check for NaNs and Infs"""
    if torch.isnan(tensor).any():
        print(f"Warning: {name} contains NaNs")
    if torch.isinf(tensor).any():
        print(f"Warning: {name} contains Infs")

# In training loop
output = model(data)
check_tensor(output, "model output")
loss = criterion(output, target)
check_tensor(loss, "loss")
```

### **4. Learning Rate Scheduling Analysis**

Visualize how learning rate changes:

```python
import torch
import torch.optim as optim
import matplotlib.pyplot as plt

# Use a throwaway optimizer and scheduler so the real training state stays untouched.
# StepLR is only an example; substitute the scheduler you want to inspect
# (ReduceLROnPlateau needs a metric passed to step(), so it cannot be previewed this way).
dummy_optimizer = optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
dummy_scheduler = optim.lr_scheduler.StepLR(dummy_optimizer, step_size=10, gamma=0.5)

lrs = []
for epoch in range(50):
    dummy_optimizer.step()
    lrs.append(dummy_optimizer.param_groups[0]['lr'])
    dummy_scheduler.step()

plt.plot(lrs)
plt.title('Learning Rate Schedule')
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.show()
```

### **5. Weight and Activation Monitoring**

Track distributions of weights and activations:

```python
def add_hist_hooks(model):
    """Add hooks to record weight and activation histograms"""
    histograms = {}

    def hook_fn(name):
        def hook(module, input, output):
            # Overwritten on every forward pass, so the dict holds the latest values
            histograms[f"{name}.output"] = output.detach()
            if hasattr(module, 'weight'):
                histograms[f"{name}.weight"] = module.weight.detach()
            if hasattr(module, 'bias') and module.bias is not None:
                histograms[f"{name}.bias"] = module.bias.detach()
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(hook_fn(name))

    return histograms

# Usage
histograms = add_hist_hooks(model)

# During training, collect histograms periodically
all_histograms = []

for epoch in range(0, num_epochs, 5):
    # Train for some steps
    # ...

    # Record histograms
    all_histograms.append({
        k: v.cpu().numpy() for k, v in histograms.items()
    })

# Plot weight distribution over time
plt.figure(figsize=(12, 8))
# The 2x3 grid holds at most six snapshots
for i, hist in enumerate(all_histograms[:6]):
    plt.subplot(2, 3, i+1)
    plt.hist(hist['features.0.weight'].flatten(), bins=50)
    plt.title(f'Epoch {i*5}')
plt.tight_layout()
plt.show()
```

### **6. TensorBoard for Advanced Monitoring**

Beyond basic loss/accuracy, track:

```python
from torch.utils.tensorboard import SummaryWriter
from torchvision.utils import make_grid

writer = SummaryWriter('runs/cifar10_experiment')

# The calls below belong inside the training loop, where `epoch` is defined

# Track weight histograms
for name, param in model.named_parameters():
    writer.add_histogram(name, param, epoch)

# Track gradient norms (global L2 norm over all parameters)
total_norm = 0
for p in model.parameters():
    if p.grad is not None:
        param_norm = p.grad.detach().norm(2)
        total_norm += param_norm.item() ** 2
total_norm = total_norm ** (1. / 2)
writer.add_scalar('grad_norm', total_norm, epoch)

# Track learning rate
writer.add_scalar('learning_rate', optimizer.param_groups[0]['lr'], epoch)

# Track misclassified examples
if epoch % 10 == 0:
    # Get some misclassified examples
    misclassified = []
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1, keepdim=True)
            wrong = pred.ne(target.view_as(pred)).squeeze(1)  # Boolean mask, shape (batch,)
            if wrong.any():
                misclassified.append((
                    data[wrong],
                    target[wrong],
                    pred[wrong]
                ))
            if len(misclassified) > 5:
                break

    # Add to TensorBoard
    if misclassified:
        images, labels, preds = zip(*misclassified)
        images = torch.cat(images)[:16]
        labels = torch.cat(labels)[:16]
        preds = torch.cat(preds)[:16]

        # normalize=True rescales the normalized tensors to [0, 1] for display
        grid = make_grid(images, nrow=4, normalize=True)
        writer.add_image(
            'Misclassified Examples',
            grid,
            epoch,
            dataformats='CHW'
        )

        # Add text descriptions
        class_names = train_dataset.classes
        text = "\n".join([
            f"True: {class_names[labels[i]]}, Pred: {class_names[preds[i][0]]}"
            for i in range(len(labels))
        ])
        writer.add_text('Misclassified Labels', text, epoch)
```

### **7. Profiling with PyTorch Profiler**

Identify bottlenecks in your code:

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(
        wait=1,
        warmup=1,
        active=3,
        repeat=2
    ),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./log'),
    record_shapes=True,
    profile_memory=True,
    with_stack=True
) as prof:
    for step, (inputs, targets) in enumerate(train_loader):
        # Two cycles of (wait + warmup + active) steps
        if step >= (1 + 1 + 3) * 2:
            break

        inputs = inputs.to(device)
        targets = targets.to(device)

        with record_function("forward"):
            outputs = model(inputs)
            loss = criterion(outputs, targets)

        with record_function("backward"):
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        prof.step()

# Print profiling results
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

This will show:

- Time spent in each operation
- CPU vs. GPU time
- Memory usage
- Call stack for bottlenecks

### **8. Common Issues and Solutions**

#### **Issue: Training loss doesn't decrease**

- **Check**: Learning rate too low (or so high that training diverges)
- **Fix**: Adjust the learning rate, use a learning rate finder

#### **Issue: Validation loss much higher than training loss**

- **Check**: Overfitting
- **Fix**: Add dropout, reduce model size, increase augmentation

#### **Issue: NaN loss**

- **Check**: Exploding gradients, invalid inputs
- **Fix**: Gradient clipping, check data preprocessing

#### **Issue: Slow training**

- **Check**: Data loading bottleneck
- **Fix**: Increase `num_workers`, use pinned memory

#### **Issue: Poor generalization**

- **Check**: Data leakage, insufficient augmentation
- **Fix**: Verify data split, add more augmentation

---

## **Quiz 2: Test Your Understanding of Computer Vision with PyTorch**

**1. 
What is the primary purpose of the DataLoader class in PyTorch?**
A) To define the neural network architecture
B) To efficiently load and preprocess data in batches
C) To calculate loss functions
D) To implement backpropagation

**2. Which transform is typically NOT applied during validation/testing?**
A) Resize
B) CenterCrop
C) RandomHorizontalFlip
D) Normalize

**3. In a convolutional layer with in_channels=3, out_channels=16, and kernel_size=3, how many parameters does the layer have (ignoring bias)?**
A) 432
B) 144
C) 160
D) 192

**4. What is the main benefit of using batch normalization in CNNs?**
A) Reduces the number of parameters
B) Makes training more stable and allows higher learning rates
C) Replaces the need for activation functions
D) Automatically selects the best kernel size

**5. In transfer learning, what does "feature extraction" mode typically involve?**
A) Training all layers with a high learning rate
B) Freezing all pretrained layers and only training the new classifier
C) Randomly initializing all weights
D) Using only the first convolutional layer

**6. Which of the following is NOT a common data augmentation technique for images?**
A) RandomHorizontalFlip
B) ColorJitter
C) RandomErasing
D) BatchNormalization

**7. What does the "padding" parameter in a convolutional layer control?**
A) The stride of the filter
B) The number of output channels
C) Whether to add zeros around the input to maintain spatial dimensions
D) The activation function used after convolution

**8. In the CIFAR-10 dataset, what do the dimensions of a single image tensor represent?**
A) [height, width, channels]
B) [channels, height, width]
C) [batch, height, width, channels]
D) [height, channels, width]

**9. What is the primary purpose of the torchsummary library?**
A) To visualize training curves
B) To provide a summary of model architecture and parameters
C) To implement data augmentation
D) To convert models to ONNX format

**10. When using transfer learning with ResNet on CIFAR-10, why do we need to replace the final fully connected layer?**
A) To match the number of output classes (10 instead of 1000)
B) To reduce the model size
C) To enable data augmentation
D) To improve numerical stability

**11. Which technique would most directly address overfitting in a CNN?**
A) Increasing the learning rate
B) Adding dropout layers
C) Removing batch normalization
D) Reducing data augmentation

**12. What is the main advantage of using mixed precision training (float16)?**
A) Higher model accuracy
B) Reduced memory usage and faster training on compatible GPUs
C) Better handling of small gradients
D) Automatic learning rate scheduling

**13. In PyTorch, how do you move a model to GPU if available?**
A) model.cuda()
B) model.to('cuda')
C) model.to(device) where device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
D) All of the above

**14. What does the "num_workers" parameter in DataLoader control?**
A) The batch size
B) The number of subprocesses for data loading
C) The learning rate
D) The number of output classes

**15. Which visualization technique would help identify if your model is overfitting?**
A) Plotting training and validation loss curves
B) Visualizing the model architecture
C) Checking the number of parameters
D) Monitoring GPU memory usage

---

**Answers:**

1. B - DataLoader efficiently loads and preprocesses data in batches
2. C - RandomHorizontalFlip is augmentation, not used in validation
3. A - 3 × 3 × 3 × 16 = 432 weights (the 16 bias terms are ignored, as the question states)
4. B - Batch norm stabilizes training and allows higher learning rates
5. B - Feature extraction freezes pretrained layers and trains only the new classifier
6. D - BatchNormalization is a layer, not an augmentation technique
7. C - Padding adds zeros around input to maintain spatial dimensions
8. B - PyTorch uses [channels, height, width] format
9. B - torchsummary shows model architecture and parameter count
10. A - CIFAR-10 has 10 classes vs. ImageNet's 1000
11. B - Dropout is a direct regularization technique for overfitting
12. B - Mixed precision reduces memory and speeds up training on compatible GPUs
13. D - All methods move model to GPU (though C is most portable)
14. B - num_workers sets the number of subprocesses for data loading
15. A - Training/validation loss curves clearly show overfitting

---

## **Summary and What's Next in Part 3**

In this **comprehensive Part 2** of our PyTorch Masterclass, we've covered:

- **Dataset and DataLoader**: Efficient data handling for deep learning
- **Transforms**: Image preprocessing and augmentation techniques
- **CNN Architecture**: Theory behind convolutional neural networks
- **Building CNNs**: Creating and training convolutional networks from scratch
- **CIFAR-10 Training**: Complete workflow for image classification
- **Transfer Learning**: Leveraging pretrained models for better performance
- **Advanced Debugging**: Profiling, monitoring, and troubleshooting techniques

You now have the skills to:

- Build efficient data pipelines for image data
- Design and train convolutional neural networks
- Apply transfer learning to boost performance
- Diagnose and fix common deep learning issues

### **What's Coming in Part 3?**

In **Part 3**, we'll dive into **sequence modeling with Recurrent Neural Networks (RNNs)** and **Long Short-Term Memory (LSTM)** networks:

- **Text data processing**: Tokenization, embeddings, and vocabulary creation
- **RNN architecture**: Understanding hidden states and sequence processing
- **LSTM and GRU**: Advanced recurrent units for long-term dependencies
- **Building text classifiers**: Sentiment analysis with RNNs
- **Sequence-to-sequence models**: Introduction to machine translation
- **Attention mechanisms**: The foundation of modern NLP
- **Transformer architecture**: Self-attention and positional encoding

We'll build a **sentiment analysis model** on real text data and explore the architecture that powers models like BERT and GPT.

👉 **Stay tuned for Part 3: Deep Learning for Natural Language Processing with PyTorch**

---

**Hashtags:** #PyTorch #ComputerVision #CNN #DeepLearning #TransferLearning #CIFAR10 #ImageClassification #DataLoaders #Transforms #ResNet #EfficientNet #PyTorchVision #AI #MachineLearning #ConvolutionalNeuralNetworks #DataAugmentation #PretrainedModels #TensorBoard #Debugging #Profiling #BatchNormalization #MixedPrecision #DataParallel #DistributedTraining #ImageNet #AlexNet #VGG #ResNet #MobileNet #VisionTransformer #PyTorchTutorial #DeepLearningCourse #AIEngineering #ComputerVisionEngineer