# AI Basic Concepts

### CNN (Convolutional Neural Network)

1. **Optimization (Gradient Descent Control)**
Gradient descent is an optimization algorithm that minimizes a loss function by repeatedly updating the model parameters to reduce the error. If the updates are controlled poorly, however, the training loss can stall.
    - **Critical points**: wherever the gradient of the loss is zero, gradient descent stops updating the parameters and training stagnates.
    - The main kinds of critical points:
        - **Local minima**: the gradient is zero, but the loss has not reached its global minimum.
        - **Saddle points**: the gradient is also zero, but the loss curves upward in some directions and downward in others (the Hessian has both positive and negative eigenvalues), which makes optimization difficult.
```python=
optimizer = optim.Adam(model.parameters(), lr=0.001)
```
2. **Batch**
To improve computational efficiency, the training data is split into batches.
    - Batch size: the number of samples processed before the model weights are updated.
    - Epoch: one full pass over the entire dataset; each epoch consists of many batches.
    - Effect of batch size:
        - Small batches: each epoch takes longer, but the updates are noisier, which can help generalization.
        - Large batches: each epoch trains faster, but generalization may suffer.
```python=
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=2)
```
3. **Momentum**
Momentum helps escape saddle points and local minima by taking past gradients into account.
    - A technique that may counter saddle points and local minima: instead of moving only along the current negative gradient, each update also carries a fraction of the previous update, so the parameters keep moving through flat regions and past critical points (see the sketch below).
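Item 3 describes momentum only in prose; as a minimal sketch of how it is usually enabled in PyTorch (the optimizer choice and the `lr`/`momentum` values here are illustrative assumptions, not part of the original example):
```python=
# Sketch: plain SGD with a momentum term; 0.9 is a common but illustrative value
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```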
4. **Learning rate**
The learning rate determines how large each weight update is.
    - Adaptive learning-rate strategy:
        - Where the gradient is small (the loss surface is flat), the learning rate should be increased.
        - Where the gradient is large (the loss surface is steep), the learning rate should be decreased.
```python=
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
```
5. **Optimizers**
    - Adam: combines momentum with per-parameter adaptive learning rates.
    - RMSProp: adapts each parameter's learning rate based on past squared gradients.
    - Momentum: accumulates past gradients to make updates more stable.
```python=
optimizer = optim.Adam(model.parameters(), lr=0.001)
```
6. **Loss**
The loss function measures the gap between predicted and actual values.
    - Sigmoid cross-entropy loss: used for binary classification.
    - Softmax cross-entropy loss: used for multi-class classification.
```python=
criterion = nn.CrossEntropyLoss()
```
7. **Weight Initialization**
Poor initialization can cause vanishing or exploding gradients:
    - Xavier initialization (for Sigmoid, Tanh)
    - He initialization (for ReLU, Leaky ReLU)
    - A sketch of applying these in PyTorch follows this item.
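Item 7 lists the initialization schemes without code; a minimal sketch of applying He initialization to the kinds of layers used in the example below (the `init_weights` helper is hypothetical, not part of the original example):
```python=
import torch.nn as nn

def init_weights(m):  # hypothetical helper
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        # He (Kaiming) initialization, suited to ReLU activations;
        # use nn.init.xavier_uniform_ instead for Sigmoid/Tanh networks
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Usage: model.apply(init_weights) after constructing the model
```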
8. **Activation Function**
The choice of activation function affects convergence speed and gradient flow:
    - ReLU (Rectified Linear Unit): the most common choice; avoids vanishing gradients
    - Leaky ReLU: addresses ReLU's "dying neuron" problem
    - Swish / GELU: smoother than ReLU and can perform better
    - Sigmoid / Tanh: prone to vanishing gradients, rarely used in deep networks
```python=
x = self.pool(torch.relu(self.conv1(x)))
x = self.pool(torch.relu(self.conv2(x)))
```
9. **Regularization**
Techniques for avoiding overfitting:
    - L1 / L2 regularization (Lasso / Ridge): penalizes large weights; L2 is the most common
    - Dropout: randomly disables some neurons, improving generalization
    - Batch Normalization (BN): normalizes each layer's inputs, speeding up training and stabilizing gradients
```python=
x = self.dropout(x)
```
10. **Data Augmentation**
Apply random transformations to the images to increase data diversity, for example:
    - Rotation, scaling, translation, horizontal flipping
    - Random crop
    - Added noise (e.g., Gaussian noise)
    - These reduce overfitting and improve generalization.
```python=
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])])
```
11. **Early Stopping**
If the validation loss stops improving, end training early to avoid overfitting. (The snippet below stops on the first epoch without improvement; a patience counter is a common refinement.)
```python=
if val_loss >= prev_val_loss:
    print("Early stopping triggered")
    break
else:
    prev_val_loss = val_loss
```
12. **Learning Rate Scheduling**
Adjust the learning rate automatically during training:
    - Step decay: lower the learning rate every few epochs
    - Exponential decay: decay the learning rate exponentially
    - ReduceLROnPlateau: lower the learning rate automatically when the loss stops improving (a sketch appears after the full example below)
```python=
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
```
13. **Optimizer Comparison**
Different optimizers suit different situations:
    - SGD (Stochastic Gradient Descent): the baseline; works better with momentum
    - Adam (Adaptive Moment Estimation): adaptive learning rates, fast convergence, a good default in most settings
    - RMSProp: suited to non-stationary loss surfaces, often used with RNNs
    - AdamW: fixes Adam's handling of L2 regularization (weight decay), better suited to deep learning
```python=
optimizer = optim.Adam(model.parameters(), lr=0.001)
```

### Example

```python=
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import copy
import matplotlib.pyplot as plt

# Device setup (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Define data augmentation and preprocessing
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
# The validation set should only be normalized, not augmented
val_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# 2. Download the CIFAR-10 dataset
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
val_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=val_transform)

# 3. Define the DataLoaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=2)

# 4. Define a simple CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(64 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))  # 32x32 -> 16x16
        x = self.pool(torch.relu(self.conv2(x)))  # 16x16 -> 8x8
        x = x.view(-1, 64 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # Dropout layer
        x = self.fc2(x)
        return x

# 5. Model, loss function, and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# 6. Training function
def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    prev_val_loss = float('inf')  # Early stopping

    for epoch in range(num_epochs):
        print(f'Epoch {epoch + 1}/{num_epochs}')
        print('-' * 10)

        # Training phase
        model.train()
        running_loss = 0.0
        running_corrects = 0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels.data)

        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_acc = running_corrects.double() / len(train_loader.dataset)
        print(f'Train Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

        # Validation phase
        model.eval()
        running_corrects = 0
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)
                val_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

        val_loss /= len(val_loader.dataset)
        val_acc = running_corrects.double() / len(val_loader.dataset)
        print(f'Validation Loss: {val_loss:.4f} Acc: {val_acc:.4f}')

        # Early stopping (no patience: stop on the first epoch without improvement)
        if val_loss >= prev_val_loss:
            print("Early stopping triggered")
            break
        else:
            prev_val_loss = val_loss

        # Keep the best model so far
        if val_acc > best_acc:
            best_acc = val_acc
            best_model_wts = copy.deepcopy(model.state_dict())

        # Update the learning rate
        lr_scheduler.step()

    print(f'Best val Acc: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    return model

# 7. Start training
model_trained = train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10)

# 8. Evaluate the model's accuracy
def test_model(model, val_loader):
    model.eval()
    running_corrects = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            running_corrects += torch.sum(preds == labels.data)
    accuracy = running_corrects.double() / len(val_loader.dataset)
    print(f'Test Accuracy: {accuracy:.4f}')

test_model(model_trained, val_loader)
```
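Item 12 mentions ReduceLROnPlateau, while the example above uses StepLR. As a minimal sketch of the swap (the `factor` and `patience` values are illustrative assumptions), note that this scheduler's `step()` takes the monitored metric rather than stepping unconditionally:
```python=
# Sketch: alternative to the StepLR used above; factor/patience are illustrative
lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                    factor=0.1, patience=3)
# Inside the epoch loop, step on the validation loss:
# lr_scheduler.step(val_loss)
```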
### Visualize

```python=
def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    prev_val_loss = float('inf')  # Early stopping

    # Track loss and accuracy per epoch
    train_losses, val_losses = [], []
    train_accuracies, val_accuracies = [], []

    for epoch in range(num_epochs):
        print(f'Epoch {epoch + 1}/{num_epochs}')
        print('-' * 10)

        # Training phase
        model.train()
        running_loss = 0.0
        running_corrects = 0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels.data)

        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_acc = running_corrects.double() / len(train_loader.dataset)
        train_losses.append(epoch_loss)
        train_accuracies.append(epoch_acc.item())
        print(f'Train Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

        # Validation phase
        model.eval()
        running_corrects = 0
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)
                val_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

        val_loss /= len(val_loader.dataset)
        val_acc = running_corrects.double() / len(val_loader.dataset)
        val_losses.append(val_loss)
        val_accuracies.append(val_acc.item())
        print(f'Validation Loss: {val_loss:.4f} Acc: {val_acc:.4f}')

        # Early stopping
        if val_loss >= prev_val_loss:
            print("Early stopping triggered")
            break
        else:
            prev_val_loss = val_loss

        # Keep the best model so far
        if val_acc > best_acc:
            best_acc = val_acc
            best_model_wts = copy.deepcopy(model.state_dict())

        # Update the learning rate
        lr_scheduler.step()

    print(f'Best val Acc: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)

    # Plot loss and accuracy; use the number of epochs actually run,
    # since early stopping may end training before num_epochs
    epochs_ran = range(1, len(train_losses) + 1)
    plt.figure(figsize=(12, 4))

    # Loss curves
    plt.subplot(1, 2, 1)
    plt.plot(epochs_ran, train_losses, label='Train Loss')
    plt.plot(epochs_ran, val_losses, label='Validation Loss')
    plt.title('Loss over Epochs')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    # Accuracy curves
    plt.subplot(1, 2, 2)
    plt.plot(epochs_ran, train_accuracies, label='Train Accuracy')
    plt.plot(epochs_ran, val_accuracies, label='Validation Accuracy')
    plt.title('Accuracy over Epochs')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()

    plt.tight_layout()
    plt.show()

    return model
```

```python=
def test_model(model, val_loader, num_samples=5):
    model.eval()
    running_corrects = 0
    incorrect_images = []
    incorrect_labels = []
    incorrect_preds = []

    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            running_corrects += torch.sum(preds == labels.data)

            # Collect misclassified samples
            incorrect_indices = torch.where(preds != labels.data)[0]
            for idx in incorrect_indices:
                incorrect_images.append(inputs[idx].cpu())
                incorrect_labels.append(labels[idx].cpu())
                incorrect_preds.append(preds[idx].cpu())

    accuracy = running_corrects.double() / len(val_loader.dataset)
    print(f'Test Accuracy: {accuracy:.4f}')

    # Show misclassified samples (no more than were actually collected)
    num_samples = min(num_samples, len(incorrect_images))
    fig, axes = plt.subplots(1, num_samples, figsize=(12, 4))
    for i in range(num_samples):
        ax = axes[i]
        ax.imshow(incorrect_images[i].permute(1, 2, 0))  # CxHxW -> HxWxC
        ax.set_title(f"True: {incorrect_labels[i].item()}, Pred: {incorrect_preds[i].item()}")
        ax.axis('off')
    plt.show()
```
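One caveat about the image grid above: the tensors were normalized with `transforms.Normalize`, so `imshow` clips their values and the colors look washed out. A minimal sketch of undoing the normalization for display (the `unnormalize` helper is hypothetical and assumes the mean/std used earlier):
```python=
def unnormalize(img, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    # Hypothetical helper: reverses transforms.Normalize for display
    mean = torch.tensor(mean).view(3, 1, 1)
    std = torch.tensor(std).view(3, 1, 1)
    return (img * std + mean).clamp(0, 1)

# Usage inside test_model:
# ax.imshow(unnormalize(incorrect_images[i]).permute(1, 2, 0))
```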