# AI Basic Concepts
### CNN (Convolutional Neural Network)
1. **Optimization**
**Gradient Descent Control**
Gradient descent is an optimization algorithm that minimizes a loss function by iteratively updating the model parameters to reduce the error. If poorly tuned, however, the training loss can stall.
- **Critical Points**: where the gradient of the loss function is zero, gradient descent stops updating the parameters and training stalls.
- The main types of critical points:
  - **Local Minima**: the gradient is zero, but the loss has not reached the global minimum.
  - **Saddle Points**: the gradient is also zero, but the surface curves upward in some directions and downward in others, making optimization difficult.
```python=
optimizer = optim.Adam(model.parameters(), lr=0.001)
```
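As a quick diagnostic, the total gradient norm can be monitored during training: a norm near zero while the loss is still high suggests the parameters are stuck near a critical point. A minimal sketch (a hypothetical check, assuming `loss.backward()` has just been called on `model`):
```python=
# Sum the squared gradient norms over all parameters, then take the root
grad_norm = sum(p.grad.norm().item() ** 2
                for p in model.parameters() if p.grad is not None) ** 0.5
if grad_norm < 1e-6:  # threshold is an illustrative assumption
    print(f"Gradient norm {grad_norm:.2e}: possibly near a local minimum or saddle point")
```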
2. **Batch**
To improve computational efficiency, the training data is split into batches.
- Batch size: the number of samples processed before the model weights are updated.
- Epoch: one complete pass over the entire dataset; each epoch consists of multiple batches.
- Effects of batch size:
  - Small batches: each epoch takes longer, but the noisier updates may help generalization.
  - Large batches: each epoch trains faster, but generalization may suffer.
```python=
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=2)
```
3. **Momentum**
Momentum helps escape saddle points and local minima by taking past gradients into account.
- On top of the step in the negative-gradient direction, the update adds a step in the direction of the previous movement, so the parameters can keep moving through saddle points or local minima (see the sketch below).
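A minimal sketch of momentum in PyTorch, assuming the `model` and imports from the full example below:
```python=
# SGD with momentum: each update blends the new gradient with the previous
# update direction, which can carry the parameters past flat regions
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```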
4. **Learning Rate**
The learning rate determines how large each weight update is.
- Adaptive learning-rate strategies:
  - Where gradients are small (the loss surface is flat), the learning rate should be larger.
  - Where gradients are large (the loss surface is steep), the learning rate should be smaller.
```python=
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
```
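`StepLR` above schedules the rate over time; the per-parameter adaptive idea is what optimizers such as Adagrad implement, dividing each step by the accumulated gradient magnitude. A minimal sketch:
```python=
# Adagrad: parameters with small accumulated gradients effectively take
# larger relative steps, matching the flat-surface intuition above
optimizer = optim.Adagrad(model.parameters(), lr=0.01)
```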
5. **Optimizers**
- Adam: combines momentum with per-parameter adaptive learning rates.
- RMSProp: adapts each parameter's learning rate using a running average of squared gradients.
- Momentum: accumulates past gradients to stabilize updates.
```python=
optimizer = optim.Adam(model.parameters(), lr=0.001)
```
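The other optimizers in the list have their own constructors; for RMSProp, `alpha` is the smoothing constant of the running average of squared gradients:
```python=
optimizer = optim.RMSprop(model.parameters(), lr=0.001, alpha=0.99)
```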
6. **Loss**
The loss function measures the difference between the predicted and actual values.
- Sigmoid Cross-Entropy Loss: for binary classification.
- Softmax Cross-Entropy Loss: for multi-class classification.
```python=
criterion = nn.CrossEntropyLoss()
```
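`nn.CrossEntropyLoss` covers the softmax case; for binary classification, the sigmoid cross-entropy variant is `nn.BCEWithLogitsLoss`:
```python=
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy on raw logits
```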
7. **Weight Initialization**
Poor initialization can lead to vanishing or exploding gradients (see the sketch after this list):
- Xavier Initialization (for Sigmoid, Tanh)
- He Initialization (for ReLU, Leaky ReLU)
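A minimal sketch of both schemes; the layer shapes here are placeholders:
```python=
linear = nn.Linear(128, 64)
nn.init.xavier_uniform_(linear.weight)  # Xavier: for sigmoid/tanh activations
conv = nn.Conv2d(3, 32, kernel_size=3)
nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')  # He: for ReLU
```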
8. **Activation Function**
The choice of activation function affects convergence speed and gradient flow:
- ReLU (Rectified Linear Unit): the most common choice; avoids vanishing gradients
- Leaky ReLU: addresses ReLU's "dying neuron" problem
- Swish / GELU: smoother than ReLU and may perform better
- Sigmoid / Tanh: prone to vanishing gradients; rarely used in deep networks
```python=
x = self.pool(torch.relu(self.conv1(x)))
x = self.pool(torch.relu(self.conv2(x)))
```
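The alternatives in the list are available in `torch.nn.functional`; swapping one in is a one-line change (a sketch, not part of the model below):
```python=
import torch.nn.functional as F
x = F.leaky_relu(x, negative_slope=0.01)  # small slope for x < 0 avoids dead neurons
# x = F.gelu(x)                           # smoother alternative to ReLU
```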
9. **Regularization**
Techniques to avoid overfitting:
- L1 / L2 Regularization (Lasso / Ridge): penalizes large weights; L2 is the most common
- Dropout: randomly disables some neurons to improve generalization
- Batch Normalization (BN): normalizes each layer's inputs, speeding up training and stabilizing gradients
```python=
x = self.dropout(x)
```
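Dropout is shown above; the other two list items look like this in PyTorch, where weight decay is the usual way to apply an L2 penalty:
```python=
# L2 regularization via weight decay on the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
# Batch Normalization for a 32-channel feature map (registered inside a module)
bn = nn.BatchNorm2d(32)
```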
10. **Data Augmentation**
Apply random transformations to images to increase data diversity, for example:
- Rotation, scaling, translation, mirror flipping
- Random crops
- Added noise (e.g., Gaussian noise)
- These transformations reduce overfitting and improve generalization.
```python=
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
```
11. **Early Stopping**
If the validation loss fails to improve over several checks, stop training early to avoid overfitting.
```python=
# simplest form: stop as soon as the validation loss fails to improve
# (in practice, a patience counter of several epochs is common)
if val_loss >= prev_val_loss:
    print("Early stopping triggered")
    break
else:
    prev_val_loss = val_loss
```
12. **Learning Rate Scheduling**
Adjust the learning rate automatically to improve results:
- Step Decay: lower the learning rate every few epochs
- Exponential Decay: decay the learning rate exponentially
- ReduceLROnPlateau: lower the learning rate automatically when the loss stops improving
```python=
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
```
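`ReduceLROnPlateau` from the list monitors a metric instead of counting epochs, so the metric is passed to `step`:
```python=
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=3)
# after each validation pass:
scheduler.step(val_loss)
```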
13. **Optimizer Comparison**
Different optimizers suit different situations:
- SGD (Stochastic Gradient Descent): the baseline; works better with Momentum
- Adam (Adaptive Moment Estimation): adaptive learning rates and fast convergence; a good default for most cases
- RMSProp: suited to non-stationary loss surfaces; commonly used for RNNs
- AdamW: fixes Adam's L2-regularization issue with decoupled weight decay; often preferred in deep learning
```python=
optimizer = optim.Adam(model.parameters(), lr=0.001)
```
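AdamW applies weight decay decoupled from the gradient update:
```python=
optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)
```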
### Example
```python=
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import copy
import matplotlib.pyplot as plt

# Select device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Define data augmentation and preprocessing
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# 2. Download the CIFAR-10 dataset
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
val_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

# 3. Define the DataLoaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=2)

# 4. Define a simple CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(64*8*8, 128)
        self.fc2 = nn.Linear(128, 10)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 64*8*8)
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # Dropout layer
        x = self.fc2(x)
        return x

# 5. Model, loss function, and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# 6. Training function
def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    prev_val_loss = float('inf')  # Early stopping
    for epoch in range(num_epochs):
        print(f'Epoch {epoch + 1}/{num_epochs}')
        print('-' * 10)
        # Training phase
        model.train()
        running_loss = 0.0
        running_corrects = 0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels.data)
        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_acc = running_corrects.double() / len(train_loader.dataset)
        print(f'Train Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
        # Validation phase
        model.eval()
        running_corrects = 0
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)
                val_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
        val_loss /= len(val_loader.dataset)
        val_acc = running_corrects.double() / len(val_loader.dataset)
        print(f'Validation Loss: {val_loss:.4f} Acc: {val_acc:.4f}')
        # Early stopping
        if val_loss >= prev_val_loss:
            print("Early stopping triggered")
            break
        else:
            prev_val_loss = val_loss
        # Keep the best model so far
        if val_acc > best_acc:
            best_acc = val_acc
            best_model_wts = copy.deepcopy(model.state_dict())
        # Update the learning rate (uses the global lr_scheduler defined above)
        lr_scheduler.step()
    print(f'Best val Acc: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    return model

# 7. Run training
model_trained = train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10)

# 8. Evaluate model accuracy
def test_model(model, val_loader):
    model.eval()
    running_corrects = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            running_corrects += torch.sum(preds == labels.data)
    accuracy = running_corrects.double() / len(val_loader.dataset)
    print(f'Test Accuracy: {accuracy:.4f}')

test_model(model_trained, val_loader)
```
### Visualize
```python=
def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    prev_val_loss = float('inf')  # Early stopping
    # Track loss and accuracy per epoch
    train_losses, val_losses = [], []
    train_accuracies, val_accuracies = [], []
    for epoch in range(num_epochs):
        print(f'Epoch {epoch + 1}/{num_epochs}')
        print('-' * 10)
        # Training phase
        model.train()
        running_loss = 0.0
        running_corrects = 0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels.data)
        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_acc = running_corrects.double() / len(train_loader.dataset)
        train_losses.append(epoch_loss)
        train_accuracies.append(epoch_acc.item())
        print(f'Train Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
        # Validation phase
        model.eval()
        running_corrects = 0
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)
                val_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
        val_loss /= len(val_loader.dataset)
        val_acc = running_corrects.double() / len(val_loader.dataset)
        val_losses.append(val_loss)
        val_accuracies.append(val_acc.item())
        print(f'Validation Loss: {val_loss:.4f} Acc: {val_acc:.4f}')
        # Early stopping
        if val_loss >= prev_val_loss:
            print("Early stopping triggered")
            break
        else:
            prev_val_loss = val_loss
        # Keep the best model so far
        if val_acc > best_acc:
            best_acc = val_acc
            best_model_wts = copy.deepcopy(model.state_dict())
        # Update the learning rate (uses the global lr_scheduler defined above)
        lr_scheduler.step()
    print(f'Best val Acc: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    # Plot loss and accuracy; use the actual number of completed epochs,
    # which may be fewer than num_epochs if early stopping triggered
    epochs_run = range(1, len(train_losses) + 1)
    plt.figure(figsize=(12, 4))
    # Loss curves
    plt.subplot(1, 2, 1)
    plt.plot(epochs_run, train_losses, label='Train Loss')
    plt.plot(epochs_run, val_losses, label='Validation Loss')
    plt.title('Loss over Epochs')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    # Accuracy curves
    plt.subplot(1, 2, 2)
    plt.plot(epochs_run, train_accuracies, label='Train Accuracy')
    plt.plot(epochs_run, val_accuracies, label='Validation Accuracy')
    plt.title('Accuracy over Epochs')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.tight_layout()
    plt.show()
    return model
```
```python=
def test_model(model, val_loader, num_samples=5):
    model.eval()
    running_corrects = 0
    incorrect_images = []
    incorrect_labels = []
    incorrect_preds = []
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            running_corrects += torch.sum(preds == labels.data)
            # Collect misclassified samples
            incorrect_indices = torch.where(preds != labels.data)[0]
            for idx in incorrect_indices:
                incorrect_images.append(inputs[idx].cpu())
                incorrect_labels.append(labels[idx].cpu())
                incorrect_preds.append(preds[idx].cpu())
    accuracy = running_corrects.double() / len(val_loader.dataset)
    print(f'Test Accuracy: {accuracy:.4f}')
    # Show some misclassified samples
    num_samples = min(num_samples, len(incorrect_images))
    mean = torch.tensor([0.485, 0.456, 0.406])
    std = torch.tensor([0.229, 0.224, 0.225])
    fig, axes = plt.subplots(1, num_samples, figsize=(12, 4))
    for i in range(num_samples):
        ax = axes[i]
        # Convert CxHxW -> HxWxC and undo the normalization for display
        img = incorrect_images[i].permute(1, 2, 0) * std + mean
        ax.imshow(img.clamp(0, 1))
        ax.set_title(f"True: {incorrect_labels[i].item()}, Pred: {incorrect_preds[i].item()}")
        ax.axis('off')
    plt.show()
```