# Transformer training record

## Record (5/28)

### Parameters

```yaml
# Safe Training Configuration - Prevents System Crashes
# Reduced resource usage for stable training

model:
  # Model architecture parameters
  token_type: "patch"        # Use patch tokens (more efficient)
  d_model: 512               # REDUCED from 768 to save memory
  num_heads: 8               # REDUCED from 12 to save memory
  num_layers: 4              # REDUCED from 6 to save memory
  feature_dim: 49            # Spatial feature dimension (7x7=49)
  in_channels: 1             # Input image channels (grayscale)
  use_pretrained: true       # Use ImageNet-pretrained ResNet34

  # Optimization parameters
  lr: 5e-5                   # REDUCED learning rate for stability
  weight_decay: 1e-5         # Balanced weight decay

  # Loss function parameters
  lambda_t2_action: 1.0      # Equal weighting for at1 and at2
  smooth_l1_beta: 1.0        # Standard beta
  primary_task_only: false   # Use both at1 and at2 losses

  # Advanced settings
  use_flash_attn: false      # Disable flash attention for stability

training:
  # Data loading parameters - REDUCED FOR SAFETY
  batch_size: 4              # Very small batch size to prevent OOM
  num_workers: 2             # Fewer workers to save CPU/RAM

  # Training schedule
  max_epochs: 50             # Reduced epochs for testing
  early_stop_patience: 10    # Reduced patience
  check_val_every_n_epoch: 2 # Less frequent validation

  # Optimization settings
  gradient_clip_val: 0.5     # Reduced gradient clipping
  accumulate_grad_batches: 4 # Accumulate gradients to simulate a larger batch
  log_every_n_steps: 50      # Less frequent logging

  # Hardware settings
  accelerator: "auto"        # Hardware accelerator
  precision: 32              # Full 32-bit precision (not mixed) for stability

# Data configuration
data:
  # Use only data_0513_01, since it is the only set with images
  train_ratio: 0.7           # 70% for training
  val_ratio: 0.15            # 15% for validation
  test_ratio: 0.15           # 15% for testing
```

- Patch tokens are used because the available compute is limited.
- The data is standardized using statistics computed from the training set only.

### Loss vs. epoch curves

The x-axis is labeled in epochs, but the chart is plotted by batch count, so a value appears only at the batch that closes each epoch.




Judging from the curves, training has converged, and the validation set performing slightly worse than the training set is reasonable.

### Ground truth vs. prediction scatter plots

#### Training set



- From the plots, the $R^2$ of all three translation axes $X$, $Y$, $Z$ exceeds 0.9: the model explains more than 90% of the total variance in 3D position, predictions agree closely with ground truth, and residuals are small, so position estimation is excellent.
- For the rotation axes, however, only $Pitch$ reaches 0.68; the other two axes are not learned well.

#### Validation set



On the validation set the $R^2$ of every axis is close to zero or negative, meaning the model explains almost none of the variance in the true pose; combined with the high training-set scores, this suggests overfitting.

##### Per-axis validation results

- x axis
  - 
- y axis
  - 
- z axis
  - 
- roll axis
  - 
- pitch axis
  - 
- yaw axis
  - 

## Conceptual clarifications

### Batch

- The training data is split into mini-batches according to the batch size.
- Each batch goes through one full cycle: forward pass → loss computation → backward pass → parameter update.
- The loss is computed over all samples in the batch (as a mean or a sum).
- Mini-batch training stabilizes gradients, improves efficiency, and lowers memory usage.

### Epoch

- One epoch means the model has seen the entire training set exactly once.
- Within an epoch the model trains on each batch in turn, updating its parameters many times.
- After each epoch the training data is usually shuffled so that the model learns more general features in the next pass instead of memorizing a fixed order.

## Interpreting the results



- This plot records the loss computed for each individual batch.



- This one shows, for each epoch, the average of all batch losses computed within that epoch.
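The batch/epoch bookkeeping described above can be sketched in plain Python. This is a toy illustration, not the actual training code: the function name and the stand-in "loss" are hypothetical, and a real run would do a forward/backward pass where the toy loss is computed.

```python
import random

def train_demo(data, batch_size=4, max_epochs=3, seed=0):
    """Toy loop mirroring the Batch/Epoch notes:
    each epoch shuffles the data, splits it into mini-batches,
    computes a per-batch loss, and records both the per-batch
    losses (noisy curve) and the per-epoch average (smooth curve)."""
    rng = random.Random(seed)
    batch_losses, epoch_losses = [], []
    for epoch in range(max_epochs):
        rng.shuffle(data)  # reshuffle each epoch to avoid memorizing order
        losses_this_epoch = []
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # stand-in for forward pass + loss: mean squared value of the batch
            loss = sum(x * x for x in batch) / len(batch)
            losses_this_epoch.append(loss)
        batch_losses.extend(losses_this_epoch)
        # epoch-level curve = average of all batch losses in that epoch
        epoch_losses.append(sum(losses_this_epoch) / len(losses_this_epoch))
    return batch_losses, epoch_losses

data = [x / 10 for x in range(20)]  # 20 samples -> 5 batches of 4
b, e = train_demo(data)
print(len(b), len(e))  # one point per batch vs. one point per epoch
```

With 20 samples, a batch size of 4, and 3 epochs, the batch-level curve has 15 points while the epoch-level curve has only 3, which is why the epoch-averaged plot looks much smoother than the per-batch plot.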