# BioNeMo

## Quickstart

https://docs.nvidia.com/bionemo-framework/latest/quickstart-fw.html#ngc-setup

**Train**

```bash
python pretrain.py --config-path=conf --config-name=pretrain_xsmall_span_aug \
    do_training=True model.data.dataset_path=$(pwd)/zinc_csv \
    model.data.dataset.train=x000 \
    model.data.dataset.val=x000 \
    model.data.dataset.test=x000 \
    exp_manager.exp_dir=$(pwd)/results
```

### Test Results

- GPU: 2080 Ti

#### Default parameters

| GPU | VRAM | Host RAM |
|:------:|:-----:|:--------:|
| 15~20% | 1 GiB | 2813 MiB |

Epoch ETA: $\approx$ 17:15:00

#### devices: 1 => 2

| GPU | VRAM | Host RAM |
|:------:|:---------:|:-------------:|
| 20~35% | 1.035 GiB | 3545/2077 MiB |

Epoch ETA: $\approx$ 16:30:00

#### micro_batch_size: 32 => 128

| GPU | VRAM | Host RAM |
|:------:|:---------:|:-------------:|
| 35~45% | 1.276 GiB | 7967/2089 MiB |

Epoch ETA: $\approx$ 19:50:00

:::info
**Observations**
- GPU memory usage does not grow linearly with the batch size.
- GPU utilization went up, but the epoch ETA got worse rather than better.
- Regardless of the parameters, the training log always reports 2000000 steps.
- There is a `max_steps` parameter.
:::

#### max_steps: 2000000 => 20000 (micro_batch_size: 32)

| GPU | VRAM | Host RAM |
|:------:|:---------:|:-------------:|
| 20~30% | 1.031 GiB | 2077/2067 MiB |

Running time: 11:27

#### tensor_model_parallel_size: 1 => 2

| GPU | VRAM | Host RAM |
|:------:|:---------:|:-------------:|
| 20~70% | 0.988 GiB | 2092/2087 MiB |

Running time: 11:24
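The overrides tested above can be passed on the same command line as Hydra-style `key=value` pairs, like the dataset settings in the quickstart command. A hedged sketch, combining the settings that gave the best run here; the full key paths `trainer.devices`, `trainer.max_steps`, `model.micro_batch_size`, and `model.tensor_model_parallel_size` are assumptions based on NeMo config conventions, since the tables name the keys without their parent sections:

```bash
# Sketch (assumed key paths): rerun pretraining with the overrides explored above.
# trainer.* and model.* prefixes follow NeMo convention and are not confirmed
# by the quickstart command itself.
python pretrain.py --config-path=conf --config-name=pretrain_xsmall_span_aug \
    do_training=True \
    model.data.dataset_path=$(pwd)/zinc_csv \
    model.data.dataset.train=x000 \
    model.data.dataset.val=x000 \
    model.data.dataset.test=x000 \
    trainer.devices=2 \
    trainer.max_steps=20000 \
    model.micro_batch_size=32 \
    model.tensor_model_parallel_size=2 \
    exp_manager.exp_dir=$(pwd)/results
```

Capping `max_steps` is what actually shortened the runs: the log's 2000000-step total is a step budget, not an epoch count, so the reported ETA tracks `max_steps` divided by step throughput rather than GPU utilization.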