LLaVA-Med / Training
===
###### tags: `LLM`, `LLaVA`, `LLaVA-Med`
:::warning
:bulb: **Related HackMD quick notes**
- [LLaVA-Med](https://hackmd.io/aiQU5maSQKGQ6NVIGB3hnQ)
- [LLaVA-Med / few_shot_example](https://hackmd.io/8RmlqpfgRkOOGqPsBhbjiA)
- [LLaVA-Med / env](https://hackmd.io/N-s0nhmETbSPFg94QtSU3Q)
- [LLaVA-Med / Training](https://hackmd.io/2SDurdu9QNqH_N3udJFFTQ)
:::
[TOC]
<br>
## LLaMA + delta = LLaVA
> LLaMA weights + delta weights = LLaVA weights
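Conceptually, `apply_delta` loads the base LLaMA weights and the released delta weights, adds them parameter by parameter, and saves the merged model. A minimal sketch of the idea (not the actual script; the real `llava.model.apply_delta` also handles tokenizer/embedding-size differences, and all paths below are placeholders):
```python
import torch
from transformers import AutoModelForCausalLM
from llava.model import LlavaLlamaForCausalLM  # class from the LLaVA / LLaVA-Med repo

# Placeholder paths for illustration only
base = AutoModelForCausalLM.from_pretrained("/path/to/llama-7b", torch_dtype=torch.float16)
target = LlavaLlamaForCausalLM.from_pretrained("/path/to/LLaVA-7b-delta-v0", torch_dtype=torch.float16)

base_sd = base.state_dict()
# target = base + delta, parameter by parameter; weights that exist only in the
# delta (e.g. the vision projector) are kept as-is.
for name, param in target.state_dict().items():
    if name in base_sd:
        param.data += base_sd[name].to(param.device)

target.save_pretrained("/output/path/to/LLaVA-7b-v0")
```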
### python3 -m llava.model.apply_delta
```
$ python3 -m llava.model.apply_delta --help
Traceback (most recent call last):
File ".../envs/llava-med2/lib/python3.10/runpy.py", line 187, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/.../envs/llava-med2/lib/python3.10/runpy.py", line 110, in _get_module_details
__import__(pkg_name)
File ".../LLaVA-Med/llava/__init__.py", line 1, in <module>
from .model import LlavaLlamaForCausalLM
File ".../LLaVA-Med/llava/model/__init__.py", line 1, in <module>
from .llava import LlavaLlamaForCausalLM
File ".../LLaVA-Med/llava/model/llava.py", line 29, in <module>
import open_clip
ModuleNotFoundError: No module named 'open_clip'
```
- **Fix**
- `$ pip install open_clip_torch` -> PASS
- [No module named ‘open_clip‘](https://blog.csdn.net/m0_57273938/article/details/131958832)
- **Expected output**
```
$ python3 -m llava.model.apply_delta --help
usage: apply_delta.py [-h] --base-model-path BASE_MODEL_PATH --target-model-path TARGET_MODEL_PATH --delta-path DELTA_PATH
options:
-h, --help show this help message and exit
--base-model-path BASE_MODEL_PATH
--target-model-path TARGET_MODEL_PATH
--delta-path DELTA_PATH
```
<br>
## Model
### Initialization from LLaVA-7B Weights
```bash=
$ python3 -m llava.model.apply_delta \
--base /path/to/llama-7b \
--target /output/path/to/LLaVA-7b-v0 \
--delta /huggingface.co/liuhaotian/LLaVA-7b-delta-v0
```
- `/huggingface.co/liuhaotian/LLaVA-7b-delta-v0`
- wrong path
- error
```
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/huggingface.co/liuhaotian/LLaVA-7b-delta-v0'. Use `repo_type` argument if needed.
```
- probably should be `--delta liuhaotian/LLaVA-7b-delta-v0` (not tested)
- the full URL https://huggingface.co/liuhaotian/LLaVA-7b-delta-v0 does not work either
- error
```
...
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
...
raise EnvironmentError(
OSError: It looks like the config file at '/tmp/tmpj9xpil1e' is not a valid JSON file.
```
- ### Manually download LLaVA-7b-delta-v0
```
$ git clone https://huggingface.co/liuhaotian/LLaVA-7b-delta-v0
Cloning into 'LLaVA-7b-delta-v0'...
remote: Enumerating objects: 19, done.
remote: Total 19 (delta 0), reused 0 (delta 0), pack-reused 19
Unpacking objects: 100% (19/19), 4.71 KiB | 965.00 KiB/s, done.
Filtering content: 100% (4/4), 4.56 GiB | 8.43 MiB/s, done.
Encountered 1 file that may not have been copied correctly on Windows:
pytorch_model-00001-of-00002.bin
See: `git lfs help smudge` for more details.
```
- Not sure whether `pytorch_model-00001-of-00002.bin` was copied correctly
- Install git lfs
```
git lfs install
```
- Re-download `pytorch_model-00001-of-00002.bin` by itself
```
$ git lfs pull --include=pytorch_model-00001-of-00002.bin
```
- Matches the original file -> OK
- LLaMA + delta = LLaVA
```
python3 -m llava.model.apply_delta \
    --base Llama-2-7b-hf/ \
    --target LLaVA-7b-v0/ \
    --delta LLaVA-7b-delta-v0/
```
- Note that the base here is Llama-2, not the original LLaMA
- This run was only to confirm the command works -> OK
```
Loading base model
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 11.42it/s]
Loading delta
You are using a model of type llama to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.52k/4.52k [00:00<00:00, 12.0MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.71G/1.71G [00:22<00:00, 74.9MB/s]
...
...
...
- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00, 3.59s/it]
Applying delta
Applying delta: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 325/325 [01:53<00:00, 2.87it/s]
Saving target model
```
```
$ ll -hs LLaVA-7b-v0
total 13G
4.0K drwxr-xr-x 2 4.0K Mar 14 15:46 ./
4.0K drwxr-xr-x 4 4.0K Mar 14 15:42 ../
4.0K -rw-r--r-- 1 88 Mar 14 15:46 added_tokens.json
4.0K -rw-r--r-- 1 748 Mar 14 15:46 config.json
4.0K -rw-r--r-- 1 137 Mar 14 15:46 generation_config.json
9.3G -rw-r--r-- 1 9.3G Mar 14 15:46 pytorch_model-00001-of-00002.bin
3.3G -rw-r--r-- 1 3.3G Mar 14 15:46 pytorch_model-00002-of-00002.bin
28K -rw-r--r-- 1 27K Mar 14 15:46 pytorch_model.bin.index.json
4.0K -rw-r--r-- 1 97 Mar 14 15:46 special_tokens_map.json
4.0K -rw-r--r-- 1 833 Mar 14 15:46 tokenizer_config.json
492K -rw-r--r-- 1 489K Mar 14 15:46 tokenizer.model
```
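A quick way to sanity-check the merged folder is to load it like any local Hugging Face checkpoint (a sketch; assumes the repo's `llava` package is importable and `LLaVA-7b-v0/` is the directory produced above):
```python
import torch
from transformers import AutoTokenizer
from llava.model import LlavaLlamaForCausalLM  # from the LLaVA / LLaVA-Med repo

tokenizer = AutoTokenizer.from_pretrained("LLaVA-7b-v0")
model = LlavaLlamaForCausalLM.from_pretrained("LLaVA-7b-v0", torch_dtype=torch.float16)
print(model.config.model_type)                     # expected: llava
print(sum(p.numel() for p in model.parameters()))  # roughly 7B parameters
```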
<br>
### [[model] liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b/tree/main)
> Use this checkpoint directly as a preliminary test model for llava-med
- ### Test: CLI inference works in the llava environment
<br>
## Inference service
### via Web UI
- ### Launch a controller
```bash
$ python -m llava.serve.controller --host 0.0.0.0 --port 10000
```
- ### Launch a model worker
```bash
$ CUDA_VISIBLE_DEVICES=2,3 python \
-m llava.serve.model_worker \
--host 0.0.0.0 \
--controller http://localhost:10000 \
--port 40000 \
--worker http://localhost:40000 \
--model-path ./LLaVA-7b-v0 \
--multi-modal \
--num-gpus=2
```
- Wait for the model to finish loading, until you see `Uvicorn running on ...`
- ### Send a test message
> Check that the controller and worker are wired up correctly
```
$ python -m llava.serve.test_message \
--model-name LLaVA-7b-v0 \
--controller http://localhost:10000
Models: ['LLaVA-7b-v0']
worker_addr: http://localhost:40000
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
Human: What are the key differences between renewable and non-renewable energy sources?
Assistant: Renewable energy sources are those that can be replenished naturally in a relatively short amount of time, such as solar, wind, hydro, geothermal, and biomass. Non-renewable energy sources, on the other hand, are finite and will eventually be depleted, such as coal, oil, and natural gas. Here are some key differences between renewable and non-renewable energy sources:
1. Availability: Renewable energy sources are virtually inexhaustible, while non-renewable energy sources are finite and will eventually run out.
2. Environmental impact: Renewable energy sources have a much lower environmental impact than non-renewable sources, which can lead to air and water pollution, greenhouse gas emissions, and other negative effects.
3. Cost: Renewable energy sources can be more expensive to initially set up, but they typically have lower operational costs than non-renewable sources.
4. Reliability: Renewable energy sources are often more reliable and can be used in more remote locations than non-renewable sources.
5. Flexibility: Renewable energy sources are often more flexible and can be adapted to different situations and needs, while non-renewable sources are more rigid and inflexible.
6. Sustainability: Renewable energy sources are more sustainable over the long term, while non-renewable sources are not, and their depletion can lead to economic and social instability.
Human: Tell me a story with more than 1000 words.
Assistant: Once upon a time, there was a prince who was very smart and very brave. He was the son of a great king, and he had
```
- ### Launch a gradio web server.
> Start the web UI
```
$ python -m llava.serve.gradio_web_server \
--controller http://localhost:10000
2024-03-14 16:51:33 | INFO | gradio_web_server | args: Namespace(host='0.0.0.0', port=None, controller_url='http://localhost:10000', concurrency_count=8, model_list_mode='once', share=False, moderate=False, embed=False)
2024-03-14 16:51:33 | INFO | gradio_web_server | Models: ['LLaVA-7b-v0']
2024-03-14 16:51:33 | INFO | gradio_web_server | Namespace(host='0.0.0.0', port=None, controller_url='http://localhost:10000', concurrency_count=8, model_list_mode='once', share=False, moderate=False, embed=False)
2024-03-14 16:51:34 | INFO | stdout | Running on local URL: http://0.0.0.0:7860
```
- http://0.0.0.0:7860

### via cli
```
(llava) $ python -m llava.serve.cli \
--model-path llava-v1.5-7b \
--image-file "https://llava-vl.github.io/static/images/view.jpg" \
--load-4bit
[2024-03-15 15:23:42,990] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
.../python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.93s/it]
USER: 這是什麼場景的圖?
ASSISTANT: 這是一個湖畔的場景,有一條木製的步道或小橋樑,它們橫越湖面,並且有一個小岸邊。
```
- [[llava] CLI Inference](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#cli-inference)
- https://llava-vl.github.io/static/images/view.jpg
![](https://llava-vl.github.io/static/images/view.jpg)
<br>
### eval via checkpoint
```
# online image
(llava) $ python -m llava.eval.run_llava \
--model-path LLaVA/checkpoints/llava-med-7b-finetune-10k/checkpoint-750/ \
--model-base LLaVA/checkpoints/lmsys/vicuna-7b-v1.5/ \
--image-file "https://llava-vl.github.io/static/images/view.jpg" \
--query "這是什麼場景的圖?"
```
<br>
<hr>
<br>
## deepspeed config
### zero3_offload.json
[scripts/zero3_offload.json](https://github.com/haotian-liu/LLaVA/blob/main/scripts/zero3_offload.json)
<br>
### Traceback when loading zero3_offload.json
```
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 793, in train
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
File ".../python3.10/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
obj = dtype(**inputs)
File "<string>", line 136, in __init__
File ".../python3.10/site-packages/transformers/training_args.py", line 1720, in __post_init__
self.hf_deepspeed_config = HfTrainerDeepSpeedConfig(self.deepspeed)
File ".../python3.10/site-packages/transformers/integrations/deepspeed.py", line 88, in __init__
super().__init__(config_file_or_dict)
File ".../python3.10/site-packages/transformers/integrations/deepspeed.py", line 78, in __init__
super().__init__(config_file_or_dict)
File ".../python3.10/site-packages/accelerate/utils/deepspeed.py", line 49, in __init__
config = json.load(f)
```
<br>
<hr>
<br>
## pretrain
### script
[scripts/zero3_offload.json](https://github.com/haotian-liu/LLaVA/blob/main/scripts/zero3_offload.json) (used as-is, no modifications)
```bash
# CUDA_VISIBLE_DEVICES=2,3 bash tj_pretrain_deepspeed.sh
deepspeed --include=localhost:2,3 llava/train/train_xformers.py \
--deepspeed ./scripts/zero3_offload.json \
--version plain \
--model_name_or_path ./checkpoints/liuhaotian/llava-v1.5-7b \
--data_path data/alignment/llava_med_alignment_3.json \
--image_folder data/images-pretrain-3 \
--vision_tower openai/clip-vit-large-patch14 \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--bf16 False \
--fp16 True \
--output_dir ./checkpoints/llava-med-7b-pretrain \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 False \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to none
```
### resume status
- ### Current state:
```
checkpoints/llava-med-7b-pretrain/
├── checkpoint-1
│ ├── config.json
│ └── mm_projector.bin
├── checkpoint-2
│ ├── config.json
│ └── mm_projector.bin
├── checkpoint-3
│ ├── config.json
│ └── mm_projector.bin
├── config.json
├── mm_projector.bin
└── trainer_state.json
```
- ### Resuming needs the following 4 items under each checkpoint-x (all currently missing)
> - `trainer_state.json`
> - `latest`
> - `global_step*`
> - `zero_pp_rank_0_mp_rank_*_model_states.pt`
- `checkpoints/llava-med-7b-pretrain/checkpoint-xxx`
- `trainer_state.json`
```
.../python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 967, in train
trainer.train(resume_from_checkpoint=True)
File ".../python3.10/site-packages/transformers/trainer.py", line 1513, in train
state = TrainerState.load_from_json(os.path.join(resume_from_checkpoint, TRAINER_STATE_NAME))
File ".../python3.10/site-packages/transformers/trainer_callback.py", line 120, in load_from_json
with open(json_path, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/llava-med-7b-pretrain/checkpoint-3/trainer_state.json'
```
- Related
- [[Usage] Missing "trainer_state.json" when resuming training from saved checkpoints #1164](https://github.com/haotian-liu/LLaVA/issues/1164)
- [[code] def _tune_save_checkpoint(self)](https://huggingface.co/transformers/v4.4.2/_modules/transformers/trainer.html)
![](https://hackmd.io/_uploads/S1KJm99JR.png)
- `latest`
```
[2024-04-03 14:31:40,152] [WARNING] [engine.py:2716:load_checkpoint] Unable to find latest file at ./checkpoints/llava-med-7b-pretrain/checkpoint-3/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 967, in train
trainer.train(resume_from_checkpoint=True)
File ".../python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File ".../python3.10/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
File ".../python3.10/site-packages/transformers/integrations/deepspeed.py", line 407, in deepspeed_load_checkpoint
raise ValueError(f"[deepspeed] failed to resume from checkpoint {checkpoint_path}")
ValueError: [deepspeed] failed to resume from checkpoint ./checkpoints/llava-med-7b-pretrain/checkpoint-3
```
- `global_step*`
```
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 967, in train
trainer.train(resume_from_checkpoint=True)
File ".../python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File ".../python3.10/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
File ".../python3.10/site-packages/transformers/integrations/deepspeed.py", line 407, in deepspeed_load_checkpoint
raise ValueError(f"Can't find a valid checkpoint at {checkpoint_path}")
ValueError: Can't find a valid checkpoint at ./checkpoints/llava-med-7b-pretrain/checkpoint-3
```
- `zero_pp_rank_0_mp_rank_*_model_states.pt`
```
[TJ] ckpt_file_pattern=./checkpoints/llava-med-7b-pretrain/checkpoint-3/zero_pp_rank_0_mp_rank_*_model_states.pt
[TJ] checkpoint file list: []
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 967, in train
trainer.train(resume_from_checkpoint=True)
File ".../python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File ".../python3.10/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
File ".../python3.10/site-packages/transformers/integrations/deepspeed.py", line 403, in deepspeed_load_checkpoint
load_path, _ = deepspeed_engine.load_checkpoint(
File ".../python3.10/site-packages/deepspeed/runtime/engine.py", line 2726, in load_checkpoint
load_path, client_states = self._load_checkpoint(load_dir,
File ".../python3.10/site-packages/deepspeed/runtime/engine.py", line 2763, in _load_checkpoint
sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine)
File ".../python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader
return MegatronSDLoader(ckpt_list, version, checkpoint_engine)
File ".../python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 194, in __init__
super().__init__(ckpt_list, version, checkpoint_engine)
File ".../python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__
self.check_ckpt_list()
File ".../python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 169, in check_ckpt_list
assert len(self.ckpt_list) > 0
AssertionError
```
<br>
### artifact / config.json
```json=
{
"_name_or_path": "./checkpoints/liuhaotian/llava-v1.5-7b",
"architectures": [
"LlavaLlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"freeze_mm_mlp_adapter": false,
"freeze_mm_vision_resampler": false,
"hidden_act": "silu",
"hidden_size": 4096,
"image_aspect_ratio": "square",
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_length": 4096,
"max_position_embeddings": 4096,
"mm_hidden_size": 1024,
"mm_patch_merge_type": "flat",
"mm_projector_lr": null,
"mm_projector_type": "linear",
"mm_resampler_type": null,
"mm_use_im_patch_token": false,
"mm_use_im_start_end": false,
"mm_vision_select_feature": "patch",
"mm_vision_select_layer": -2,
"mm_vision_tower": "openai/clip-vit-large-patch14",
"model_type": "llava_llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"tokenizer_model_max_length": 2048,
"tokenizer_padding_side": "right",
"torch_dtype": "float16",
"transformers_version": "4.37.2",
"tune_mm_mlp_adapter": true,
"tune_mm_vision_resampler": false,
"unfreeze_mm_vision_tower": false,
"use_cache": true,
"use_mm_proj": true,
"vocab_size": 32000
}
```
<br>
### artifact / trainer_state.json
```json=
{
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 1.0,
"eval_steps": 500,
"global_step": 6,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.17,
"learning_rate": 0.0,
"loss": 2.2023,
"step": 1
},
{
"epoch": 0.33,
"learning_rate": 0.0,
"loss": 3.9386,
"step": 2
},
{
"epoch": 0.5,
"learning_rate": 0.002,
"loss": 2.5545,
"step": 3
},
{
"epoch": 0.67,
"learning_rate": 0.002,
"loss": 2.3959,
"step": 4
},
{
"epoch": 0.83,
"learning_rate": 0.002,
"loss": 2.5098,
"step": 5
},
{
"epoch": 1.0,
"learning_rate": 0.002,
"loss": 1.8839,
"step": 6
},
{
"epoch": 1.0,
"step": 6,
"total_flos": 2846638080.0,
"train_loss": 2.5808410048484802,
"train_runtime": 44.4043,
"train_samples_per_second": 0.135,
"train_steps_per_second": 0.135
}
],
"logging_steps": 1.0,
"max_steps": 6,
"num_input_tokens_seen": 0,
"num_train_epochs": 1,
"save_steps": 1.0,
"total_flos": 2846638080.0,
"train_batch_size": 1,
"trial_name": null,
"trial_params": null
}
```
<br>
### pretrain-to-finetune
- ### Current state:
```
checkpoints/llava-med-7b-pretrain/
├── checkpoint-1
│ ├── config.json
│ └── mm_projector.bin
├── checkpoint-2
│ ├── config.json
│ └── mm_projector.bin
├── checkpoint-3
│ ├── config.json
│ └── mm_projector.bin
├── config.json <--- final
├── mm_projector.bin <--- final
└── trainer_state.json <--- final
```
- ### Continuing fine-tuning from the pretrain final checkpoint hits the following error:
```
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 827, in train
model = LlavaLlamaForCausalLM.from_pretrained(
File ".../python3.10/site-packages/transformers/modeling_utils.py", line 3338, in from_pretrained
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory ./checkpoints/llava-med-7b-pretrain.
```
- Fine-tuning needs at least one of the following weight files in the model directory (the pretrain output above only contains `mm_projector.bin` and `config.json`, so none of them exist):
- `pytorch_model.bin`
- `tf_model.h5`
- `model.ckpt.index`
- `flax_model.msgpack`
<br>
### trace code
> for pretrain
- llava/train/train.py
- [#train(...)#L788](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train.py#L788)
- deepspeed
- `trainer.deepspeed` records the config contents (the wrapped `DeepSpeedEngine`)
```
DeepSpeedEngine(
(module): LlavaLlamaForCausalLM(
(model): LlavaLlamaModel(
(embed_tokens): Embedding(32000, 4096, padding_idx=0)
(layers): ModuleList(
(0-31): 32 x LlamaDecoderLayer(
(self_attn): LlamaSdpaAttention(
(q_proj): Linear(in_features=4096, out_features=4096, bias=False)
(k_proj): Linear(in_features=4096, out_features=4096, bias=False)
(v_proj): Linear(in_features=4096, out_features=4096, bias=False)
(o_proj): Linear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
...
```
- [checkpoint folder logic#L966](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train.py#L966)
- Checks whether `--output_dir` already contains any `checkpoint-*` folders (see the sketch after this list)
- If it does -> `trainer.train(resume_from_checkpoint=True)`
- If not -> `trainer.train()`
- [config.json#L196](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train.py#L196)
- Produced by calling `trainer.model.config.save_pretrained(output_dir)`
- [mm_projector.bin#L206](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train.py#L206)
- Produced by calling `torch.save(weight_to_save, os.path.join(output_dir, f'mm_projector.bin'))`
- [trainer_state.json#L970](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train.py#L970)
- Produced by calling `trainer.save_state()`
- [#safe_save_model_for_hf_trainer(...)#L185](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train.py#L185)
- tune_mm_mlp_adapter=True
- output_dir=./checkpoints/llava-med-7b-pretrain
- current_folder=llava-med-7b-pretrain
- With these settings, the default path ends up at
[`torch.save(weight_to_save, os.path.join(output_dir, f'mm_projector.bin'))`](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train.py#L206)
- which produces the following files
- mm_projector.bin
- config.json
- trainer_state.json is NOT included
- (it is produced by `trainer.save_state()` instead)
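A minimal sketch of the resume decision described above (assuming `trainer` is the Hugging Face `Trainer` already built inside `train()`, and `output_dir` is whatever was passed as `--output_dir`):
```python
import pathlib

output_dir = "./checkpoints/llava-med-7b-pretrain"  # value of --output_dir

# Same check the training script does around llava/train/train.py#L966:
# resume only if some checkpoint-* folder already exists in the output dir.
if list(pathlib.Path(output_dir).glob("checkpoint-*")):
    trainer.train(resume_from_checkpoint=True)
else:
    trainer.train()

trainer.save_state()  # this call is what writes trainer_state.json into output_dir
```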
<br>
<hr>
<br>
## finetune / script
### CUDA usage notes
- ### Limitations of CUDA_VISIBLE_DEVICES with deepspeed
```
$ CUDA_VISIBLE_DEVICES=0,1 bash tj_finetune_deepspeed.sh
```
- With deepspeed, only the following settings work (GPU ids must be contiguous starting from 0):
- 0
- 0,1
- 0,1,2
- 0,1,2,3
- Otherwise you get the error below
```
Detected CUDA_VISIBLE_DEVICES=0,3: setting --include=localhost:0,3
Traceback (most recent call last):
File ".../envs/llava-0326/bin/deepspeed", line 6, in <module>
main()
File "~/python3.10/site-packages/deepspeed/launcher/runner.py", line 426, in main
active_resources = parse_inclusion_exclusion(resource_pool, args.include, args.exclude)
File "~/python3.10/site-packages/deepspeed/launcher/runner.py", line 350, in parse_inclusion_exclusion
return parse_resource_filter(active_resources, include_str=inclusion, exclude_str=exclusion)
File "~/python3.10/site-packages/deepspeed/launcher/runner.py", line 302, in parse_resource_filter
raise ValueError(f"No slot '{slot}' specified on host '{hostname}'")
ValueError: No slot '3' specified on host 'localhost'
```
- [[BUG] Unable to specify arbitrary GPUs using CUDA_VISIBLE_DEVICES #4248](https://github.com/microsoft/DeepSpeed/issues/4248)

- The reply below is a bit off ...

- The correct approach:
> In short, we recommend using a hostfile for this, not CUDA_VISIBLE_DEVICES
> https://www.deepspeed.ai/getting-started/#resource-configuration-single-node
> ```
> deepspeed --include localhost:1 ...
> ```
The correct equivalent of `CUDA_VISIBLE_DEVICES=2,3 bash tj_finetune_deepspeed.sh` is:
`deepspeed --include=localhost:2,3 train_xformers.py ...`
(the `--include` flag goes on the `deepspeed` launcher, not after `train_xformers.py`)
- Tested fix and log
```
$ deepspeed --include=localhost:2,3 llava/train/train_xformers.py ...
...
...
[INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [2, 3]}
[INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[INFO] [launch.py:163:main] dist_world_size=2
[INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=2,3
```
`--include=localhost:2,3` is equivalent to `CUDA_VISIBLE_DEVICES=2,3`
<br>
- ### Problem: unable to resume from a checkpoint
```bash
# finetune
$ bash tj_finetune_deepspeed.sh
# On server 241; initially trained with all 4 GPUs
# resume checkpoint-750
$ CUDA_VISIBLE_DEVICES=0,1 bash tj_finetune_deepspeed.sh
# Later switched to 2 GPUs, which triggers a tensor size mismatch
```
- **RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 0**
```
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.7298762798309326 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.8208658695220947 seconds
Parameter Offload: Total persistent parameters: 599040 in 312 params
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 967, in train
trainer.train(resume_from_checkpoint=True)
File ".../python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File ".../python3.10/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
File ".../python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint
load_path, _ = deepspeed_engine.load_checkpoint(
File ".../python3.10/site-packages/deepspeed/runtime/engine.py", line 2724, in load_checkpoint
load_path, client_states = self._load_checkpoint(load_dir,
File ".../python3.10/site-packages/deepspeed/runtime/engine.py", line 2794, in _load_checkpoint
self.load_module_state_dict(checkpoint=checkpoint,
File ".../python3.10/site-packages/deepspeed/runtime/engine.py", line 2600, in load_module_state_dict
param.ds_tensor.data.copy_(saved_frozen_params[name].data)
RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 0
```
- Started with 4 GPUs, resumed with 1 GPU -> error:
RuntimeError: The size of tensor a (1024) must match the size of tensor b (256) at non-singleton dimension 0
- Started with 4 GPUs, resumed with 2 GPUs -> error:
RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 0
- Started with 4 GPUs, resumed with 4 GPUs -> resume works
- Reason: ZeRO-3 partitions each parameter (and optimizer state) across ranks, so the `zero_pp_rank_*` shard sizes saved in the checkpoint are tied to the original world size; resume with the same number of GPUs, or consolidate the checkpoint first (e.g. with `zero_to_fp32.py`)
<br>
### script
```bash=
# CUDA_VISIBLE_DEVICES=2,3 bash tj_finetune_deepspeed.sh
deepspeed llava/train/train_xformers.py \
--deepspeed ./scripts/zero3_offload_by_tj.json \
--version v1 \
--model_name_or_path ./checkpoints/lmsys/vicuna-7b-v1.5 \
--data_path ./data/instruct/llava_med_instruct_10k.json \
--image_folder ./data/images-finetune-10k \
--vision_tower ~/LLM_Models/openai/clip-vit-large-patch14-336 \
--pretrain_mm_mlp_adapter ~/LLM_Models/liuhaotian/llava-v1.5-7b/mm_projector.bin \
--mm_projector_type mlp2x_gelu \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--group_by_modality_length True \
--fp16 True \
--bf16 False \
--output_dir ./checkpoints/llava-med-7b-finetune-10k \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 250 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 False \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to none
```
- Cynthia vs TJ
![](https://hackmd.io/_uploads/BysWkfgyR.png)
<br>
### `zero3_offload_by_tj.json`
> source: [LLaVA/scripts/zero3_offload.json](https://github.com/haotian-liu/LLaVA/blob/main/scripts/zero3_offload.json)
Modified as follows:
```json=
{
"fp16": {
"enabled": true, <----
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"bf16": {
"enabled": false <----
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto"
}
},
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto"
}
},
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 1e9,
"reduce_bucket_size": "auto",
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": "auto",
"stage3_max_live_parameters": 1e9,
"stage3_max_reuse_distance": 1e9,
"gather_16bit_weights_on_model_save": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"steps_per_print": 1e5,
"wall_clock_breakdown": false
}
```
<br>
## finetune / Parameters
### `--model_name_or_path`
- Must point to a language model, not a multimodal model
- [llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) is a multimodal model
> LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
- [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) is a language model
- The directory must contain a `config.json`, otherwise you get the following error:
```
OSError: ./checkpoints/llava-med-7b-finetune-10k does not appear to have a file named config.json.
```
- If it points at a multimodal model, the following error appears:
```
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
.../python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
[2024-03-26 17:55:30,619] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 686, num_elems = 7.06B
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 911, in train
model.get_model().initialize_vision_modules(
File ".../LLaVA/llava/model/llava_arch.py", line 97, in initialize_vision_modules
self.mm_projector.load_state_dict(get_w(mm_projector_weights, 'mm_projector'))
File ".../python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Sequential:
size mismatch for 0.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for 0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for 2.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for 2.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
```
<br>
### `--vision_tower openai/clip-vit-large-patch14-336`
- ### LLaVA uses CLIP ViT-L/14 336px
https://github.com/haotian-liu/LLaVA
- `--mm_projector_type mlp2x_gelu`: the two-layer MLP vision-language connector.
- `--vision_tower openai/clip-vit-large-patch14-336`: CLIP ViT-L/14 336px.
- ### huggingface
- [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)
- [openai/clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)
- ### patch14 vs patch14-336
- ### [[paper][PDF] Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/pdf/2103.00020.pdf)
> For the ViT-L/14 we also pre-train at a higher 336 pixel resolution for one additional epoch to boost performance similar to FixRes (Touvron et al., 2019). We denote this model as ViT-L/14@336px.
- 2021/02/26: OpenAI
https://github.com/OpenAI/CLIP
- Quick summary: using massive amounts of data, CLIP simply and directly maps images and text into the same embedding space
- ### patch14
- [Multimodality and Large Multimodal Models (LMMs)](https://huyenchip.com/2023/10/10/multimodal.html)
ViT-L/14@336px:
- Large vision transformer (ViT-L)
- 14 patches (each image is divided into 14 sub-images)
- on 336x336 pixel input
- ### Q1: What does `ViT-L/14@336px` mean?
- ### ChatGPT3.5
"ViT-L/14@336px" is the name/specification of a Vision Transformer model. The name has several parts:
- **"ViT"**: Vision Transformer, a model that processes image data with the Transformer architecture.
- ~~**"L/14"**: means the model has 14 Transformer blocks (layers). This refers to the model depth; deeper models usually have stronger representational power, but also higher compute cost and longer training time.~~
- **"@336px"**: the model accepts 336x336-pixel images as input, i.e. the resolution used during training and inference.
In short, "ViT-L/14@336px" is an identifier describing the spec of a Vision Transformer model: its depth and the image size it processes.
- ### ChatGPT4
"ViT-L/14@336px" is a short description of a specific Vision Transformer (ViT) configuration. Each part denotes a different property of the model:
- **ViT-L**: the Large version of the Vision Transformer. ViT comes in several sizes (Base, Large, ...); the Large version has more parameters and a deeper network, so it performs better on vision tasks at a higher compute cost.
- **/14**: the patch size. ViT splits the input image into fixed-size patches and feeds them to the Transformer as a sequence; "14" means each patch is 14x14 pixels.
- **@336px**: the input resolution is 336x336 pixels. ViT can handle different resolutions, and performance may vary with input size.
In summary, "ViT-L/14@336px" describes a Large Vision Transformer that uses 14x14-pixel patches and is trained/optimized for 336x336-pixel images; such models are typically used for image classification, object detection and other vision tasks. (A worked example of the patch arithmetic follows after this Q&A.)
- ### Q2: When the patches are fed to the Transformer as a sequence, are they ordered row-wise or column-wise?
- ### ChatGPT4
In a Vision Transformer (ViT), after the image is split into patches, the patches are flattened and serialized in a fixed order before being fed to the Transformer. This is normally done row by row: starting from the top-left corner, the patches of the first row are serialized left to right, then the next row, and so on until the last patch. **The order is like reading English text: left to right, top to bottom.**
This row-wise ordering helps preserve the spatial relationships of local regions, so the serialized patch sequence still reflects the spatial layout of the original image to some extent, which helps the Transformer learn useful visual representations. Note, however, that **the Transformer itself is based on self-attention and is in principle insensitive to input order, although in practice the input order can still affect the features the model learns.**
- ### Q3: Doesn't this patch sequence need to consider overlap?
- ### ChatGPT4
In the standard Vision Transformer (ViT) setup, the image is split into non-overlapping patches: each patch is a fixed-size region taken directly from the image, the patches tile the image evenly, and adjacent patches share no pixels.
**The main reason for not using overlap is to simplify processing and reduce compute.** Each patch is then treated as an independent token, and these tokens together form the Transformer's input sequence. Because the Transformer is based on self-attention, it can still capture the complex relationships between tokens and learn effective global and local visual features.
However, for some vision tasks and model variants, overlapping patches can help. For example, in fine-grained recognition or image segmentation, overlapping patches provide richer local information and help the model capture finer visual details; some works also combine overlapping patches with specially designed architectures to further improve performance.
Overall, the standard ViT uses non-overlapping patches, but introducing overlap can be worth exploring for specific tasks.
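A quick sanity check of the patch arithmetic above (a small sketch; the numbers follow directly from the names `ViT-L/14` / `@336px` and from the two vision towers used in this note):
```python
def num_patch_tokens(resolution: int, patch: int = 14) -> int:
    """Number of patch tokens a ViT produces for a square input image."""
    per_side = resolution // patch
    return per_side * per_side

print(num_patch_tokens(336))  # 576 -> openai/clip-vit-large-patch14-336
print(num_patch_tokens(224))  # 256 -> openai/clip-vit-large-patch14
# CLIP additionally prepends one [CLS] token; with mm_vision_select_feature="patch"
# (see the config.json artifact above) only the patch tokens are passed to the projector.
```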
<br>
### `--output_dir .../checkpoints/llava-med-7b-finetune-10k`
### `--save_strategy "steps"`
### `--save_steps 2`
### `--save_total_limit 3`
- Where checkpoints go: inside the llava-med-7b-finetune-10k folder
- Save strategy: by steps
- A checkpoint is saved every `save_steps` steps (illustrated here with `--save_steps 2`; the fine-tune script above uses 250)
1. llava-med-7b-finetune-10k/checkpoint-2/
2. llava-med-7b-finetune-10k/checkpoint-4/
3. llava-med-7b-finetune-10k/checkpoint-6/
4. llava-med-7b-finetune-10k/checkpoint-8/
5. llava-med-7b-finetune-10k/checkpoint-10/
6. ...
- At most 3 checkpoints are kept
- e.g. after step 20 only the following three folders remain; older ones are removed:
1. llava-med-7b-finetune-10k/checkpoint-16/
2. llava-med-7b-finetune-10k/checkpoint-18/
3. llava-med-7b-finetune-10k/checkpoint-20/
- On server 241's hard disk, each save takes about 10 minutes
- ex. `16:58:48` -> `17:08:46`
- Checkpoint directory structure
```
checkpoints/llava-med-7b-finetune-10k-infer/
└── checkpoint-20
├── config.json
├── generation_config.json
├── global_step20
│ ├── zero_pp_rank_0_mp_rank_00_model_states.pt
│ ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
│ ├── zero_pp_rank_1_mp_rank_00_model_states.pt
│ ├── zero_pp_rank_1_mp_rank_00_optim_states.pt
│ ├── zero_pp_rank_2_mp_rank_00_model_states.pt
│ ├── zero_pp_rank_2_mp_rank_00_optim_states.pt
│ ├── zero_pp_rank_3_mp_rank_00_model_states.pt
│ └── zero_pp_rank_3_mp_rank_00_optim_states.pt
├── latest
├── model-00001-of-00006.safetensors
├── model-00002-of-00006.safetensors
├── model-00003-of-00006.safetensors
├── model-00004-of-00006.safetensors
├── model-00005-of-00006.safetensors
├── model-00006-of-00006.safetensors
├── model.safetensors.index.json
├── rng_state_0.pth
├── rng_state_1.pth
├── rng_state_2.pth
├── rng_state_3.pth
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.model
├── trainer_state.json
├── training_args.bin
└── zero_to_fp32.py
```
<br>
### `--report_to wandb`
> - wandb -> the training script automatically sends the specified metrics and logs to the W&B server
> - plays a role similar to mlflow.autolog
- ### choice

```
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results <---
wandb: Enter your choice:
```
- Choosing (3) is equivalent to `--report_to none`
- ### Q: What is the `--report_to wandb` training argument for?
- ### ChatGPT3.5
`--report_to wandb` is a training argument for LLM (Large Language Model) training that reports metrics and logs from the run to the WandB (Weights & Biases) platform. WandB is an online platform for tracking machine-learning experiments; it provides rich visualization and experiment-management features that help users understand the training process and results.
Concretely, when `--report_to wandb` is set, training metrics (loss, accuracy, ...), performance metrics (training speed, GPU utilization, ...) and other related information are reported to WandB during training. The benefit is that you can conveniently view, compare and analyze different runs on the WandB dashboard, monitor the model and analyze the data, and thus debug and optimize more effectively.
- ### ChatGPT4
The `--report_to wandb` argument specifies that, while training a large language model (LLM), the training metrics (loss, accuracy, ...) should be reported to the Weights & Biases (W&B) platform. Weights & Biases is a tool for experiment tracking, visualization and management that helps researchers and engineers monitor, compare and understand their machine-learning experiments more effectively.
With `--report_to wandb`, the training script automatically sends the specified metrics and logs to the W&B server, and you can view and analyze the results in real time on the W&B dashboard. This is very useful for tracking model performance, debugging and experiment management, especially for large-scale training runs.
To use it, first create a W&B account and install the Weights & Biases Python package; log in to the W&B account before launching the training script, then add `--report_to wandb` to the training command to enable experiment tracking. Other W&B-related arguments let you customize the run name, project and other settings.
- ### Related issue
- [Problem of wandb in train.py #5772](https://github.com/ultralytics/yolov5/issues/5772)
<br>
## finetune / resume
### Resume arguments
1. Point `--model_name_or_path` at a `checkpoint-*` folder
- Example
```
--model_name_or_path ./checkpoints/llava-med-7b-finetune-10k/checkpoint-750
```
- The folder must contain `config.json`, otherwise you get the error below
```
OSError: ./checkpoints/llava-med-7b-finetune-10k does not appear to have a file named config.json.
```
- code
https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train.py#L966

- Quick check in Python
```python
>>> import pathlib
>>> output_dir='llava-med-7b-finetune-10k'
>>> pathlib.Path(output_dir).glob('checkpoint-*')
<generator object Path.glob at 0x7f4c355156d0>
>>> list(pathlib.Path(output_dir).glob('checkpoint-*'))
[PosixPath('llava-med-7b-finetune-10k/checkpoint-250')]
```
- PASS
2. Remove `--pretrain_mm_mlp_adapter`. If it is still specified, the result is the same as pointing `--model_name_or_path` at a multimodal model, and the error below appears (a combined resume command is sketched after this list):
```
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
.../python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
[2024-03-26 17:55:30,619] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 686, num_elems = 7.06B
Traceback (most recent call last):
File ".../LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
train()
File ".../LLaVA/llava/train/train.py", line 911, in train
model.get_model().initialize_vision_modules(
File ".../LLaVA/llava/model/llava_arch.py", line 97, in initialize_vision_modules
self.mm_projector.load_state_dict(get_w(mm_projector_weights, 'mm_projector'))
File ".../python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Sequential:
size mismatch for 0.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for 0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for 2.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for 2.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
```
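Putting the two adjustments together, a resume launch could look like the sketch below. It mirrors `tj_finetune_deepspeed.sh` above with only the two changes applied, and keeps the original 4-GPU world size because the ZeRO-3 shards are tied to it (see the resume problem above):
```bash
# Sketch: resume fine-tuning from checkpoint-750 (same flags as the original
# script, except --model_name_or_path and the removed --pretrain_mm_mlp_adapter)
deepspeed llava/train/train_xformers.py \
    --deepspeed ./scripts/zero3_offload_by_tj.json \
    --version v1 \
    --model_name_or_path ./checkpoints/llava-med-7b-finetune-10k/checkpoint-750 \
    --data_path ./data/instruct/llava_med_instruct_10k.json \
    --image_folder ./data/images-finetune-10k \
    --vision_tower ~/LLM_Models/openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --fp16 True \
    --bf16 False \
    --output_dir ./checkpoints/llava-med-7b-finetune-10k \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 250 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to none
```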
<br>
<hr>
<br>
## finetune / final result
### final log
```
{'loss': 0.1963, 'learning_rate': 2e-05, 'epoch': 3.0}
{'loss': 0.2883, 'learning_rate': 2e-05, 'epoch': 3.0}
{'loss': 0.2143, 'learning_rate': 2e-05, 'epoch': 3.0}
{'train_runtime': 199188.1163, 'train_samples_per_second': 0.149, 'train_steps_per_second': 0.037, 'train_loss': 0.39618138930852415, 'epoch': 3.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 7413/7413 [55:19:48<00:00, 26.87s/it]
[2024-03-31 17:16:58,749] [INFO] [launch.py:347:main] Process 23970 exits successfully.
[2024-03-31 17:17:00,752] [INFO] [launch.py:347:main] Process 23969 exits successfully.
[2024-03-31 17:17:02,755] [INFO] [launch.py:347:main] Process 23971 exits successfully.
[2024-03-31 17:17:42,800] [INFO] [launch.py:347:main] Process 23968 exits successfully.
```
Loss at the final logged step: 0.2143 (overall train_loss: 0.3962)
<br>
<hr>
<br>
## finetune / test
### Lake view
https://llava-vl.github.io/static/images/view.jpg
![](https://llava-vl.github.io/static/images/view.jpg)
- ### liuhaotian/llava-v1.5-7b
```
(llava) $ python -m llava.serve.cli \
--model-path checkpoints/liuhaotian/llava-v1.5-7b/ \
--image-file https://llava-vl.github.io/static/images/lake_view.jpg \
--load-4bit
```
- **USER**: Can you describe the image in detail?
- **ASSISTANT**: The image features a pier extending out over a large body of water, likely a lake. The pier is made of wood and has a few wooden planks. The water is calm, and the pier appears to be empty. In the background, there are mountains visible, adding to the serene and picturesque setting.
(該圖像顯示了一個碼頭延伸到一大片水域(可能是湖泊)上。 碼頭是木頭做的,有幾塊木板。 水面平靜,碼頭顯得空蕩蕩的。 背景中可以看到山脈,增添了寧靜和風景如畫的環境。)
- **USER**: Can you write a diary about a visit to this place for me?
- **ASSISTANT**:
:::warning
Date: 2023-04-10
Today, I visited a beautiful location that I had been longing to explore. The destination was a pier extending out over a large body of water, likely a lake. The pier was made of wood and had a few wooden planks, giving it a rustic and charming appearance. As I walked along the pier, I was greeted by the calm waters of the lake, which were a striking shade of blue.
The mountains in the background provided a stunning backdrop, making the scene even more picturesque. The tranquility of the location was evident, and I could feel the peacefulness of the environment. I spent some time taking in the view, appreciating the natural beauty of the area.
As I continued along the pier, I noticed a few birds flying above the water, adding to the serene atmosphere. The pier was empty, and I felt a sense of solitude as I enjoyed the moment. I took a few photographs to capture the memory of this visit, and I am eager to return to this beautiful location again.
Overall, the visit to this pier was a delightful experience, and I am grateful for the opportunity to witness such a captivating scene.
:::
---
- ### checkpoints/llava-med-7b-finetune-10k-4gpu
```
(llava) $ python -m llava.serve.cli \
--model-path checkpoints/llava-med-7b-finetune-10k-4gpu/ \
--image-file https://llava-vl.github.io/static/images/view.jpg \
--load-4bit
```
- **Human**: Can you describe the image in detail?
- **Assistant**: The image is a postoperative day 1 chest X-ray showing the reconstructed right upper lobe of the lung. It displays the thoracic cavity, including the heart, ribcage, and the diaphragm. The main focus of the image is the reconstructed right upper lobe.
(這是一張手術後第一天的胸部X光片,展示了重建的右上肺葉。圖片顯示了胸腔,包括心臟、肋骨和膈肌。圖片的主要焦點是重建後的右上肺葉。)
(The answer is completely off-topic for this image.)
<br>
### CT image
LLaVA-Med/data/pmc_articles/PMC3608000/1749-8090-7-130-4.jpg

- ### liuhaotian/llava-v1.5-7b
```
(llava) $ python -m llava.serve.cli \
--model-path checkpoints/liuhaotian/llava-v1.5-7b/ \
--image-file LLaVA-Med/data/pmc_articles/PMC3608000/1749-8090-7-130-4.jpg \
--load-4bit
```
- **USER**: Can you describe the image in detail?
- **ASSISTANT**: The image features a close-up view of a human chest with a focus on the lungs and the surrounding area. The lungs are visible in the center of the image, with the right lung occupying a larger portion of the space. The left lung is smaller and positioned towards the left side of the image.
(此圖像展示了人體胸部的特寫視圖,並專注於肺部和周圍區域。 肺部在影像中心可見,右肺佔據了更大的空間。 左肺較小,位於影像的左側。)<br>
The chest area is surrounded by a network of blood vessels, which can be seen in various locations throughout the image. Some of these blood vessels are located near the top left corner, while others are scattered around the center and right side of the image. The close-up view of the chest area provides a detailed look at the anatomy of the human body.
(胸部區域被血管網絡包圍,在整個影像的不同位置都可以看到。 其中一些血管位於左上角附近,而另一些則分散在圖像的中心和右側。 胸部區域的特寫視圖提供了人體解剖結構的詳細視圖。)<br><br>
- **USER**: What type of image is this?
- **ASSISTANT**: This is a medical image, specifically a chest X-ray, which provides a detailed view of the human chest area, including the lungs and surrounding blood vessels.
(這是一張醫學影像,特別是胸部 X 光影像,它提供了人體胸部區域(包括肺部和周圍血管)的詳細視圖。)<br><br>
- **USER**: 這是什麼類型的影像?
- **ASSISTANT**: 這是一張醫學影像,具體來說是胸部X光片,用於視覺胸膜和胸腔內部的結構。
- ### checkpoints/llava-med-7b-finetune-10k-4gpu
```
(llava) $ python -m llava.serve.cli \
--model-path checkpoints/llava-med-7b-finetune-10k-4gpu/ \
--image-file LLaVA-Med/data/pmc_articles/PMC3608000/1749-8090-7-130-4.jpg \
--load-4bit
```
- **Human**: Can you describe the image in detail?
- **Assistant**: The image is a CT scan showing a cross-sectional view of the patient's chest. It highlights the superior mediastinum area, which is the upper part of the mediastinum, the central compartment of the thoracic cavity.
(此影像是 CT 掃描,顯示患者胸部的橫斷面圖。它突出顯示上縱隔區域,即縱隔的上部,即胸腔的中央隔室。)<br><br>
- **Human**: What type of image is this?
- **Assistant**: This is a CT scan image.<br><br>
- **Human**: 這是什麼類型的影像?
- **Assistant**: Yes, this is a CT scan image.
<br>
### MRI image
https://www.embodi3d.com/blogs/entry/373-how-to-easily-tell-the-difference-between-mri-and-ct-scan/
![](https://content.invisioncic.com/r248939/monthly_2017_04/large.58fd9d3f2994f_Fig2Figure2.jpg.504b237d57c753f3484fdf3908332fd3.jpg)
- ### liuhaotian/llava-v1.5-7b
```
(llava) $ python -m llava.serve.cli \
--model-path checkpoints/liuhaotian/llava-v1.5-7b/ \
--image-file LLaVA-Med/data/images/mri_brain.jpg \
--load-4bit
```
- **USER**: Can you describe the image in detail?
- **ASSISTANT**: The image is a series of seven close-up shots of a brain, showcasing various parts of the organ. Each section of the brain is displayed in a different photograph, providing a detailed view of its structure and anatomy. The images are arranged in a row, with each section of the brain occupying a separate frame.
(該圖像是一系列七個大腦特寫鏡頭,展示了該器官的各個部分。 大腦的每個部分都顯示在不同的照片中,提供其結構和解剖結構的詳細視圖。 這些圖像排列成一排,大腦的每個部分佔據一個單獨的框架。)
- **USER**: What type of image is this?
- **ASSISTANT**: This is a medical image, specifically a series of photographs or images of a brain, which are used for educational or research purposes.
(這是一張醫學影像,特別是一系列大腦照片或影像,用於教育或研究目的。)
- **USER**: What kind of equipment was used to create the image? Chest X-ray, CT or MRI?
- **ASSISTANT**: The image was likely created using a CT (computed tomography) scan, as it provides detailed images of the brain and its structures.
(該圖像很可能是使用 CT(電腦斷層掃描)掃描創建的,因為它提供了大腦及其結構的詳細圖像。)
- **USER**: How do you differentiate between CT and MRI?
- **ASSISTANT**: CT and MRI are two different types of medical imaging techniques, each with its own advantages and applications. Here are some key differences between the two:
(CT 和 MRI 是兩種不同類型的醫學影像技術,各有其優點和應用。 以下是兩者之間的一些主要區別:)
1. **Technology**: CT scans use X-rays to create images, while MRI scans use magnetic fields and radio waves to generate images.
(技術:CT 掃描使用 X 光來產生影像,而 MRI 掃描則使用磁場和無線電波來產生影像。)
2. **Imaging Method**: CT scans use X-ray beams to create cross-sectional images of the body, while MRI scans use magnetic fields and radio waves to create images of the body's internal structures.
(影像方法:CT 掃描使用 X 射線束創建身體的橫斷面影像,而 MRI 掃描使用磁場和無線電波創建身體內部結構的影像。)
3. **Imaging Resolution**: CT scans can provide high-resolution images, but they may not be as detailed as MRI images, which can capture more intricate details due to the magnetic fields and radio waves used in the imaging process.
(成像解析度:CT 掃描可以提供高解析度影像,但可能不如 MRI 影像詳細,由於成像過程中使用的磁場和無線電波,MRI 影像可以捕捉更複雜的細節。)
4. **Safety**: CT scans use ionizing radiation, which can be harmful, while MRI scans are completely non-ionizing and do not use any radiation.
(安全性:CT 掃描使用電離輻射,可能有害,而 MRI 掃描完全非電離輻射,不使用任何輻射。)
5. **Applications**: CT scans are commonly used for detecting and diagnosing various conditions, such as fractures, tumors, and infections, while MRI scans are often used for imaging soft tissues, such as the brain, spine, and joints, as well as for detecting and diagnosing certain diseases.
(應用:CT 掃描通常用於檢測和診斷各種疾病,例如骨折、腫瘤和感染,而 MRI 掃描通常用於軟組織影像,例如大腦、脊椎和關節,以及檢測和診斷診斷某些疾病。)
In summary, CT scans and MRI scans are two different imaging techniques with distinct advantages and applications. CT scans use X-rays to create images and are commonly used for detecting and diagnosing various conditions, while MRI scans use magnetic fields and radio waves to create images and are often used for imaging soft tissues and detecting certain diseases.
(綜上所述,CT掃描和MRI掃描是兩種不同的影像技術,具有不同的優點和應用。 CT 掃描使用 X 光創建影像,通常用於檢測和診斷各種病症,而 MRI 掃描使用磁場和無線電波創建影像,通常用於軟組織成像和檢測某些疾病)
- ### checkpoints/llava-med-7b-finetune-10k-4gpu
```
(llava) $ python -m llava.serve.cli \
--model-path checkpoints/llava-med-7b-finetune-10k-4gpu/ \
--image-file LLaVA-Med/data/images/mri_brain.jpg \
--load-4bit
```
- **Human**: Can you describe the image in detail?
- **Assistant**: The image consists of three axial MRI scans arranged horizontally, showing the progression of a lesion over time. The scans are taken using a 3-T General Electric Sigma unit.
(此影像由水平排列的三個軸向 MRI 掃描組成,顯示病灶隨時間的進展。 使用 3-T General Electric Sigma 裝置進行掃描。)<br><br>
- **Human**: What type of image is this?
- **Assistant**: This is a type of magnetic resonance imaging (MRI) image.
(這是一種磁振造影 (MRI) 影像。)<br><br>
- **Human**: What kind of equipment was used to create the image? Chest X-ray, CT or MRI?
- **Assistant**: The equipment used to create the image is a 3-T General Electric Sigma unit.
(用於建立影像的裝置是 3-T 通用電氣 Sigma 裝置。)<br><br>
- **Human**: How do you differentiate between CT and MRI?
- **Assistant**: In the MRI images, the lesion appears as an area of abnormal signal intensity, which is typically characterized by a bright or hyperintense signal in the affected region. This is in contrast to the normal appearance of tissues in a CT scan, where the lesion would appear as a darker or less intense signal.
(在 MRI 影像中,病灶表現為訊號強度異常的區域,其典型特徵是受影響區域出現明亮或高訊號。 這與 CT 掃描中組織的正常外觀形成鮮明對比,在 CT 掃描中,病灶會顯示為較暗或較弱的訊號。)
<br>
### cell wall
17096591_pcbi-0020151-g003
![](https://hackmd.io/_uploads/rkX2gb_y0.jpg)
- ### Source: [[PMC1635540] A Curvature-Mediated Mechanism for Localization of Lipids to Bacterial Poles](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635540/)
- ### What medical information does this figure provide?
[ChatGPT4]
This figure comes from a paper published in PLOS Computational Biology titled "A Curvature-Mediated Mechanism for Localization of Lipids to Bacterial Poles". It illustrates how lipid clusters, in particular cardiolipin, localize to the poles and division sites of bacterial cells, a process mediated by the curvature of the cell membrane and the intrinsic curvature imposed by the bacterial cell wall.
The paper discusses a model of how lipids in bacterial cells spontaneously segregate because of cell geometry and the lipids' intrinsic curvature. The authors propose that microphase-separated lipids can explain the polar localization of cardiolipin observed in bacteria. This process is key to understanding how bacterial cells maintain their shape, divide correctly, and possibly target specific proteins to the cell poles.
In the panels labeled A, B and C, the authors appear to show simulated lipid distributions and clusters on the bacterial membrane under different short-range attraction strengths ε. Panel D looks like a 3D rendering of a segment of the bacterial membrane, where the colored regions mark where high-curvature lipids such as cardiolipin are likely to cluster.
In a broader medical context, understanding these processes is fundamental to bacterial cell biology and may influence the development of antibiotics and treatments for bacterial infections: how bacteria maintain their shape and polarity is essential to their survival and pathogenicity, so this kind of research can contribute to antimicrobial drug development.
- ### What is this for?
[ChatGPT4][speculated purpose]
It is likely studying the patterns of lipid arrangement on the cell membrane and how they affect cell function. This kind of analysis is commonly used to understand membrane structure and dynamics, and can be important for developing new drugs and therapies.
- ### pretrain data
- ### `data/llava_med_image_urls.jsonl`
`{"pmc_tar_url": "https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/a1/30/PMC1635540.tar.gz", "image_file_path": "PMC1635540/pcbi.0020151.g003.jpg", "pair_id": "17096591_pcbi-0020151-g003"}`
- ### `data/alignment/llava_med_alignment_500k.json`
```json
{
"id": "17096591_pcbi-0020151-g003",
"image": "17096591_pcbi-0020151-g003.jpg",
"conversatons": [
{
"from": "human",
"value": "Illustrate the image through a descriptive explanation\n<image>"
},
{
"from": "gpt",
"value": "Polar Localization of Lipid Clusters Elastic energy parameters are fixed (\u03ba = 25 k = 0.25 k/nm4, \u03b3 = 0.4 nm\u22121), with varying short-range attraction (A) \u025b = 1 k (B) \u025b = 1.5 k and (C) \u025b = 2.5 k. The rectangles on the left and right represent the cell poles and have slightly enhanced cell-wall curvature, \u03b3 pole = 0.04 nm\u22121. The three-dimensional figure in (D) represents the lipid position and the membrane height in (C) mapped onto the surface of a capped cylinder. The color scheme and fraction of lipid A (\u03d5 = 0.075) are the same as in Figure 2."
}
]
},
```
> 極區位置的脂質聚集體
> 彈性能量參數固定(κ = 25 k = 0.25 k/nm^4, γ = 0.4 nm^-1),而短程吸引力(A)ε = 1 k(B)ε = 1.5 k 和(C)ε = 2.5 k 則變化。左右兩邊的矩形代表細胞極點,並且有輕微增強的細胞壁曲率,γ極 = 0.04 nm^-1。三維圖形(D)代表在(C)中映射到帶蓋圓柱體表面的脂質位置和膜高度。顏色方案和脂質A的比例(φ = 0.075)與圖2中的相同。
- ### liuhaotian/llava-v1.5-7b
- **USER**: Can you describe the image in detail?
- **ASSISTANT**:
The image features a series of graphs and diagrams illustrating the temperature of a cell. There are three graphs displayed, each showing different stages of the cell's temperature. The graphs are placed next to each other, with one graph on the left side, another in the middle, and the third on the right side of the image.
(圖像展示了一系列圖表和圖解,說明了細胞的溫度。顯示了三個圖表,每個都顯示了細胞溫度的不同階段。這些圖表並排放置,左側一個,中間一個,右側一個。)
In addition to the graphs, there is a picture of a cell on the right side of the image, which is likely related to the temperature diagrams. The image provides a visual representation of the temperature changes within the cell, making it easier to understand and analyze the data.
(除了圖表外,圖像右側還有一個細胞的圖片,這可能與溫度圖表相關。該圖像提供了細胞內溫度變化的視覺表現,使數據的理解和分析更為容易。)
- ### checkpoints/llava-med-7b-finetune-10k-4gpu
- **Human**: Can you describe the image in detail?
- **Assistant**: The image consists of four sub-figures, labeled (a), (b), (c), and (d). Sub-figure (a) shows the increasing blood wall temperature as an abrupt increase in blood flow, while sub-figures (b), (c), and (d) display the effect of increasing blood flow on the blood wall, with sub-figure (b) showing no cell infiltration and sub-figures (c) and (d) showing various cell infiltration.
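Referring back to `data/llava_med_image_urls.jsonl` above: each entry points at a PMC open-access tarball (`pmc_tar_url`), the figure file inside it (`image_file_path`), and the `pair_id` used as the local image name. LLaVA-Med ships its own download script for this; the snippet below is only a minimal sketch of how the three fields fit together, not the project's downloader.
```python
import json
import os
import tarfile
import urllib.request

# Example paths, following the layout used in this note.
input_jsonl = "data/llava_med_image_urls.jsonl"
output_dir = "data/images-pretrain-500k"
os.makedirs(output_dir, exist_ok=True)

with open(input_jsonl) as f:
    for line in f:
        entry = json.loads(line)
        # Download the PMC open-access tarball that contains this figure.
        tar_path, _ = urllib.request.urlretrieve(entry["pmc_tar_url"])
        with tarfile.open(tar_path) as tar:
            member = tar.extractfile(entry["image_file_path"])
            # Save the figure under its pair_id, e.g. 17096591_pcbi-0020151-g003.jpg
            ext = os.path.splitext(entry["image_file_path"])[1]
            with open(os.path.join(output_dir, entry["pair_id"] + ext), "wb") as out:
                out.write(member.read())
```
Fetching one tarball per figure like this is slow; the point is only to show how `pair_id`, `image_file_path`, and `pmc_tar_url` relate to the files the alignment JSON expects.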
<br>
<hr>
<br>
## Errors:
### Image file not found
- FileNotFoundError: [Errno 2] No such file or directory
- To confirm what the error actually looks like (and to trace the code path along the way), `32357477_ijms-21-03049-f006.jpg` was deliberately renamed to `32357477_ijms-21-03049-f006.png`
```
Traceback (most recent call last):
  File "~/tj_tsai/workspace/llm/llava-med/LLaVA-Med/llava/train/train_xformers.py", line 13, in <module>
    train()
  File "~/diatango/LLM/LLaVA/llava/train/train.py", line 974, in train
    trainer.train()
  File ".../python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File ".../python3.10/site-packages/transformers/trainer.py", line 1837, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File ".../python3.10/site-packages/accelerate/data_loader.py", line 384, in __iter__
    current_batch = next(dataloader_iter)
  File ".../python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File ".../python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File ".../python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File ".../python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "~/diatango/LLM/LLaVA/llava/train/train.py", line 701, in __getitem__
    image = Image.open(os.path.join(image_folder, image_file)).convert('RGB')
  File ".../python3.10/site-packages/PIL/Image.py", line 3247, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '~/tj_tsai/workspace/llm/llava-med/LLaVA-Med/data/images-pretrain-500k/32357477_ijms-21-03049-f006.jpg'
```
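Since the error only surfaces once the dataloader reaches the offending sample, it can save time to verify up front that every image referenced in the alignment file exists. A minimal sketch, assuming the file layout used in this note (this is not a script from the repo):
```python
import json
import os

# Example paths, following the layout used in this note.
alignment_json = "data/alignment/llava_med_alignment_500k.json"
image_folder = "data/images-pretrain-500k"

with open(alignment_json) as f:
    samples = json.load(f)

# Collect every referenced image that is missing from the image folder.
missing = [s["image"] for s in samples
           if "image" in s and not os.path.isfile(os.path.join(image_folder, s["image"]))]

print(f"missing {len(missing)} of {len(samples)} images")
for name in missing[:20]:
    print("  ", name)
```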
<br>
### stage3_gather_16bit_weights_on_model_save=false. Saving the full checkpoint instead, use zero_to_fp32.py to recover weights
> This is only a warning.
- project
- `transformers==4.37.2`
- [transformers/tags](https://github.com/huggingface/transformers/tags)

- [[code] transformers/src/transformers/trainer.py](https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/trainer.py#L2869)

<br>
### `do_sample` is set to `False`
```
stage3_gather_16bit_weights_on_model_save=false. Saving the full checkpoint instead, use zero_to_fp32.py to recover weights
Traceback (most recent call last):
File ".../python3.10/site-packages/transformers/generation/configuration_utils.py", line 558, in save_pretrained
raise ValueError(str([w.message for w in caught_warnings]))
ValueError: [UserWarning('`do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.'), UserWarning('`do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.')]
```
- project
- `transformers==4.37.2`
- [In meta-llama/Llama-2-7b-hf i got issue after staring fine tunning is that ValueError: do_sample is set to False. However, temperature is set to 0.9 -- this flag is only used in sample-based generation modes. Set do_sample=True or unset temperature to continue. #649](https://github.com/meta-llama/llama/issues/649)
- For anyone looking for a solution, it was an issue with the latest release of hugging face transformers released recently. Please downgrade to the previous version !pip install `git+https://github.com/huggingface/transformers@v4.31-release` to fix the issue.
- ~~If you downgrade:~~
`pip install git+https://github.com/huggingface/transformers@v4.31-release`
- error: got an unexpected keyword argument 'attn_implementation'
```
  model = cls(config, *model_args, **model_kwargs)
  File ".../python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 459, in wrapper
    f(module, *args, **kwargs)
TypeError: LlavaLlamaForCausalLM.__init__() got an unexpected keyword argument 'attn_implementation'
```
- ~~If you upgrade:~~
`pip install git+https://github.com/huggingface/transformers@v4.39.1`
- error: got an unexpected keyword argument 'use_seedable_sampler'
```
File ".../python3.10/site-packages/transformers/trainer.py", line 4252, in create_accelerator_and_postprocess
self.accelerator = Accelerator(
TypeError: Accelerator.__init__() got an unexpected keyword argument 'use_seedable_sampler'
```
- **The actual fix: `do_sample: true`**
[[Usage] ValueError: The generation config instance is invalid #1144](https://github.com/haotian-liu/LLaVA/issues/1144)
[issue comment](https://github.com/haotian-liu/LLaVA/issues/1144#issuecomment-1950979394)
- Ways to apply the fix
1. `--model_name_or_path checkpoints/lmsys/vicuna-7b-v1.5`
2. Edit `generation_config.json` under the model directory
![](https://hackmd.io/_uploads/ry5pPI8gC.png)
Add: `"do_sample": true` (a small patch sketch follows this list)
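A minimal sketch of option 2 as a script, assuming the example model path from option 1 (point it at your own model directory); it simply adds `do_sample` to the existing `generation_config.json`:
```python
import json
import os

model_dir = "checkpoints/lmsys/vicuna-7b-v1.5"  # example path, use your own model directory
cfg_path = os.path.join(model_dir, "generation_config.json")

with open(cfg_path) as f:
    cfg = json.load(f)

# With do_sample enabled, the temperature / top_p fields no longer trip the
# "only used in sample-based generation modes" check when the config is saved.
cfg["do_sample"] = True

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```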
<br>
<hr>
<br>
## pretrain vs finetune
### Parameter differences
- **Left**: [LLaVA/scripts/v1_5/pretrain.sh](https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/pretrain.sh)
- **Right**: [LLaVA/scripts/v1_5/finetune.sh](https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune.sh)
![](https://hackmd.io/_uploads/SkLI_wLeC.png)
<br>
### Artifact differences
```
checkpoints/
├── [4.0K] llava-v1.5-7b-finetune
│ ├── [1.2K] config.json
│ ├── [ 229] generation_config.json
│ ├── [4.6G] model-00001-of-00003.safetensors
│ ├── [4.6G] model-00002-of-00003.safetensors
│ ├── [3.9G] model-00003-of-00003.safetensors
│ ├── [ 71K] model.safetensors.index.json
│ ├── [ 438] special_tokens_map.json
│ ├── [ 936] tokenizer_config.json
│ ├── [488K] tokenizer.model
│ ├── [ 892] trainer_state.json
│ └── [7.2K] training_args.bin
├── [4.0K] llava-v1.5-7b-finetune+freeze=true
│ ├── [1.2K] config.json
│ ├── [ 229] generation_config.json
│ ├── [4.6G] model-00001-of-00003.safetensors
│ ├── [4.6G] model-00002-of-00003.safetensors
│ ├── [3.9G] model-00003-of-00003.safetensors
│ ├── [ 71K] model.safetensors.index.json
│ ├── [ 438] special_tokens_map.json
│ ├── [ 936] tokenizer_config.json
│ ├── [488K] tokenizer.model
│ ├── [ 892] trainer_state.json
│ └── [7.3K] training_args.bin
├── [4.0K] llava-v1.5-7b-finetune+tune_mm_mlp_adapter=true
│ ├── [1.2K] config.json
│ ├── [ 40M] mm_projector.bin
│ └── [ 893] trainer_state.json
├── [4.0K] llava-v1.5-7b-finetune+tune_mm_mlp_adapter=true+freeze=true
│ ├── [7.6K] error__with_zero3_offload.log
│ └── [6.4K] error__with_zero3.txt
└── [4.0K] llava-v1.5-7b-pretrain
├── [1.2K] config.json
├── [ 40M] mm_projector.bin
└── [ 798] trainer_state.json
```
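Note that the pretrain run and the `tune_mm_mlp_adapter=true` variant save only the projector (`mm_projector.bin`, about 40M) rather than full model weights. A quick sketch to inspect what that file contains (path taken from the tree above; the file is assumed to be a plain state dict, which is how LLaVA saves it):
```python
import torch

# Load the projector-only checkpoint produced by the pretrain stage.
state = torch.load("checkpoints/llava-v1.5-7b-pretrain/mm_projector.bin", map_location="cpu")

for name, tensor in state.items():
    print(f"{name}: shape={tuple(tensor.shape)}, dtype={tensor.dtype}")
```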
<br>
<hr>
<br>
## code
- [transformers/src/transformers/trainer.py](https://github.com/huggingface/transformers/blob/dacd34568d1a27b91f84610eab526640ed8f94e0/src/transformers/trainer.py#L2801C14-L2801C21)
<br>
<hr>
<br>
## Cynthia's Notes
- Also pay attention to the checkpoint folder name: make sure it contains `llava`.
Also double-check the weight file names:
if the extension is `bin`, the file names must follow the `pytorch_model-00001-of-*.bin` pattern;
if the extension is `safetensors`, they must follow the `model-00001-of-*.safetensors` pattern.
Otherwise `load_pretrained_model` will wander off into an unexpected code path.
- If you test with the `eval_model` helper in the code, here is an example for your reference:
```python=
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Base LLM and the fine-tuned LLaVA checkpoint to evaluate
model_base = "~/cynthia/LLaVA/checkpoints/vicuna-7b-v1.5"
model_path = "~/cynthia/LLaVA/checkpoints/llava-v1.5-7b-output"

prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"

# eval_model expects an argparse-like namespace; build one on the fly
args = type('Args', (), {
    "model_path": model_path,
    "model_base": model_base,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)
```
<br>
<hr>
<br>
## Q & A
### [issue] accuracy
> Unable to reproduce the accuracy reported in the paper
- [Confusion about the accuracy of the model #28](https://github.com/microsoft/LLaVA-Med/issues/28)
<br>
### [issue] projector
- [Projector weight for model #34](https://github.com/microsoft/LLaVA-Med/issues/34)
- [How to get "pretrain_mm_mlp_adapter" weights, which are aligned on medical concepts? #56](https://github.com/microsoft/LLaVA-Med/issues/56)
<br>
### torchrun vs deepspeed: what is the difference?
- ### [ChatGPT3.5][context: GPU]
Both torchrun and deepspeed are tools for distributed training and for improving training performance, but there are a few important differences:
#### Purpose and functionality:
- **torchrun**: torchrun is a tool developed by Facebook AI Research that aims to simplify training and deploying PyTorch models. It provides a set of commands and utilities for distributed training, model deployment, and model serving.
- **deepspeed**: deepspeed is a library developed by Microsoft for distributed deep-learning training. It offers more advanced features, such as gradient accumulation, distributed optimizers, automatic model compression, and model deployment.
#### Development team:
- **torchrun**: developed and maintained by the Facebook AI Research team.
- **deepspeed**: developed and maintained by Microsoft.
#### Supported features and platforms:
- **torchrun**: the supported features are fairly basic, mainly targeting training and deploying PyTorch models.
- **deepspeed**: offers many more advanced features covering every aspect of deep-learning training, and supports running on a wide range of hardware and distributed setups.
Overall, deepspeed provides richer functionality and more flexibility, but it also requires more configuration and setup. torchrun, by comparison, is simpler and easier to use, and suits users with basic distributed-training needs. Which tool to pick depends on your requirements, your skill level, and your training and deployment environment.
- ### [ChatGPT3.5][context: LLM]
torchrun and deepspeed are two tools for speeding up PyTorch model training, but they differ somewhat in design and functionality:
1. **Design goals:**
- **torchrun**: torchrun aims to be a simple, easy-to-use tool for accelerating PyTorch training on a single node. It focuses on being a lightweight solution, suited to small and medium-sized models and small-scale training jobs.
- **deepspeed**: deepspeed is designed to handle larger models and larger training jobs. It offers many advanced features, such as model-layout optimization, distributed training, and mixed-precision training, aiming for efficient and scalable large-scale training.
2. **Features:**
- **torchrun**: torchrun provides the basic training functionality, including single-node training, model layout, and configuration. Its feature set is fairly basic and suits relatively simple training needs.
- **deepspeed**: deepspeed offers many advanced features such as ZeRO optimization, gradient accumulation, and dynamic graph adjustment. It also supports distributed training at large scale across multiple nodes.
3. **Usage and configuration:**
- **torchrun**: fairly simple to use; setup and configuration are intuitive and usually need little code and few configuration files.
- **deepspeed**: configuration is more involved, requiring more configuration files and parameters to enable its advanced features and the distributed training environment.
Overall, torchrun suits simple single-node training jobs, whereas deepspeed is better suited to larger-scale training and offers more advanced features and configuration options.
- ### [ChatGPT4][context: LLM]
`torchrun` (formerly `torch.distributed.launch`) is PyTorch's utility for launching distributed training. It supports multi-process training, with each process driving one GPU. torchrun gives the PyTorch ecosystem a simple way to launch distributed training jobs and has built-in support for the PyTorch distributed package, so nothing extra needs to be installed.
`DeepSpeed` is a deep-learning optimization framework from Microsoft, designed for large-scale and distributed training. It offers a range of performance optimizations, such as model parallelism, memory optimization, and better resource efficiency. DeepSpeed aims to simplify large-model training and speed up the training process, making it easier for researchers and engineers to scale their models and data.
The main differences include:
1. **Ease of use:**
- `torchrun` is part of PyTorch, so it feels fairly intuitive, especially for users already familiar with PyTorch.
- `DeepSpeed` has its own configuration, but it offers more performance optimizations and support for large-scale training.
2. **Feature scope:**
- `torchrun` mainly provides the basic functionality for distributed training.
- `DeepSpeed` offers a broader set of features, including but not limited to ZeRO optimization, dynamic sparse training, and efficient data loading.
3. **Performance optimization:**
- `torchrun` relies on the standard PyTorch distributed components.
- `DeepSpeed` comes with more advanced optimization techniques that can greatly increase the speed and scale of model training.
4. **Support for large models:**
- `torchrun` can be used with models of any size, but very large models may need extra setup.
- `DeepSpeed` is specifically optimized for training very large models and can handle models with billions or even trillions of parameters.
In summary, DeepSpeed offers more features and optimizations and suits scenarios that require large-scale training across multiple GPUs or nodes, while torchrun covers standard distributed-training needs. Which tool to choose mainly depends on your specific requirements and environment.