[Traceback] vLLM / tokenizer
===
###### tags: `ML`, `NLP`, `NLU`, `LLM`, `tokenizer`, `vLLM`
<br>
[TOC]
<br>
## vllm dependencies
```
vllm==0.4.2
├── ...
├── openai [required: Any, installed: 1.25.1]
├── sentencepiece [required: Any, installed: 0.2.0]
├── ...
├── ...
├── tiktoken [required: ==0.6.0, installed: 0.6.0]
│ ├── regex [required: >=2022.1.18, installed: 2024.4.28]
│ └── requests [required: >=2.26.0, installed: 2.31.0]
│ ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ ├── idna [required: >=2.5,<4, installed: 3.7]
│ └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
├── tokenizers [required: >=0.19.1, installed: 0.19.1] <----------------
│ └── huggingface-hub [required: >=0.16.4,<1.0, installed: 0.23.0]
│ ├── filelock [required: Any, installed: 3.14.0]
│ ├── fsspec [required: >=2023.5.0, installed: 2024.2.0]
│ ├── packaging [required: >=20.9, installed: 24.0]
│ ├── PyYAML [required: >=5.1, installed: 6.0.1]
│ ├── requests [required: Any, installed: 2.31.0]
│ │ ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│ │ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ │ ├── idna [required: >=2.5,<4, installed: 3.7]
│ │ └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│ ├── tqdm [required: >=4.42.1, installed: 4.66.4]
│ └── typing_extensions [required: >=3.7.4.3, installed: 4.11.0]
├── torch [required: ==2.3.0, installed: 2.3.0]
├── transformers [required: >=4.40.0, installed: 4.40.1] <--------------
│ ├── filelock [required: Any, installed: 3.14.0]
│ ├── huggingface-hub [required: >=0.19.3,<1.0, installed: 0.23.0]
│ │ ├── filelock [required: Any, installed: 3.14.0]
│ │ ├── fsspec [required: >=2023.5.0, installed: 2024.2.0]
│ │ ├── packaging [required: >=20.9, installed: 24.0]
│ │ ├── PyYAML [required: >=5.1, installed: 6.0.1]
│ │ ├── requests [required: Any, installed: 2.31.0]
│ │ │ ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│ │ │ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ │ │ ├── idna [required: >=2.5,<4, installed: 3.7]
│ │ │ └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│ │ ├── tqdm [required: >=4.42.1, installed: 4.66.4]
│ │ └── typing_extensions [required: >=3.7.4.3, installed: 4.11.0]
│ ├── numpy [required: >=1.17, installed: 1.26.4]
│ ├── packaging [required: >=20.0, installed: 24.0]
│ ├── PyYAML [required: >=5.1, installed: 6.0.1]
│ ├── regex [required: !=2019.12.17, installed: 2024.4.28]
│ ├── requests [required: Any, installed: 2.31.0]
│ │ ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│ │ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ │ ├── idna [required: >=2.5,<4, installed: 3.7]
│ │ └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│ ├── safetensors [required: >=0.4.1, installed: 0.4.3]
│ ├── tokenizers [required: >=0.19,<0.20, installed: 0.19.1] <-------
│ │ └── huggingface-hub [required: >=0.16.4,<1.0, installed: 0.23.0]
│ │ ├── filelock [required: Any, installed: 3.14.0]
│ │ ├── fsspec [required: >=2023.5.0, installed: 2024.2.0]
│ │ ├── packaging [required: >=20.9, installed: 24.0]
│ │ ├── PyYAML [required: >=5.1, installed: 6.0.1]
│ │ ├── requests [required: Any, installed: 2.31.0]
│ │ │ ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│ │ │ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ │ │ ├── idna [required: >=2.5,<4, installed: 3.7]
│ │ │ └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│ │ ├── tqdm [required: >=4.42.1, installed: 4.66.4]
│ │ └── typing_extensions [required: >=3.7.4.3, installed: 4.11.0]
│ └── tqdm [required: >=4.27, installed: 4.66.4]
└── xformers [required: ==0.0.26.post1, installed: 0.0.26.post1]
```
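To sanity-check that a running environment matches the pins above, the installed versions can be read back at runtime; a minimal sketch using only the standard library (package names taken from the dependency tree, and assumed to be installed):
```
# Print the installed versions of the tokenizer-related packages listed
# in the dependency tree above.
from importlib.metadata import version

for pkg in ("vllm", "transformers", "tokenizers", "tiktoken", "sentencepiece"):
    print(f"{pkg}=={version(pkg)}")
```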
<br>
<hr>
<hr>
<br>
## [[config]]
## tokenizer
### Packages related to the tokenizer
> vllm==0.4.2
- [[github] huggingface / transformers](https://github.com/huggingface/transformers)
- Version used by vLLM: transformers==4.40.1
- [[github] huggingface / tokenizers](https://github.com/huggingface/tokenizers/)
- Version used by vLLM: tokenizers==0.19.1 (latest at the time of writing)
<br>
### Config files related to the tokenizer
```
$ find . -name '*.py' -exec grep -Hn '\.json' {} \; | grep -i token
```
- #### `.../dist-packages/modelscope/preprocessors/nlp/text_generation_preprocessor.py`:
> https://github.com/modelscope/modelscope/blob/master/modelscope/preprocessors/nlp/text_generation_preprocessor.py#L219
> Note: none of vllm, transformers, tokenizers, or their sub-dependencies depend on this package, nor is it used by any other package; it is unclear why it was installed.
```
#219: osp.join(model_dir, 'tokenizer.json'))
```
- #### `.../dist-packages/modelscope/preprocessors/nlp/transformers_tokenizer.py`:
> https://github.com/modelscope/modelscope/blob/master/modelscope/preprocessors/nlp/transformers_tokenizer.py#L57
> Note: none of vllm, transformers, tokenizers, or their sub-dependencies depend on this package, nor is it used by any other package; it is unclear why it was installed.
```
#57: os.path.join(self.model_dir, 'tokenizer_config.json')):
#59: os.path.join(self.model_dir, 'tokenizer_config.json'),
```
- #### `.../dist-packages/transformers/tokenization_utils_base.py`:
> https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py
```
#130:SPECIAL_TOKENS_MAP_FILE = "special_tokens_map.json"
#131:ADDED_TOKENS_FILE = "added_tokens.json"
#132:TOKENIZER_CONFIG_FILE = "tokenizer_config.json"
#135:FULL_TOKENIZER_FILE = "tokenizer.json"
#136:_re_tokenizer_file = re.compile(r"tokenizer\.(.*)\.json")
#2118: # We instantiate fast tokenizers based on a slow tokenizer if we don't have access to the tokenizer.json
#2266: # slow -> slow|fast, legacy: convert the `"added_tokens.json"` file to `added_tokens_decoder`.
#2287: # allows converting a fast -> slow: add the `tokenizer.json`'s `"added_tokens"` to the slow tokenizer
#2288: # if `tokenizer_config.json` is `None`
```
- #### `.../dist-packages/transformers/tokenization_utils_fast.py`:
> https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_fast.py#L54
```
#51:TOKENIZER_FILE = "tokenizer.json"
#52:SPECIAL_TOKENS_MAP_FILE = "special_tokens_map.json"
#53:TOKENIZER_CONFIG_FILE = "tokenizer_config.json"
#56:ADDED_TOKENS_FILE = "added_tokens.json"
#159: # allows converting a slow -> fast, non-legacy: if the `tokenizer.json` does not have all the added tokens
```
- #### `.../dist-packages/transformers/tokenization_utils.py`:
> https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils.py#L48
```
#47:SPECIAL_TOKENS_MAP_FILE = "special_tokens_map.json"
#48:ADDED_TOKENS_FILE = "added_tokens.json"
#49:TOKENIZER_CONFIG_FILE = "tokenizer_config.json"
```
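All three modules above agree on the same four file names. To see which of them a given local model directory actually ships, a minimal sketch (the `model_dir` path is a hypothetical placeholder):
```
import os

# Probe a local model directory for the tokenizer files named by the
# constants above; "/path/to/model" is a hypothetical placeholder.
model_dir = "/path/to/model"
for name in ("tokenizer.json", "tokenizer_config.json",
             "special_tokens_map.json", "added_tokens.json"):
    present = os.path.isfile(os.path.join(model_dir, name))
    print(f"{name}: {'found' if present else 'missing'}")
```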
<br>
<hr>
<hr>
<br>
## [[code]]
## Import-time Traceback
```
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
__import__(pkg_name)
File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 6, in <module>
from vllm.engine.async_llm_engine import AsyncLLMEngine
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 12, in <module>
from vllm.engine.llm_engine import LLMEngine
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 14, in <module>
from vllm.engine.output_processor.interfaces import (
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/output_processor/interfaces.py", line 10, in <module>
from vllm.transformers_utils.detokenizer import Detokenizer
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/detokenizer.py", line 193
```
- `__init__.py`#6
- engine/async_llm_engine.py#12
- engine/llm_engine.py#14
- output_processor/interfaces.py#10
- transformers_utils/detokenizer.py
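Per the chain above, a bare `import vllm` already pulls in the detokenizer module; a minimal sketch to confirm this, assuming the same vllm==0.4.2 install:
```
import sys

# Importing the top-level package walks the chain above
# (__init__.py -> async_llm_engine -> ... -> detokenizer).
import vllm  # noqa: F401

print("vllm.transformers_utils.detokenizer" in sys.modules)  # expected: True
```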
<br>
## Request handling
### detokenizer API call flow
[vllm/transformers_utils/detokenizer.py](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py)
- [decode_sequence_inplace()](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py#L89)
- [convert_prompt_ids_to_tokens()](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py#L199)
1. [detokenize_incrementally()](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py#L224)
2. decode_sequence_inplace()
3. Repeat steps 1 & 2
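The loop above is incremental detokenization: decode two overlapping windows of the id sequence and emit only the text that extends past the previously read window. A minimal self-contained sketch of the idea; the function name and signature here are illustrative, not vLLM's actual API:
```
from transformers import AutoTokenizer

def incremental_decode(tokenizer, all_ids, prefix_offset, read_offset):
    # Decoding with a prefix window keeps byte-level BPE pieces intact,
    # e.g. 'æ°£' only becomes '氣' once all of its bytes have arrived.
    prefix_text = tokenizer.decode(all_ids[prefix_offset:read_offset])
    full_text = tokenizer.decode(all_ids[prefix_offset:])
    if len(full_text) > len(prefix_text) and not full_text.endswith("\ufffd"):
        # New complete characters are readable; slide both windows forward.
        return full_text[len(prefix_text):], read_offset, len(all_ids)
    # Otherwise wait for more ids (e.g. a partially emitted multi-byte char).
    return "", prefix_offset, read_offset

# Usage sketch ("/path/to/llama3-model" is a hypothetical local path):
# tok = AutoTokenizer.from_pretrained("/path/to/llama3-model")
# delta, prefix_offset, read_offset = incremental_decode(
#     tok, prompt_ids + generated_ids, prefix_offset, read_offset)
```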
<br>
### Request-handling Traceback
```
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File ".../vllm/entrypoints/openai/api_server.py", line 107, in create_completion
generator = await openai_serving_completion.create_completion(
File ".../vllm/entrypoints/openai/serving_completion.py", line 153, in create_completion
async for i, res in result_generator:
File ".../vllm/utils.py", line 228, in consumer
raise item
File ".../vllm/utils.py", line 213, in producer
async for item in iterator:
File ".../vllm/engine/async_llm_engine.py", line 661, in generate
raise e
File ".../vllm/engine/async_llm_engine.py", line 655, in generate
async for request_output in stream:
File ".../vllm/engine/async_llm_engine.py", line 77, in __anext__
raise result
File ".../vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File ".../vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File ".../vllm/engine/async_llm_engine.py", line 470, in engine_step
request_outputs = await self.engine.step_async()
File ".../vllm/engine/async_llm_engine.py", line 220, in step_async
request_outputs = self._process_model_outputs(
File ".../vllm/engine/llm_engine.py", line 489, in _process_model_outputs
self.output_processor.process_outputs(seq_group, outputs)
File ".../vllm/engine/output_processor/single_step.py", line 56, in process_outputs
return self._process_sequence_group_outputs(sequence_group, outputs[0])
File ".../vllm/engine/output_processor/single_step.py", line 111, in _process_sequence_group_outputs
new_char_count = self.detokenizer.decode_sequence_inplace(
File ".../vllm/transformers_utils/detokenizer.py", line 121, in decode_sequence_inplace
read_offset) = detokenize_incrementally(
File ".../vllm/transformers_utils/detokenizer.py", line 224, in detokenize_incrementally
```
<br>
## [detokenizer.py / convert_prompt_ids_to_tokens()](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py#L199)
- **Test prompt**
`<|begin_of_text|><|start_header_id|>user<|end_header_id|>今天天氣如何?<|eot_id|><|start_header_id|>assistant<|end_header_id|>`
- **Execution trace**
- prompt_ids = [128000, 128006, 882, 128007, 110916, 36827, 107895, 109425, 11571, 128009, 128006, 78191, 128007]
- mapping
- 128000 = `<|begin_of_text|>`
- 128006 = `<|start_header_id|>`
- 882 = `user`
- 128007 = `<|end_header_id|>`
- 110916 = `今天`
- 36827 = `天`
- 107895 = `氣`
- 109425 = `如何`
- 11571 = `?`
- 128009 = `<|eot_id|>`
- 128006 = `<|start_header_id|>`
- 78191 = `assistant`
- 128007 = `<|end_header_id|>`
- `prompt_ids[-INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET - 2:]`
= `prompt_ids[-5 - 2:]`
= `prompt_ids[-7:]`
(takes the last 7 token IDs; per the comment in the vLLM source, only the tail of the prompt needs to be converted to seed incremental detokenization, and the extra `- 2` is slack in case trailing IDs are special tokens that get skipped; see the paraphrase after this list)
- **Variables**
- INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET = 5
- skip_special_tokens = True
- new_tokens = ['æ°£', 'å¦Ĥä½ķ', 'ï¼Ł', '<|eot_id|>', '<|start_header_id|>', 'assistant', '<|end_header_id|>']
- i.e., `['氣', '如何', '?', '<|eot_id|>', '<|start_header_id|>', 'assistant', '<|end_header_id|>']`
- prefix_offset = 2
- read_offset = 7
- **Notes**
- If new_tokens contains a special token, the result becomes:
- prefix_offset = 1
- read_offset = 6
- because skip_special_tokens=True drops that special token from new_tokens
- For example:
```
<|start_header_id|><|begin_of_text|>assistant<|end_header_id|>
```
- because **`<|begin_of_text|>`** is a "real" special token (see the `all_special_ids` check after this list)
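For reference, the offset arithmetic traced above comes from `convert_prompt_ids_to_tokens()`; a paraphrase of the vllm 0.4.2 logic (see the linked source for the authoritative version):
```
INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET = 5  # constant in detokenizer.py

def convert_prompt_ids_to_tokens(tokenizer, prompt_ids,
                                 skip_special_tokens=True):
    # Only the tail of the prompt is needed to seed incremental
    # detokenization; the extra "- 2" is slack in case trailing ids are
    # special tokens that skip_special_tokens removes.
    new_tokens = tokenizer.convert_ids_to_tokens(
        prompt_ids[-INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET - 2:],
        skip_special_tokens=skip_special_tokens)
    read_offset = len(new_tokens)  # 7 in the trace above; 6 if one id is skipped
    prefix_offset = max(
        read_offset - INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET, 0)  # 2; or 1
    return new_tokens, prefix_offset, read_offset
```
With the 13-id prompt above and nothing skipped, this yields (prefix_offset, read_offset) = (2, 7), matching the trace.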
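Which ids actually get skipped depends on the tokenizer's `all_special_ids`; a minimal check, assuming a hypothetical local Llama-3 model path:
```
from transformers import AutoTokenizer

# "/path/to/llama3-model" is a hypothetical placeholder.
tok = AutoTokenizer.from_pretrained("/path/to/llama3-model")
for t in ("<|begin_of_text|>", "<|eot_id|>", "<|start_header_id|>"):
    i = tok.convert_tokens_to_ids(t)
    print(t, i, i in tok.all_special_ids)
# Per the note above, only <|begin_of_text|> was a "real" special token in
# this setup, so it is the one that skip_special_tokens=True drops.
```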