---
# System prepended metadata

title: '[Traceback] vLLM / tokenizer'
tags: [NLU, LLM / tokenizer, NLP, LLM, ML, tokenizer, vLLM]

---

[Traceback] vLLM / tokenizer
===
###### tags: `LLM / tokenizer`
###### tags: `ML`, `NLP`, `NLU`, `LLM`, `tokenizer`, `vLLM`

<br>

[TOC]

<br>

## vllm dependencies
```
vllm==0.4.2
├── ...
├── openai [required: Any, installed: 1.25.1]
├── sentencepiece [required: Any, installed: 0.2.0]
├── ...
├── ...
├── tiktoken [required: ==0.6.0, installed: 0.6.0]
│   ├── regex [required: >=2022.1.18, installed: 2024.4.28]
│   └── requests [required: >=2.26.0, installed: 2.31.0]
│       ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│       ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│       ├── idna [required: >=2.5,<4, installed: 3.7]
│       └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
├── tokenizers [required: >=0.19.1, installed: 0.19.1]  <----------------
│   └── huggingface-hub [required: >=0.16.4,<1.0, installed: 0.23.0]
│       ├── filelock [required: Any, installed: 3.14.0]
│       ├── fsspec [required: >=2023.5.0, installed: 2024.2.0]
│       ├── packaging [required: >=20.9, installed: 24.0]
│       ├── PyYAML [required: >=5.1, installed: 6.0.1]
│       ├── requests [required: Any, installed: 2.31.0]
│       │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│       │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│       │   ├── idna [required: >=2.5,<4, installed: 3.7]
│       │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│       ├── tqdm [required: >=4.42.1, installed: 4.66.4]
│       └── typing_extensions [required: >=3.7.4.3, installed: 4.11.0]
├── torch [required: ==2.3.0, installed: 2.3.0]
├── transformers [required: >=4.40.0, installed: 4.40.1]  <--------------
│   ├── filelock [required: Any, installed: 3.14.0]
│   ├── huggingface-hub [required: >=0.19.3,<1.0, installed: 0.23.0]
│   │   ├── filelock [required: Any, installed: 3.14.0]
│   │   ├── fsspec [required: >=2023.5.0, installed: 2024.2.0]
│   │   ├── packaging [required: >=20.9, installed: 24.0]
│   │   ├── PyYAML [required: >=5.1, installed: 6.0.1]
│   │   ├── requests [required: Any, installed: 2.31.0]
│   │   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │   │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   │   ├── tqdm [required: >=4.42.1, installed: 4.66.4]
│   │   └── typing_extensions [required: >=3.7.4.3, installed: 4.11.0]
│   ├── numpy [required: >=1.17, installed: 1.26.4]
│   ├── packaging [required: >=20.0, installed: 24.0]
│   ├── PyYAML [required: >=5.1, installed: 6.0.1]
│   ├── regex [required: !=2019.12.17, installed: 2024.4.28]
│   ├── requests [required: Any, installed: 2.31.0]
│   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   ├── safetensors [required: >=0.4.1, installed: 0.4.3]
│   ├── tokenizers [required: >=0.19,<0.20, installed: 0.19.1]   <-------
│   │   └── huggingface-hub [required: >=0.16.4,<1.0, installed: 0.23.0]
│   │       ├── filelock [required: Any, installed: 3.14.0]
│   │       ├── fsspec [required: >=2023.5.0, installed: 2024.2.0]
│   │       ├── packaging [required: >=20.9, installed: 24.0]
│   │       ├── PyYAML [required: >=5.1, installed: 6.0.1]
│   │       ├── requests [required: Any, installed: 2.31.0]
│   │       │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │       │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │       │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │       │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   │       ├── tqdm [required: >=4.42.1, installed: 4.66.4]
│   │       └── typing_extensions [required: >=3.7.4.3, installed: 4.11.0]
│   └── tqdm [required: >=4.27, installed: 4.66.4]
└── xformers [required: ==0.0.26.post1, installed: 0.0.26.post1]
```

<br>

<hr>
<hr>

<br>

## [[Configuration]]

## tokenizer

### Tokenizer-related packages
> vllm==0.4.2

- [[github] huggingface / transformers](https://github.com/huggingface/transformers)
    - Version used by vLLM: transformers==4.40.1
- [[github] huggingface / tokenizers](https://github.com/huggingface/tokenizers/)
    - Version used by vLLM: tokenizers==0.19.1 (latest at the time of writing)
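The sketch below uses `importlib.metadata` to confirm these versions in the running environment. It also checks that tokenizers satisfies the `>=0.19,<0.20` constraint that transformers 4.40.1 declares in the dependency tree. The helper names are hypothetical. The comparison is a simplified numeric check, not a full PEP 440 implementation.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(name: str) -> Optional[str]:
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def release_tuple(v: str) -> Tuple[int, ...]:
    """Leading numeric release segment only, e.g. "0.19.1" -> (0, 19, 1).

    Simplified parser: pre/post/dev suffixes are dropped, which is enough for
    a coarse range check but is not full PEP 440.
    """
    parts = []
    for piece in v.split("."):
        digits = re.match(r"\d+", piece)
        if not digits:
            break
        parts.append(int(digits.group()))
        if digits.group() != piece:
            break
    return tuple(parts)


def tokenizers_ok_for_transformers(tok_version: str) -> bool:
    # transformers 4.40.1 requires tokenizers >=0.19,<0.20 (per the dependency tree)
    return (0, 19) <= release_tuple(tok_version) < (0, 20)
```

For example, `{p: installed_version(p) for p in ("vllm", "transformers", "tokenizers")}` shows the installed versions. If tokenizers is installed, `tokenizers_ok_for_transformers(installed_version("tokenizers"))` flags an out-of-range version.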

<br>

### Tokenizer-related configuration files
```
$ find . -name '*.py' -exec grep -Hn '\.json' {} \; | grep -i token
```
- #### `.../dist-packages/modelscope/preprocessors/nlp/text_generation_preprocessor.py`:
    > https://github.com/modelscope/modelscope/blob/master/modelscope/preprocessors/nlp/text_generation_preprocessor.py#L219
    
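    > Note: none of vllm, transformers, tokenizers, or any of their sub-dependencies depends on this package, and no other installed package uses it either, so it is unclear why it is installed.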
    ```
    #219: osp.join(model_dir, 'tokenizer.json'))
    ```
- #### `.../dist-packages/modelscope/preprocessors/nlp/transformers_tokenizer.py`:
    > https://github.com/modelscope/modelscope/blob/master/modelscope/preprocessors/nlp/transformers_tokenizer.py#L57
    
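    > Note: none of vllm, transformers, tokenizers, or any of their sub-dependencies depends on this package, and no other installed package uses it either, so it is unclear why it is installed.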
    ```
    #57: os.path.join(self.model_dir, 'tokenizer_config.json')):
    #59: os.path.join(self.model_dir, 'tokenizer_config.json'),
    ```
- #### `.../dist-packages/transformers/tokenization_utils_base.py`:
    > https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py
    ```
    #130:SPECIAL_TOKENS_MAP_FILE = "special_tokens_map.json"
    #131:ADDED_TOKENS_FILE = "added_tokens.json"
    #132:TOKENIZER_CONFIG_FILE = "tokenizer_config.json"
    #135:FULL_TOKENIZER_FILE = "tokenizer.json"
    #136:_re_tokenizer_file = re.compile(r"tokenizer\.(.*)\.json")
    #2118:        # We instantiate fast tokenizers based on a slow tokenizer if we don't have access to the tokenizer.json
    #2266:            # slow -> slow|fast, legacy: convert the `"added_tokens.json"` file to `added_tokens_decoder`.
    #2287:            # allows converting a fast -> slow: add the `tokenizer.json`'s `"added_tokens"` to the slow tokenizer
    #2288:            # if `tokenizer_config.json` is `None`
    ```
- #### `.../dist-packages/transformers/tokenization_utils_fast.py`:
    > https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_fast.py#L54
    ```
    #51:TOKENIZER_FILE = "tokenizer.json"
    #52:SPECIAL_TOKENS_MAP_FILE = "special_tokens_map.json"
    #53:TOKENIZER_CONFIG_FILE = "tokenizer_config.json"
    #56:ADDED_TOKENS_FILE = "added_tokens.json"
    #159:        # allows converting a slow -> fast, non-legacy: if the `tokenizer.json` does not have all the added tokens
    ```
- #### `.../dist-packages/transformers/tokenization_utils.py`:
    > https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils.py#L48
    ```
    #47:SPECIAL_TOKENS_MAP_FILE = "special_tokens_map.json"
    #48:ADDED_TOKENS_FILE = "added_tokens.json"
    #49:TOKENIZER_CONFIG_FILE = "tokenizer_config.json"
    ```

<br>

<hr>
<hr>

<br>

## [[code]]

## Traceback at import time
```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 6, in <module>
    from vllm.engine.async_llm_engine import AsyncLLMEngine
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 12, in <module>
    from vllm.engine.llm_engine import LLMEngine
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 14, in <module>
    from vllm.engine.output_processor.interfaces import (
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/output_processor/interfaces.py", line 10, in <module>
    from vllm.transformers_utils.detokenizer import Detokenizer
  File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/detokenizer.py", line 193
```
- `__init__.py`#6
    - engine/async_llm_engine.py#12
        - engine/llm_engine.py#14
            - output_processor/interfaces.py#10
                - transformers_utils/detokenizer.py
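This import-time failure can be reproduced without the server entry point by importing the innermost module of the chain directly. The helper below is a hypothetical sketch. Importing `vllm.transformers_utils.detokenizer` still executes `vllm/__init__.py` first, because Python imports parent packages before submodules. The same chain is therefore exercised.

```python
import importlib
import traceback
from typing import Optional


def try_import(module_name: str) -> Optional[str]:
    """Import module_name and return the formatted traceback if it fails, else None.

    Parent packages are imported first, so the whole import chain runs.
    """
    try:
        importlib.import_module(module_name)
    except Exception:  # SyntaxError and ImportError are both Exception subclasses
        return traceback.format_exc()
    return None
```

Usage: `print(try_import("vllm.transformers_utils.detokenizer") or "import OK")`.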

<br>

## Handling a request

### detokenizer API call flow
[vllm/transformers_utils/detokenizer.py](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py)
- [decode_sequence_inplace()](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py#L89)
    - [convert_prompt_ids_to_tokens()](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py#L199)
        1. [detokenize_incrementally()](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py#L224)
        2. decode_sequence_inplace()
        3. Repeat 1 & 2
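The `prefix_offset` / `read_offset` bookkeeping behind `detokenize_incrementally()` can be shown with a minimal sketch. It assumes a hypothetical toy tokenizer whose tokens are raw UTF-8 byte chunks; real vLLM goes through the HF tokenizer's `convert_tokens_to_string`. Each step decodes the window `[prefix_offset:]` and emits only the part beyond `[prefix_offset:read_offset]`. Output is held back while the text ends in U+FFFD, i.e. a multi-byte character is still incomplete. This rule is modeled loosely on vLLM's behavior, not copied from its source.

```python
from typing import List, Tuple

OFFSET = 5  # plays the role of INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET


def tokens_to_text(tokens: List[bytes]) -> str:
    # Hypothetical toy tokenizer: every token is a raw UTF-8 byte chunk
    return b"".join(tokens).decode("utf-8", errors="replace")


def detokenize_step(tokens: List[bytes], prefix_offset: int,
                    read_offset: int) -> Tuple[str, int, int]:
    """Return (new_text, prefix_offset, read_offset) after tokens[-1] was appended."""
    prefix_text = tokens_to_text(tokens[prefix_offset:read_offset])
    new_text = tokens_to_text(tokens[prefix_offset:])
    if len(new_text) <= len(prefix_text) or new_text.endswith("\ufffd"):
        # Nothing new yet, or the last character is an unfinished UTF-8 sequence
        return "", prefix_offset, read_offset
    return new_text[len(prefix_text):], read_offset, len(tokens)


def stream(prompt: List[bytes], generated: List[bytes]) -> List[str]:
    # Start from the prompt tail only, mirroring convert_prompt_ids_to_tokens()
    tokens = list(prompt[-OFFSET - 2:])
    read_offset = len(tokens)
    prefix_offset = max(read_offset - OFFSET, 0)
    pieces = []
    for tok in generated:
        tokens.append(tok)
        text, prefix_offset, read_offset = detokenize_step(tokens, prefix_offset, read_offset)
        pieces.append(text)
    return pieces
```

Suppose `氣` is split across two generated tokens (`b"\xe6"` then `b"\xb0\xa3"`). The first step then emits `""`, and the second emits `"氣"`.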

<br>

### Traceback while handling a request
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File ".../vllm/entrypoints/openai/api_server.py", line 107, in create_completion
    generator = await openai_serving_completion.create_completion(
  File ".../vllm/entrypoints/openai/serving_completion.py", line 153, in create_completion
    async for i, res in result_generator:
  File ".../vllm/utils.py", line 228, in consumer
    raise item
  File ".../vllm/utils.py", line 213, in producer
    async for item in iterator:
  File ".../vllm/engine/async_llm_engine.py", line 661, in generate
    raise e
  File ".../vllm/engine/async_llm_engine.py", line 655, in generate
    async for request_output in stream:
  File ".../vllm/engine/async_llm_engine.py", line 77, in __anext__
    raise result
  File ".../vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File ".../vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(

  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()

  File ".../vllm/engine/async_llm_engine.py", line 470, in engine_step
    request_outputs = await self.engine.step_async()

  File ".../vllm/engine/async_llm_engine.py", line 220, in step_async
    request_outputs = self._process_model_outputs(

  File ".../vllm/engine/llm_engine.py", line 489, in _process_model_outputs
    self.output_processor.process_outputs(seq_group, outputs)

  File ".../vllm/engine/output_processor/single_step.py", line 56, in process_outputs
    return self._process_sequence_group_outputs(sequence_group, outputs[0])

  File ".../vllm/engine/output_processor/single_step.py", line 111, in _process_sequence_group_outputs
    new_char_count = self.detokenizer.decode_sequence_inplace(

  File ".../vllm/transformers_utils/detokenizer.py", line 121, in decode_sequence_inplace
    read_offset) = detokenize_incrementally(

  File ".../vllm/transformers_utils/detokenizer.py", line 224, in detokenize_incrementally

```
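The traceback enters vLLM through `create_completion` in `api_server.py`, i.e. the OpenAI-compatible `/v1/completions` route. The minimal sketch below replays the test prompt (from the `convert_prompt_ids_to_tokens()` section) against that route using the standard library. The base URL (`http://localhost:8000`, vLLM's default port), the model name, and `max_tokens` are assumptions; adjust them to the actual server.

```python
import json
import urllib.request

# Test prompt used in the convert_prompt_ids_to_tokens() section (Llama 3 chat format)
PROMPT = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
    "今天天氣如何？<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
)


def build_completion_request(prompt: str, model: str,
                             base_url: str = "http://localhost:8000",
                             max_tokens: int = 32) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /v1/completions route."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_completion_request(PROMPT, "my-model"))`. Here `my-model` is a placeholder for the served model name.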

<br>

## [detokenizer.py / convert_prompt_ids_to_tokens()](https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/detokenizer.py#L199)

- **Test prompt**
    `<|begin_of_text|><|start_header_id|>user<|end_header_id|>今天天氣如何？<|eot_id|><|start_header_id|>assistant<|end_header_id|>`
- **Execution trace**
    - prompt_ids = [128000, 128006, 882, 128007, 110916, 36827, 107895, 109425, 11571, 128009, 128006, 78191, 128007]
        - mapping
            - 128000 = `<|begin_of_text|>`
            - 128006 = `<|start_header_id|>`
            - 882 = `user`
            - 128007 = `<|end_header_id|>`
            - 110916 = `今天`
            - 36827 = `天`
            - 107895 = `氣`
            - 109425 = `如何`
            - 11571 = `？`
            - 128009 = `<|eot_id|>`
            - 128006 = `<|start_header_id|>`
            - 78191 = `assistant`
            - 128007 = `<|end_header_id|>`
        - `prompt_ids[-INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET - 2:]`
            = `prompt_ids[-5 - 2:]`
            = `prompt_ids[-7:]`
            (takes the last 7 token IDs; what is this for?)
    - **Variables**
        - INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET = 5
        - skip_special_tokens = True
        - new_tokens = ['æ°£', 'å¦Ĥä½ķ', 'ï¼Ł', '<|eot_id|>', '<|start_header_id|>', 'assistant', '<|end_header_id|>']
            - which reads as `['氣', '如何', '？', '<|eot_id|>', '<|start_header_id|>', 'assistant', '<|end_header_id|>']` once the byte-level strings are decoded to UTF-8
        - prefix_offset = 2
        - read_offset = 7
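    - **Notes**
        - If the last 7 prompt IDs contain one special token, the result is:
            - prefix_offset = 1
            - read_offset = 6
            - because skip_special_tokens=True drops that special token, so new_tokens has only 6 entries
            - For example:
                ```
                <|start_header_id|><|begin_of_text|>assistant<|end_header_id|>
                ```
                - here only **`<|begin_of_text|>`** is dropped, because it is a "real" special token (`<|start_header_id|>` and `<|end_header_id|>` are kept, as in new_tokens above)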
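The offset arithmetic in this trace can be replayed without vLLM. The sketch re-implements only that arithmetic. It assumes, as the notes suggest, that `<|begin_of_text|>` (128000) is the only ID in the tokenizer's special-token set. `prompt_offsets` and `VOCAB` are hypothetical names, and the readable vocabulary stands in for the byte-level strings the real tokenizer returns.

It also hints at an answer to the open question about `prompt_ids[-7:]`. Only a short prompt tail seems to be needed as left context for incremental detokenization. The extra 2 IDs appear to be slack for special tokens that get skipped.

```python
from typing import Dict, List, Set, Tuple

INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET = 5

# Readable id -> token table for this prompt, taken from the mapping above
# (the real tokenizer returns byte-level strings such as 'æ°£' instead of '氣')
VOCAB: Dict[int, str] = {
    128000: "<|begin_of_text|>", 128006: "<|start_header_id|>", 882: "user",
    128007: "<|end_header_id|>", 110916: "今天", 36827: "天", 107895: "氣",
    109425: "如何", 11571: "？", 128009: "<|eot_id|>", 78191: "assistant",
}


def prompt_offsets(prompt_ids: List[int], vocab: Dict[int, str], special_ids: Set[int],
                   skip_special_tokens: bool = True) -> Tuple[List[str], int, int]:
    # Only the prompt tail is converted; the "- 2" leaves room for skipped special tokens
    tail = prompt_ids[-INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET - 2:]
    new_tokens = [vocab[i] for i in tail if not (skip_special_tokens and i in special_ids)]
    read_offset = len(new_tokens)
    prefix_offset = max(read_offset - INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET, 0)
    return new_tokens, prefix_offset, read_offset
```

With the trace's `prompt_ids` and `special_ids={128000}`, this returns the seven tokens starting at `'氣'`, with `prefix_offset = 2` and `read_offset = 7`. Ending the prompt with `..., 128009, 128006, 128000, 78191, 128007` instead (the example in the notes) gives 6 tokens, with `prefix_offset = 1` and `read_offset = 6`.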
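The garbled-looking strings in `new_tokens` (e.g. `'æ°£'`) are the byte-level BPE representation: each UTF-8 byte is shown as one printable character. The decoder below assumes the GPT-2 style byte-to-unicode table. That table maps the three strings in the trace back to `氣`, `如何` and `？`. The function names are hypothetical.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2 style table mapping every byte 0-255 to a printable character."""
    # Printable Latin-1 bytes map to themselves ...
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    # ... the remaining 68 bytes (controls, space, etc.) are shifted to U+0100 and up
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def byte_level_to_text(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A token may hold only part of a multi-byte character; that shows up as U+FFFD
    return raw.decode("utf-8", errors="replace")
```

For example, `'Ġ'` decodes to a plain space. This is why many English BPE tokens appear to start with `Ġ`.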