InternVL
===

###### tags: `LLM / models`
###### tags: `LLM`, `model`, `InternVL`, `format`, `text`, `image`, `train`

<br>

[TOC]

<br>

## InternVL

### [Chat Data Format](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html)

- ### Pure Text Data

```json=
{
  "id": 0,
  "conversations": [
    {"from": "human", "value": "user input"},
    {"from": "gpt", "value": "assistant output"},
    {"from": "human", "value": "user input"},
    {"from": "gpt", "value": "assistant output"}
  ]
}
```

- ### Single-Image Data

```json=
{
  "id": 0,
  "image": "path/to/image.jpg",
  "width": 111,
  "height": 222,
  "conversations": [
    {"from": "human", "value": "<image>\nuser input"},
    {"from": "gpt", "value": "assistant output"},
    {"from": "human", "value": "user input"},
    {"from": "gpt", "value": "assistant output"}
  ]
}
```

- ### Multi-Image Data

```json=
{
  "id": 0,
  "image": ["path/to/image1.jpg", "path/to/image2.jpg", "path/to/image3.jpg"],
  "width_list": [111, 222, 333],
  "height_list": [111, 222, 333],
  "conversations": [
    {"from": "human", "value": "<image>\nuser input <image>\nuser input"},
    {"from": "gpt", "value": "assistant output"},
    {"from": "human", "value": "<image>\nuser input"},
    {"from": "gpt", "value": "assistant output"}
  ]
}
```

<br>

## issues

- [OCR data format #536](https://github.com/OpenGVLab/InternVL/issues/536)
    - Q: Hi, for end-to-end OCR, what format should the detection and recognition dataset use? Does each box correspond to one image plus one text label?
    - A: The format we use for OCR training is (see the sketch below):
      User: Please recognize the text in the image. (one of a few dozen templates like this)
      Assistant: The text in the image includes:\nXXX\nXXX\nXXX\nXXX (the OCR result is output directly)
    - We do not use OCR bounding-box coordinates for training; the exact format is in our data format documentation:
      https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html
    - We did not train on OCR boxes because training on large amounts of box data severely degrades the language model's language ability.

![](https://hackmd.io/_uploads/rkBFmFzAA.png)
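
Following the OCR approach described in issue #536 above, here is a minimal sketch of how such a training sample could be assembled: one image plus its recognized text lines becomes a single-image entry in the chat data format shown earlier, with no bounding boxes, written out one sample per line (JSONL). The helper names (`build_ocr_sample`, `write_jsonl`) and the prompt strings are my own assumptions, not InternVL code; only the JSON keys come from the format above.

```python=
import json
import random
from PIL import Image  # pip install pillow

# Hypothetical question templates; the issue mentions "a few dozen templates like this".
OCR_PROMPTS = [
    "Please recognize the text in the image.",
    "What text appears in this image?",
]


def build_ocr_sample(sample_id, image_path, text_lines):
    """Build one single-image OCR sample in the chat data format shown above.

    No boxes are used: the target is just the recognized text, one line per
    text region, joined with "\n" as in the issue's example answer.
    """
    width, height = Image.open(image_path).size
    question = f"<image>\n{random.choice(OCR_PROMPTS)}"
    answer = "The text in the image includes:\n" + "\n".join(text_lines)
    sample = {
        "id": sample_id,
        "image": str(image_path),
        "width": width,
        "height": height,
        "conversations": [
            {"from": "human", "value": question},
            {"from": "gpt", "value": answer},
        ],
    }
    # Sanity check: a single-image sample must contain exactly one <image> placeholder.
    n_placeholders = sum(
        turn["value"].count("<image>")
        for turn in sample["conversations"]
        if turn["from"] == "human"
    )
    assert n_placeholders == 1, "single-image samples need exactly one <image> placeholder"
    return sample


def write_jsonl(samples, out_path):
    """Write samples one per line (JSONL), keeping non-ASCII text readable."""
    with open(out_path, "w", encoding="utf-8") as f:
        for s in samples:
            f.write(json.dumps(s, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Illustrative paths/labels only; replace with real image files and OCR ground truth.
    sample = build_ocr_sample(0, "path/to/image.jpg", ["LINE 1", "LINE 2"])
    write_jsonl([sample], "ocr_train.jsonl")
```

The `<image>` placeholder check mirrors the rule implied by the multi-image example: the number of placeholders in the human turns must match the number of images attached to the sample.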