InternVL
===
###### tags: `LLM / models`
###### tags: `LLM`, `model`, `InternVL`, `format`, `text`, `image`, `train`
<br>
[TOC]
<br>
## InternVL
### [Chat Data Format](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html)
- ### Pure Text Data
```json=
{
  "id": 0,
  "conversations": [
    {"from": "human", "value": "user input"},
    {"from": "gpt", "value": "assistant output"},
    {"from": "human", "value": "user input"},
    {"from": "gpt", "value": "assistant output"}
  ]
}
```
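If each record like the one above is stored as one JSON object per line (JSONL, the way chat training data is commonly packaged), a minimal Python sketch for building and serializing pure-text samples could look like this; `make_text_record` and the output file name are illustrative, not part of any official tooling:
```python
import json

def make_text_record(record_id, turns):
    """Build a pure-text record; `turns` is a list of (user, assistant) string pairs."""
    conversations = []
    for user_msg, assistant_msg in turns:
        conversations.append({"from": "human", "value": user_msg})
        conversations.append({"from": "gpt", "value": assistant_msg})
    return {"id": record_id, "conversations": conversations}

# One JSON object per line (JSONL); ensure_ascii=False keeps non-ASCII text readable.
records = [make_text_record(0, [("user input", "assistant output")])]
with open("pure_text.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```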
- ### Single-Image Data
```json=
{
  "id": 0,
  "image": "path/to/image.jpg",
  "width": 111,
  "height": 222,
  "conversations": [
    {"from": "human", "value": "<image>\nuser input"},
    {"from": "gpt", "value": "assistant output"},
    {"from": "human", "value": "user input"},
    {"from": "gpt", "value": "assistant output"}
  ]
}
```
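A minimal sketch, assuming Pillow is available, for assembling a single-image record with its `width`/`height` fields read from the actual file; the helper name and paths are hypothetical:
```python
from PIL import Image  # assumed dependency, used only to read the image size

def make_single_image_record(record_id, image_path, turns):
    """`turns` is a list of (user, assistant) pairs; only the first user turn carries the <image> placeholder."""
    width, height = Image.open(image_path).size
    conversations = []
    for i, (user_msg, assistant_msg) in enumerate(turns):
        prefix = "<image>\n" if i == 0 else ""
        conversations.append({"from": "human", "value": prefix + user_msg})
        conversations.append({"from": "gpt", "value": assistant_msg})
    return {
        "id": record_id,
        "image": image_path,
        "width": width,
        "height": height,
        "conversations": conversations,
    }
```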
- ### Multi-Image Data
```json=
{
  "id": 0,
  "image": ["path/to/image1.jpg", "path/to/image2.jpg", "path/to/image3.jpg"],
  "width_list": [111, 222, 333],
  "height_list": [111, 222, 333],
  "conversations": [
    {"from": "human", "value": "<image>\nuser input <image>\nuser input"},
    {"from": "gpt", "value": "assistant output"},
    {"from": "human", "value": "<image>\nuser input"},
    {"from": "gpt", "value": "assistant output"}
  ]
}
```
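Since each attached image in the examples above is paired with exactly one `<image>` placeholder in the human turns, a small validation sketch (hypothetical helpers, not part of InternVL) can catch mismatches before training:
```python
def count_image_placeholders(record):
    """Count <image> tokens across all human turns of one record."""
    return sum(
        turn["value"].count("<image>")
        for turn in record["conversations"]
        if turn["from"] == "human"
    )

def validate_record(record):
    """Check that the placeholder count matches the number of images attached to the record."""
    image_field = record.get("image")
    if image_field is None:
        expected = 0                       # pure-text record
    elif isinstance(image_field, list):
        expected = len(image_field)        # multi-image record
    else:
        expected = 1                       # single-image record
    actual = count_image_placeholders(record)
    assert actual == expected, f"record {record['id']}: {actual} <image> tags, {expected} images"
```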
<br>
## Issues
- [OCR data format #536](https://github.com/OpenGVLab/InternVL/issues/536)
    - Hi, for end-to-end OCR, what format should the detection and recognition dataset be in? Does each box correspond to one image and one text label?
    - The format we use for OCR training is:
        - User: Please recognize the text in the image (one of a few dozen prompt templates like this)
        - Assistant: The text in the image includes:\nXXX\nXXX\nXXX\nXXX (the OCR result is output directly)
        - We did not use OCR bounding boxes for training; the exact format is described in our data format documentation: https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html
    - We did not train with OCR boxes because training on large amounts of box data severely damages the language model's language ability (a sketch of such a sample is given below).
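
Based on the answer above (a "please recognize the text" prompt plus the raw OCR result as the assistant reply, with no bounding boxes), such a sample could be packed into the single-image format roughly like this; the English prompt wording and helper name are illustrative only:
```python
def make_ocr_record(record_id, image_path, width, height, ocr_lines):
    """Pack plain OCR text (no bounding boxes) into the single-image chat format."""
    answer = "The text in the image includes:\n" + "\n".join(ocr_lines)
    return {
        "id": record_id,
        "image": image_path,
        "width": width,
        "height": height,
        "conversations": [
            {"from": "human", "value": "<image>\nPlease recognize the text in the image."},
            {"from": "gpt", "value": answer},
        ],
    }

sample = make_ocr_record(0, "path/to/image.jpg", 111, 222, ["XXX", "XXX", "XXX"])
```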
