[Prime session] Generalist real-time computer vision model - 王建堯

Welcome to the https://hackmd.io/@coscup/2024 collaborative notes

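Challenges of generalist models in real-time applications
- Every task needs its own inference pass, which is time-consuming
- The models are usually large, so inference is slow
Goal: do not rely on existing large pretrained models, and still run inference in real time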

Architecture of Generalist Computer Vision Models
- CNN-based
  - Architecture: encoder + decoder * N
- Transformer-based
  - Architecture: encoder + decoder + light module * N
  - Requires pre-training
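
To make the two layouts above concrete, here is a minimal sketch in plain Python. The stage functions, the `stage` helper, and the task names are hypothetical stand-ins that only record that they ran; nothing here is taken from the models shown in the talk. It only illustrates which stages are shared and which are repeated N times.

```python
from collections import Counter

calls = Counter()  # counts how many times each stage runs

def stage(name):
    """Build a hypothetical stage that records its call and tags the data."""
    def run(x):
        calls[name] += 1
        return x + [name]  # returns a new list, so branches do not interfere
    return run

TASKS = ["detection", "segmentation", "captioning"]  # illustrative task names

# CNN-based layout: one shared encoder, then one full decoder per task.
cnn_encoder = stage("encoder")
cnn_decoders = {t: stage(f"decoder[{t}]") for t in TASKS}

def cnn_generalist(image):
    feat = cnn_encoder(image)  # runs once per image
    return {t: dec(feat) for t, dec in cnn_decoders.items()}

# Transformer-based layout: shared encoder and decoder, then a light module per task.
tr_encoder = stage("encoder")
tr_decoder = stage("decoder")
tr_light = {t: stage(f"light[{t}]") for t in TASKS}

def transformer_generalist(image):
    feat = tr_decoder(tr_encoder(image))  # both run once per image
    return {t: head(feat) for t, head in tr_light.items()}
```

In both layouts the encoder is called once per image no matter how many tasks are attached. The difference is how much per-task work follows: a full decoder in the first layout, only a light module in the second.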
Generalist YOLO (ECCV 24)
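- Architecture
  - Transformer-based
  - Proposes a unified encoder that outputs 3 features at once to serve different downstream tasks. The features are:
    - pixel semantic feature
    - multi-level instance semantic feature
    - interaction relation feature
- Handles image-level, instance-level, and pixel-level downstream tasks at the same time
- Challenge: learning a more precise encoder representation that the different light modules can all use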
- Cosine representation learning
  - You only learn once
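  - Basic ideas for training better features
    1. Classifier (keeps only class information)
    2. Discriminative Model (focuses only on the decision boundary; keeps the redundant information in feature space)
    3. Generative Model (locates the contour of the class more precisely)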
- Precise representation learning (ECCV 24)
  - Programmable gradient information (PGI)
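    - Background: information bottleneck. A deep PlainNet loses information
    - Existing solutions
      - Explicit: feed the input repeatedly into different layers
      - Implicit
        
        Force the neural network to also reconstruct the original data (e.g., NIPS23 RevCol v2)
        
        The drawback is longer inference time (the reconstruction process has to run), which is unfriendly to real-time tasks
    - Auxiliary residual branch
      - Addresses the problem of the implicit approach; gradients are propagated through it during training
      - Not used during actual inference
      - Solves the information bottleneck problem while keeping the speed advantage
      - Each branch has its own task loss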
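
The training-only auxiliary branch described above can be sketched with a toy NumPy model. This is a simplified illustration under stated assumptions, not the PGI implementation: the linear encoder, the two linear heads, the squared-error losses, and all names (`params`, `losses_and_grads`, `train_step`, `predict`) are hypothetical. It only shows the pattern from the notes: each branch has its own task loss, both losses send gradient into the shared encoder during training, and inference skips the auxiliary branch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))   # toy inputs
t = rng.normal(size=32)        # toy regression targets

params = {
    "enc": rng.normal(size=(8, 4)) * 0.1,  # shared encoder (assumed linear)
    "main": rng.normal(size=4) * 0.1,      # main head, kept at inference
    "aux": rng.normal(size=4) * 0.1,       # auxiliary head, training only
}

def losses_and_grads(p, X, t):
    """Return (main_loss, aux_loss, grads); each branch has its own task loss."""
    n = len(t)
    h = X @ p["enc"]                       # shared features
    err_main = h @ p["main"] - t
    err_aux = h @ p["aux"] - t
    g_main = 2.0 * err_main / n            # d(main loss) / d(main output)
    g_aux = 2.0 * err_aux / n              # d(aux loss) / d(aux output)
    # The encoder receives gradient from both branches.
    g_h = np.outer(g_main, p["main"]) + np.outer(g_aux, p["aux"])
    grads = {
        "enc": X.T @ g_h,
        "main": h.T @ g_main,
        "aux": h.T @ g_aux,
    }
    return np.mean(err_main ** 2), np.mean(err_aux ** 2), grads

def train_step(p, X, t, lr=0.05):
    """One gradient-descent step on main_loss + aux_loss."""
    main_loss, aux_loss, grads = losses_and_grads(p, X, t)
    new_p = {k: p[k] - lr * grads[k] for k in p}
    return new_p, main_loss + aux_loss

def predict(p, X):
    """Inference path: the auxiliary head is never evaluated."""
    return (X @ p["enc"]) @ p["main"]
```

Because `predict` touches only `enc` and `main`, the auxiliary head can be dropped from the deployed model without changing its output or adding inference cost.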
