---
title: 'Week 10 - Some GPT use'
description: '地獄貓旅行團 week 16 sharing'
tags: IC
slideOptions:
  transition: slide
---

# OpenAI CLIP & BLIP

## Week 10

----

## CLIP architecture

![](https://hackmd.io/_uploads/SJJquERV2.png)

----

## BLIP architecture

![](https://hackmd.io/_uploads/BkLWRN042.png)

----

## Key differences

CLIP

- Takes an image plus a set of candidate text labels and outputs a probability for each label; it effectively acts as a zero-shot classifier.

BLIP

- Takes an image and generates a sentence of text describing it, i.e. a caption.

----

## Extra

+ [Multi-modal GPT](https://github.com/open-mmlab/Multimodal-GPT)
+ [Microsoft Visual ChatGPT](https://huggingface.co/spaces/microsoft/visual_chatgpt)

---

## CLIP

Contrastive Language-Image Pre-Training

+ [Source code](https://github.com/openai/CLIP)
+ [Paper](https://arxiv.org/abs/2103.00020)
+ [Official documentation](https://openai.com/research/clip)

----

## Installing from the OpenAI GitHub

```
$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git
```

----

## Minimal example

No OpenAI API key is needed.

```python=
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the ViT-B/32 checkpoint together with its matching image preprocessor
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Image-text similarity logits, softmaxed into probabilities over the labels
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.9927937  0.00421068 0.00299572]]
```

----

## Online demo

Thanks to the resources on HuggingFace, we can also train our own CLIP model; a zero-shot classification sketch using the `transformers` API is in the appendix at the end of this deck.

+ [Online demo](https://huggingface.co/openai/clip-vit-base-patch32)
+ [Official documentation](https://huggingface.co/docs/transformers/model_doc/clip#resources)

---

## BLIP

Bootstrapping Language-Image Pre-training

+ [Source code](https://github.com/salesforce/BLIP)
+ [Paper](https://arxiv.org/abs/2201.12086)

----

## Online demos

A captioning sketch with the `transformers` API follows on the next slide.

+ [Demo 1](https://huggingface.co/spaces/Salesforce/BLIP)
+ [Demo 2](https://replicate.com/salesforce/blip)
+ [Official documentation](https://huggingface.co/docs/transformers/model_doc/blip)
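
----

## Quick captioning sketch

A minimal sketch using the HuggingFace `transformers` API; the `Salesforce/blip-image-captioning-base` checkpoint and the sample image URL are illustrative choices, not part of the original demos.

```python=
import requests
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Captioning checkpoint (assumed here; any BLIP captioning checkpoint works)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

# Sample image (assumed URL); any RGB image can be substituted
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image, then generate a caption token by token
inputs = processor(image, return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```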
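
---

## Appendix: CLIP via HuggingFace

A zero-shot classification sketch mirroring the earlier `clip` package example, but written against the `transformers` API; the label set and image URL are assumptions for illustration.

```python=
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Same checkpoint as the online demo linked earlier
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Sample image (assumed URL); substitute any image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate labels: CLIP scores the image against each text prompt
labels = ["a photo of a cat", "a photo of a dog", "a diagram"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits -> probabilities over the candidate labels
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```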