Microsoft Visual ChatGPT | Image Processing / Editing / Generation

visual-chatgpt openai GPT-4 microsoft artificial-intelligence

See also: Auto-GPT study notes

Renamed to: TaskMatrix

TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.

Supports Chinese

GitHub: https://github.com/microsoft/visual-chatgpt

Demo

Microsoft Visual ChatGPT Image Analyze using Chinese Input

Microsoft Visual ChatGPT Image Analyze to Meme Meaning

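Prerequisites:

Apply for the OpenAI API

Install Python

Create a virtual environment / install required packages

Parameters

Changing the prompt wording

Usage

Learning resources

Problems / Bugs

GPT-3.5 weaknesses in Chinese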

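Introduction: Visual ChatGPT is a conversational system that pairs an OpenAI GPT language model with a set of Visual Foundation Models. It handles basic questions fine, and it can generate images related to the conversation and apply image processing to them; for example, it can remove objects from an image or replace them with other objects. (Basic image processing works fine.)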

Demo

ImageCaptioning + Text2Image

a gas station

a baby with a green shirt on

Change to a blue shirt

Microsoft Visual ChatGPT Image Analyze using Chinese Input

Analysis result
Human: provide a figure named image/xxxx.png. The description is: a man in a hat and a man in a hat. This information helps you to understand this image, but you should use tools to finish following tasks, rather than directly imagine from my description. If you understand, say "Received".
AI: Received.
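Human: 請問這張照片有幾個人，請用繁體中文回答，謝謝！ (How many people are in this photo? Please answer in Traditional Chinese, thank you!)
AI: 這張照片有兩個人，一個戴著帽子，另一個也戴著帽子。 (There are two people in this photo; one is wearing a hat, and the other is also wearing a hat.)
Human: 錯誤，背景還有一個人 (Wrong, there is one more person in the background.)
AI: 從照片中可以看到，有兩個戴著帽子的人，而且背景還有一個人。 (The photo shows two people wearing hats, plus one more person in the background.)
Human: 所以總共三個人 (So three people in total.)
AI: 從照片中可以看到，有兩個戴著帽子的人，而且背景還有一個人，所以總共三個人。, n_tokens: 63 (The photo shows two people wearing hats, plus one more person in the background, so three people in total.)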

Microsoft Visual ChatGPT Image Analyze to Meme Meaning

GPT3：a piece of bread on a plate.

Human: provide a figure named image/xxxx.png. The description is: a picture of a piece of bread on a plate. This information helps you to understand this image, but you should use tools to finish following tasks, rather than directly imagine from my description. If you understand, say "Received".
AI: Received.
Human: can u tell me this meme ?
AI: The image is of a piece of bread on a plate.
Human: why is that meme funny ?
AI: The image is of a piece of bread on a plate, but it is not possible to determine why it is funny from the image.

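Prerequisites:

Apply for the OpenAI API

Visual ChatGPT is built on OpenAI's GPT models, so you need to sign up for the OpenAI API and obtain an API key before you can use it. The steps are:

Go to the OpenAI sign-up page
Enter your personal details and accept the terms of use
Complete the credit card verification
Once the account is approved, obtain your API key from your OpenAI account

Install Python

Download Python from the official website: https://www.python.org/downloads/
Installers are provided for Windows, macOS, and Linux.
After downloading, run the installer to complete the installation.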

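Create a virtual environment / install required packages

After forking the repository and pulling the files down, create a virtual environment.

Latest instructions: Quick Start

Old instructions from 03/14

The official quick-start steps below are the old version (a newer version is available).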

Quick Start
create a new environment

conda create -n visgpt python=3.8

activate the new environment

conda activate visgpt

prepare the basic environments

pip install -r requirements.txt

download the visual foundation models

bash download.sh

This needs roughly 40 GB of disk space
and takes a while to finish.

prepare your private OpenAI key

export OPENAI_API_KEY={Your_Private_Openai_Key}

create a folder to save images

mkdir ./image

Start Visual ChatGPT !

python visual_chatgpt.py

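Remember to run the commands from the top level of the repository directory:

Visual-ChatGPT

Required packages

https://github.com/microsoft/visual-chatgpt/blob/main/requirements.txt

The advantage of a virtual environment is that each project can use its own package versions, which avoids conflicts between packages and keeps dependencies correct.

Here I chose to use pipenv on macOS to create the virtual environment.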

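# Install pipenv
$ pip install --user pipenv

# Enter the project directory
$ cd /path/to/visual-chatgpt

# Create the virtual environment with Python 3.8, then activate it
$ pipenv --python 3.8
$ pipenv shell

* Running pipenv --three instead uses the default Python 3 version configured on the machine.

EY＊研究院
[Python] Notes: using different Python versions with pipenv
https://dotblogs.com.tw/Eyelash/Search?q=pip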

Once the virtual environment is created successfully, confirm the Python version:
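The check can also be done from inside Python. The following is a minimal sketch, not part of the Visual ChatGPT repository; the helper names `version_matches` and `in_virtualenv` are made up for illustration. It assumes a venv/virtualenv-style environment such as the one pipenv creates; a conda environment is not detected by this prefix comparison.

```python
import sys


def version_matches(expected, version_info=sys.version_info):
    """Return True if the interpreter's (major, minor) equals `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


def in_virtualenv():
    """Return True when running inside a venv/virtualenv-style environment."""
    # Inside such an environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Is Python 3.8:", version_matches((3, 8)))
    print("In a virtual environment:", in_virtualenv())
```

Run it with the same interpreter that will later start visual_chatgpt.py, so the result reflects the environment actually in use.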

Next, in the terminal, install the packages with the command that matches your virtual environment:

1. conda

pip install -r requirements.txt

2. pipenv

pipenv install -r requirements.txt

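This installs the correct package versions.

References:
Using different Python versions with pipenv
[Day-02] Setting up a project's Python environment (how to use Pipenv)

Fill in your OpenAI API key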

export OPENAI_API_KEY={Your_Private_Openai_Key}

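Parameters

I saw an issue asking how to change the image resolution from 512 to 1024 (#145).

How to change the image size (keeping the default is recommended)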

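import torch
from diffusers import StableDiffusionPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


class Text2Image:
    def __init__(self, device):
        print("Initializing Text2Image to %s" % device)
        self.device = device
        # Use half precision on GPU to save memory; CPU needs float32.
        self.torch_dtype = torch.float16 if 'cuda' in device else torch.float32
        # Stable Diffusion v1.5 generates the image from the text prompt.
        self.pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5",
                                                            torch_dtype=self.torch_dtype)
        # MagicPrompt (a GPT-2 model) expands a short prompt into a more detailed one.
        self.text_refine_tokenizer = AutoTokenizer.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
        self.text_refine_model = AutoModelForCausalLM.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
        self.text_refine_gpt2_pipe = pipeline("text-generation", model=self.text_refine_model,
                                              tokenizer=self.text_refine_tokenizer, device=self.device)
        self.pipe.to(device)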

    # Method of the bot class in visual_chatgpt.py; it relies on the module-level
    # imports os, uuid, numpy as np, and PIL.Image.
    def run_image(self, image, state, txt):
        # Save the upload under image/ with a short random file name.
        image_filename = os.path.join('image', str(uuid.uuid4())[0:8] + ".png")
        print("======>Auto Resize Image...")
        img = Image.open(image.name)
        width, height = img.size
        # Scale so that the longer side becomes 512 px, keeping the aspect ratio.
        ratio = min(512 / width, 512 / height)
        width_new, height_new = (round(width * ratio), round(height * ratio))
        # Snap both sides to the nearest multiple of 64.
        width_new = int(np.round(width_new / 64.0)) * 64
        height_new = int(np.round(height_new / 64.0)) * 64
        img = img.resize((width_new, height_new))
        img = img.convert('RGB')
        img.save(image_filename, "PNG")
        print(f"Resize image from {width}x{height} to {width_new}x{height_new}")
        # Caption the image so the language model gets a text description of it.
        description = self.models['ImageCaptioning'].inference(image_filename)
        Human_prompt = "\nHuman: provide a figure named {}. The description is: {}. " \
                       "This information helps you to understand this image, " \
                       "but you should use tools to finish following tasks, " \
                       "rather than directly imagine from my description and replied with traditional chinese. If you understand, say \"Received\". \n".format(
            image_filename, description)
        # "Received" in Chinese, to go with the prompt change above.
        AI_prompt = "收到.  "
        # Append the exchange to the agent's memory and to the chat state.
        self.agent.memory.buffer = self.agent.memory.buffer + Human_prompt + 'AI: ' + AI_prompt
        state = state + [(f"![](/file={image_filename})*{image_filename}*", AI_prompt)]
        print(f"\nProcessed run_image, Input image: {image_filename}\nCurrent state: {state}\n"
              f"Current Memory: {self.agent.memory.buffer}")
        return state, state, txt + ' ' + image_filename + ' '
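To see what the resize step in run_image does, and what changing 512 to 1024 would mean, the arithmetic can be pulled out into a standalone function. This is only a sketch: the name `fit_size` and the `max_side` / `multiple` parameters are hypothetical and do not exist in visual_chatgpt.py, where 512 and 64 are hard-coded. Python's built-in `round` is used in place of `np.round`; both round halves to the nearest even number, so the results match.

```python
def fit_size(width, height, max_side=512, multiple=64):
    """Return the (width, height) that the resize step would produce.

    The image is scaled so that its longer side equals max_side, then
    each side is snapped to the nearest multiple of `multiple`.
    """
    ratio = min(max_side / width, max_side / height)
    width_new = round(width * ratio)
    height_new = round(height * ratio)
    # Snap to the grid; halves round to the nearest even number.
    width_new = round(width_new / multiple) * multiple
    height_new = round(height_new / multiple) * multiple
    return width_new, height_new


if __name__ == "__main__":
    print(fit_size(1920, 1080))                 # default 512 limit
    print(fit_size(1920, 1080, max_side=1024))  # hypothetical 1024 limit
```

With the default limit, a 1920x1080 upload becomes 512x256; with a 1024 limit it would become 1024x576. Whether the loaded models and the available GPU memory cope with the larger size is not verified here, which is one reason to keep the default.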

Changing the prompt wording

This change is experimental; it may affect accuracy (?).

and replied with traditional chinese

Usage

Models loaded by default:
(default="ImageCaptioning_cuda:0,Text2Image_cuda:0")

python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu

Format: python + the script file + the models to load + a cpu/gpu choice for each model

On a Mac it currently seems that python3 has to be specified:

python3 visual_chatgpt.py --load ImageCaptioning_cpu,ImageEditing_cpu,Text2Image_cpu
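The value passed to --load is a comma-separated list of Model_device pairs. The sketch below shows one way such a string can be split into a model-to-device mapping; `parse_load` is a hypothetical helper written for these notes, not the repository's own parsing code, and it assumes model names contain no underscore.

```python
def parse_load(load):
    """Split 'ModelA_cpu,ModelB_cuda:0' into {'ModelA': 'cpu', 'ModelB': 'cuda:0'}."""
    models = {}
    for entry in load.split(","):
        # Everything before the first underscore is the model name,
        # everything after it is the device.
        name, _, device = entry.strip().partition("_")
        if not name or not device:
            raise ValueError(f"expected Model_device, got {entry!r}")
        models[name] = device
    return models


if __name__ == "__main__":
    print(parse_load("ImageCaptioning_cpu,ImageEditing_cpu,Text2Image_cpu"))
```

Each model can be given its own device, so a value such as ImageCaptioning_cpu,Text2Image_cuda:0 keeps the captioning model on the CPU and puts only the image generator on the GPU.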

Foundation Model	GPU Memory (MB)
ImageEditing	3981
InstructPix2Pix	2827
Text2Image	3385
ImageCaptioning	1209
Image2Canny	0
CannyText2Image	3531
Image2Line	0
LineText2Image	3529
Image2Hed	0
HedText2Image	3529
Image2Scribble	0
ScribbleText2Image	3531
Image2Pose	0
PoseText2Image	3529
Image2Seg	919
SegText2Image	3529
Image2Depth	0
DepthText2Image	3531
Image2Normal	0
NormalText2Image	3529
VisualQuestionAnswering	1495
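The table can be used to estimate whether a chosen set of models fits on one GPU. The sketch below simply adds up the figures from the table; `total_gpu_memory_mb` is a hypothetical helper, and the sum is a rough lower bound that ignores any overhead beyond the listed numbers.

```python
# GPU memory per foundation model in MB, copied from the table above.
GPU_MEMORY_MB = {
    "ImageEditing": 3981,
    "InstructPix2Pix": 2827,
    "Text2Image": 3385,
    "ImageCaptioning": 1209,
    "Image2Canny": 0,
    "CannyText2Image": 3531,
    "Image2Line": 0,
    "LineText2Image": 3529,
    "Image2Hed": 0,
    "HedText2Image": 3529,
    "Image2Scribble": 0,
    "ScribbleText2Image": 3531,
    "Image2Pose": 0,
    "PoseText2Image": 3529,
    "Image2Seg": 919,
    "SegText2Image": 3529,
    "Image2Depth": 0,
    "DepthText2Image": 3531,
    "Image2Normal": 0,
    "NormalText2Image": 3529,
    "VisualQuestionAnswering": 1495,
}


def total_gpu_memory_mb(model_names):
    """Sum the listed GPU memory for the given model names."""
    return sum(GPU_MEMORY_MB[name] for name in model_names)


if __name__ == "__main__":
    print(total_gpu_memory_mb(["ImageCaptioning", "Text2Image"]))
```

The two default models, ImageCaptioning and Text2Image, come to 4594 MB by these figures; adding ImageEditing brings the total to 8575 MB.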

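Plain-language names for the Foundation Models (originally translated into Traditional Chinese with ChatGPT):

ImageEditing: image editing
InstructPix2Pix: image to image
Text2Image: text to image
ImageCaptioning: image captioning
Image2Canny: image to Canny edge detection
CannyText2Image: Canny edges and text to image
Image2Line: image to line drawing
LineText2Image: line drawing and text to image
Image2Hed: image to HED edge detection
HedText2Image: HED edges and text to image
Image2Scribble: image to hand-drawn scribble
ScribbleText2Image: scribble and text to image
Image2Pose: image to pose
PoseText2Image: pose and text to image
Image2Seg: image to segmentation
SegText2Image: segmentation and text to image
Image2Depth: image to depth
DepthText2Image: depth and text to image
Image2Normal: image to normal map
NormalText2Image: normal map and text to image
VisualQuestionAnswering: visual question answering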

Learning resources

https://blog.aidec.tw/post/microsoft-visual-chatgpt-win10?utm_source=blog&utm_medium=postref

https://zhuanlan.zhihu.com/p/614319329

Problems / Bugs

ModuleNotFoundError: No module named 'gradio'

Reinstall the missing package:
pip install gradio

python3 visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu
On a Mac, python3 has to be specified.

Solved
On a Mac M1 the requirements file apparently has to be changed:

https://github.com/microsoft/visual-chatgpt/issues/149#issuecomment-1465878310

Mon, Mar 13, 2023 11:29 PM

http://0.0.0.0:7868

* Solved: the page could not be opened in the Mac's native browser
(cannot open in Safari)

If you cannot open the Visual ChatGPT web interface in Safari, you can try using a different web browser such as Google Chrome or Mozilla Firefox. Sometimes, certain websites or web applications may not be fully compatible with certain web browsers, so switching to a different one may help.

Change the address to localhost
https://github.com/microsoft/visual-chatgpt/issues/235#issuecomment-1475764470

Is the port already in use?

demo.launch(server_name="0.0.0.0", server_port=7868)

How to fix an occupied port on a Mac (Mac Port xxx is already in use)
https://israynotarray.com/other/20220117/241830554/

How to free an occupied port on the system?
https://andy6804tw.github.io/2018/02/28/kill-port-macos/

Mac: find which IP and port are in use (find IP port usage)
https://matthung0807.blogspot.com/2019/11/mac-find-network-port-usage.html
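Before digging through system tools, a quick check from Python can tell whether something is already listening on the port. This is a minimal sketch using only the standard library; `is_port_in_use` is a hypothetical helper, and 7868 is the port from the demo.launch call above. It checks the local loopback address only.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Port 7868 in use:", is_port_in_use(7868))
```

If the port turns out to be taken, either stop the other process as described in the links above or pass a different server_port to demo.launch.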

Solved
The OpenAI key could not be found
Add it as an environment variable

added this to the visual_chatgpt.py file (line 11): os.environ["OPENAI_API_KEY"] = "INSERT_YOUR_KEY_HERE" (replace with your key and save the file)

https://github.com/microsoft/visual-chatgpt/issues/152
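An alternative to hard-coding the key in the script is to read it from the environment and fail with a clear message when it is missing. The sketch below is an illustration only; `require_api_key` is a hypothetical helper and is not part of visual_chatgpt.py.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; run: export {name}=<your key>")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Reading the key from the environment keeps it out of the source file, so the script can be committed or shared without exposing it.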

However, the image editing part does not work.

Solved: Error: 1 leaked semaphore objects
opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Solved by adding ImageEditing_cpu

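GPT-3.5 weaknesses in Chinese

GPT-3.5's understanding of Chinese is already quite good, but it still has some weaknesses and challenges. Some possible weaknesses:

Language diversity: GPT-3.5's Chinese corpus comes mainly from Standard Chinese, while the dialects and accents used by different regions and groups in China are very rich; this diversity of language variants may affect the model's comprehension.
Context understanding: Although GPT-3.5 can generate the next piece of text from the preceding text, its understanding is still limited, and it often misunderstands, especially when handling complex context and abstract concepts.
Semantic ambiguity: Chinese is sometimes semantically ambiguous, which may affect the model's comprehension and expression, particularly with literature, poetry, and other texts rich in cultural connotation.
Entity recognition: Named entity recognition is more challenging in Chinese than in English because Chinese has no explicit word delimiters, which can make entity recognition difficult for the model.

In short, although GPT-3.5 has made great progress in understanding Chinese, it still needs to improve in language diversity, context understanding, semantic ambiguity, and entity recognition.

The text of these notes was written in collaboration with ChatGPT.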