# How to Customize a GPT Agent
Author: Kevin, Shin Kong Securities IT Department (2024/01/15)
### Agent Example
When we ask the Agent a question and let it solve the problem, the examples below show the Agent's overall reasoning process:
1. Think about how to solve the problem
2. Execute an action based on the chosen approach
3. Collect the results returned by that action
4. Check whether the collected information contains an adequate answer
5. If not, keep thinking about what to do next
6. Repeat until an answer to the question is found
7. Output the answer
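The loop above can be sketched in a few lines of Python. This is only a conceptual sketch, assuming a hypothetical `think_and_act` LLM call and a hypothetical `run_tool` helper; the real implementation appears in the source-code section later in this document.
```python
# Conceptual sketch of the Agent loop described above (hypothetical helpers).
def agent_loop(question, think_and_act, run_tool, max_turns=10):
    context = question
    for _ in range(max_turns):
        reply = think_and_act(context)           # steps 1-2: think, then propose an action
        if reply.startswith("Answer:"):
            return reply                         # step 7: final answer found
        observation = run_tool(reply)            # step 3: execute the action, collect results
        context = f"Observation: {observation}"  # steps 4-6: feed results back and keep thinking
    return "No answer found within the turn limit."
```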
### Example 1
```
Question : eleven * twenty five
```
```
Thought: This is a simple multiplication operation. I can calculate this.
Action: calculate: 11 * 25
PAUSE
-- running calculate 11 * 25
Observation: 11 * 25 = 275
Answer: The result of eleven multiplied by twenty five is 275.
```
### Example 2
```
Question : What is the height difference in meters between Taipei 101 and the world's tallest building?
```
```
Thought: I need to find the height of Taipei 101 and the current tallest building in the world. Then, I can subtract the height of Taipei 101 from the height of the tallest building to get the difference.
Action: google_search: Taipei 101 height
PAUSE
-- running google_search Taipei 101 height
Observation: Taipei 101 is a skyscraper located in Taipei, Taiwan. It was completed in 2004 and was the tallest building in the world at that time, with a height of 508 meters (1,667 feet). Designed by C.Y. Lee & Partners, the building has 101 stories and features various facilities including offices, restaurants, and observation decks. The design incorporates elements of Taiwanese culture, such as the statue of Guanyin at the top and the Feng Shui pagoda on the eighth floor. Taipei 101's exterior is made of modern glass curtain walls and has strong wind resistance. Inside the building, there are high-speed elevators that can take passengers to the observation decks in 40 seconds, offering panoramic city views. Taipei 101 has become an important tourist attraction in Taipei, attracting visitors from both domestic and international locations.
Thought: Now that I know the height of Taipei 101 is 508 meters, I need to find the height of the current tallest building in the world.
Action: google_search: world's tallest building
PAUSE
-- running google_search world's tallest building
Observation: I found some information about the world's tallest building:
- The current tallest building in the world is the Burj Khalifa in Dubai, United Arab Emirates. It stands at a height of 828 meters (2,717 feet) and was officially opened in January 2010.
- Before the Burj Khalifa, the Taipei 101 in Taipei, Taiwan held the title of the world's tallest building. It stands at a height of 508 meters (1,667 feet) and was completed in 2004.
- Other notable tall buildings include the Shanghai World Financial Center in Shanghai, China, and the Shanghai Tower in Shanghai, China.
- The Burj Khalifa broke not only the record for the world's tallest building but also the record for the tallest structure and tallest freestanding structure.
Please note that this is just a summary of the information I found. If you need more specific details, I recommend reading the full articles or conducting a more specific search.
Thought: Now that I know the height of the Burj Khalifa is 828 meters, I can calculate the difference between the heights of the Burj Khalifa and Taipei 101.
Action: calculate: 828 - 508
PAUSE
-- running calculate 828 - 508
Observation: 828 - 508 = 320
Answer: The height difference between Taipei 101 and the world's tallest building is 320 meters.
```
# Environment Setup
> Run the following commands in a cmd window opened in the current folder; the VS Code environment is recommended.
## Virtual Environment Setup
```
python -m venv .env
.\.env\Scripts\activate
```
In VS Code, switch the interpreter to the newly created virtual environment `.env` by clicking the area marked with the red box in the screenshot.

## GPT Agent Environment Setup
```
pip install openai
pip install beautifulsoup4
pip install google-api-python-client
pip install httpx
```
> <span style="color:red">The main program is ReAct.py</span>
## Python Code API Keys
Enter your OpenAI API Key into the following line:
```python
os.environ["OPENAI_API_KEY"] = ""
```
Google Search API Key (the author's personal API key, temporarily shared for this exercise)
```python
os.environ["GOOGLE_API_KEY"] = ""
os.environ["GOOGLE_CSE_ID"] = ""
```
# GPT Agent Overview (Key Concepts, Must Read)
### Large Language Model Interface
The Agent Prompt acts as a template for interacting with a large language model. With this fixed template, the project can focus on getting the LLM to call the appropriate custom tools (functions, APIs) to complete a task.
We can therefore treat the prompt as a new programming language. Through this new language (the prompt) we build a general-purpose Agent interface for large language models, and the Agent is no longer limited to OpenAI's ChatGPT API: once the interface is in place, any large language model can sit behind it.
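For example, a minimal sketch of such an interface, where only this one class touches the OpenAI API (the `OpenAIBackend` name and the idea of swapping in other backends are illustrative assumptions, not part of the original code):
```python
from __future__ import annotations

import openai


class OpenAIBackend:
    """One possible backend: wraps the OpenAI chat completions API."""

    def complete(self, messages: list[dict]) -> str:
        completion = openai.chat.completions.create(
            model="gpt-4", messages=messages, temperature=0)
        return completion.choices[0].message.content


# Any other object exposing the same complete(messages) -> str method could be
# dropped in here without touching the Agent prompt or the parsing logic.
```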
### Agent Implementation (Implementing the Interface)
Once the Prompt Template is fixed, we can use whatever programming language we are comfortable with (Python, C#, ...) to send and receive the plain-text messages exchanged with the large language model (LLM), and then convert those messages into the actual Action to execute and the Response to return.
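A minimal sketch of that conversion step: the LLM's reply is plain text, a regular expression extracts the requested action, and a dictionary maps the action name onto a real Python function (the full version appears in the code section below).
```python
import re

# Example reply text in the format the prompt asks GPT to produce.
reply = "Thought: I can calculate this.\nAction: calculate: 11 * 25\nPAUSE"

action_re = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)
tools = {"calculate": lambda expr: eval(expr)}  # the real tools are async functions in ReAct.py

match = action_re.search(reply)
if match:
    name, argument = match.groups()
    print("Observation:", tools[name](argument))  # -> Observation: 275
```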
# Source Code Walkthrough: Core Principles
### Summary
##### Prompt:
1. Describes the core reasoning loop
2. Describes the available tools
3. Provides examples (few-shot) so the LLM better understands the overall flow and how tool parameters are passed
##### Python Code:
- Parses the text returned by the LLM with Python and implements the follow-up business logic
- Implements the interaction with the GPT API; this example wraps it in the `ChatBot` class (openai.chat.completions)
- Parses the format we specified and converts it into Python calls
- <span style="color:red">The entire history of questions, thoughts, actions, and collected observations is stacked into the `messages` list of the openai.chat.completions API</span> (see the sketch below)
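A sketch of how that history accumulates after one round trip of Example 1 above (the content shown is illustrative, taken from the transcript):
```python
# After one Thought/Action turn and one Observation, bot.messages looks like:
messages = [
    {"role": "system", "content": "<the full ReAct prompt>"},
    {"role": "user", "content": "Question: eleven * twenty five"},
    {"role": "assistant", "content": "Thought: ...\nAction: calculate: 11 * 25\nPAUSE"},
    {"role": "user", "content": "Observation: 11 * 25 = 275"},
    # The next assistant turn can produce the final Answer because this whole
    # history is resent to openai.chat.completions on every call.
]
```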
### ReAct (Reasoning & Acting)
> The following is the most important core thinking pattern given to the Agent; the LLM is asked to follow this pattern when continuing the text.
```
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.
```
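A sketch of how the three prompt pieces in this section might be combined into the single system prompt passed to the `ChatBot` class shown later (the variable names are illustrative; in ReAct.py the pieces may simply live in one long string):
```python
react_core = """You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions."""

tools_description = "Your available actions are: ..."  # the tool block below
few_shot_examples = "Example session: ..."             # the few-shot block below

# The combined text becomes the "system" message when the ChatBot is created.
prompt = "\n\n".join([react_core, tools_description, few_shot_examples])
```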
### Tools (Tools Available to the Agent)
- This is how we tell the Agent which tools it can use, including tools we build ourselves (a registration sketch follows the tool prompt below)
- Scenario 1
  - Calculate 2 + 2, push the answer as a LINE message, and send it by email at the same time
  - Tools required: a calculator, a LINE push-message API, and an email-sending API
- Scenario 2
  - Give the Agent the date of the company's 2024 spring banquet and add it to a personal calendar
  - Tool required: a calendar API
> If there are security concerns, a custom tool can keep all data on the local machine and send nothing to the Agent; it only needs to report whether the action completed, what its status is, and whether anything went wrong.
> Note: the custom tool parameter_query lets the Agent ask the user for more detail via console input to narrow down the problem.
> Below are the tools available to the Agent, including a description and a usage example for each.
```
Your available actions are:
calculate:
e.g. calculate: 4 * 7 / 3
Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary
wikipedia:
e.g. wikipedia: Django
Returns a summary from searching Wikipedia
get_stock_info:
e.g. get_stock_info: 2002
Retrieves the latest stock information including quotes and news from Google Finance. This function is ideal for obtaining up-to-date financial data and news related to a specific stock.
google_search:
e.g. google_search: The address of the Shin Kong Securities Taipei Head Office.
This function acts as a digital search assistant, designed to help users find current and relevant information on a wide array of topics. It's particularly useful for querying recent events, trending topics, and specific information queries.
parameter_query:
e.g. parameter_query: Which stock symbol/code would you like to inquire about?
This function is crucial for engaging in a dynamic and interactive information gathering process with users. It adapts to the context of prior queries and identifies what specific additional information is required for a complete and accurate response. If the user's question is related to their own state or circumstances, please prioritize asking the user.
```
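As noted at the top of this section, adding a custom tool means describing it in the prompt above and registering a matching function in the `known_actions` dictionary shown in the code section below. A sketch for the hypothetical `send_email` tool from Scenario 1 (the SMTP server and addresses are placeholders, not part of the original project):
```python
import smtplib
from email.message import EmailMessage


async def send_email(input_text):
    """Hypothetical custom tool: the data stays on the local machine and only
    a status string is returned to the Agent."""
    msg = EmailMessage()
    msg["Subject"] = "Agent notification"
    msg["From"] = "agent@example.com"        # placeholder sender
    msg["To"] = "user@example.com"           # placeholder recipient
    msg.set_content(input_text)
    try:
        with smtplib.SMTP("localhost") as smtp:  # placeholder SMTP server
            smtp.send_message(msg)
        return "Email sent successfully."
    except Exception as e:
        return f"Email failed: {e}"

# In ReAct.py the tool would then be registered next to the built-in ones:
# known_actions["send_email"] = send_email
# and the tool prompt above would gain a matching "send_email:" description.
```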
### Examples (Few-shot)
> The following are sample runs given to GPT as a reference, so that GPT can follow our hints and return the specified format.
```
Example session:
Question: What is the capital of France?
Thought: I should look up France on google_search
Action: google_search: France
PAUSE
You will be called again with this:
Observation: France is a country. The capital is Paris.
You then output:
Answer: The capital of France is Paris
Example 2 session:
Question: What is the current stock price?
Thought: I need to find the latest stock quote based on user input. I need to ask the user which stock they want to inquire about.
Action: parameter_query: What is the stock symbol/code you would like to inquire about?
PAUSE
You will be called again with this:
Observation: 2330
You then output:
Thought: I should look up the latest stock quote for 2330
Action: get_stock_info: 2330
PAUSE
You will be called again with this:
Observation: quote: $585.00
You then output:
Answer: The current stock price of TSMC (Taiwan Semiconductor Manufacturing Company) is $585.00.
```
### Code Implementation
> This code shows how Python parses the text format we ask GPT to return, how part of the tool-calling works, and how the program interacts with GPT.
```python
import asyncio
import json
import re
import sys

import httpx
import openai
from googleapiclient.discovery import build

# prompt, extractRelateAnswer, getUrlsummarize, google_finance_stock_news,
# google_api_key and google_cse_id are defined elsewhere in ReAct.py.


class ChatBot:
    def __init__(self, system=""):
        self.system = system
        self.messages = []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message):
        self.messages.append({"role": "user", "content": message})
        result = self.execute()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def execute(self):
        completion = openai.chat.completions.create(
            model="gpt-4", messages=self.messages, temperature=0)
        # Uncomment this to print out token usage each time, e.g.
        # {"completion_tokens": 86, "prompt_tokens": 26, "total_tokens": 112}
        # print(completion.usage)
        return completion.choices[0].message.content


action_re = re.compile(r'^Action: (\w+): (.*)$')


async def query(question, max_turns=10):
    i = 0
    bot = ChatBot(prompt)
    next_prompt = question
    while i < max_turns:
        i += 1
        result = bot(next_prompt)
        print(result)
        actions = [action_re.match(a) for a in result.split(
            '\n') if action_re.match(a)]
        if actions:
            # There is an action to run
            action, action_input = actions[0].groups()
            if action not in known_actions:
                raise Exception(
                    "Unknown action: {}: {}".format(action, action_input))
            print(" -- running {} {}".format(action, action_input))
            observation = await known_actions[action](action_input)
            observation = extractRelateAnswer(f"{action}:{action_input}", observation)
            print("Observation:", observation)
            next_prompt = "Observation: {}".format(observation)
        else:
            # No further action requested: dump the conversation history and stop.
            history_prompt = ""
            for step in bot.messages:
                try:
                    step_info = json.dumps(
                        step, indent=2, ensure_ascii=False).encode('utf8')
                    history_prompt += step_info.decode() + "\n"
                except Exception:
                    pass
            print("===== HISTORY =====")
            print(history_prompt)
            return


# ===== Tools =====
async def wikipedia(q):
    return httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }).json()["query"]["search"][0]["snippet"]


async def calculate(what):
    # Evaluates the arithmetic expression produced by the LLM.
    return eval(what)


async def get_stock_info(input_text):
    try:
        stock_news = google_finance_stock_news(input_text)
        news = stock_news.get_stock_news()
        return news
    except Exception as e:
        print(e)
        return ""


async def google_search(input_text):
    """google_search tool: single-argument wrapper called by the Agent."""
    return await google_custom_search(
        input_text, google_api_key, google_cse_id
    )


async def google_custom_search(search_term, api_key, cse_id, **kwargs):
    """Google Custom Search"""
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(
        q=f"{search_term}", cx=cse_id, **kwargs).execute()
    results = []
    if "items" not in res:
        return results
    for item in res["items"][:5]:
        if "title" in item and "snippet" in item:
            url = item["link"]
            web_content_summary = getUrlsummarize(url, search_term)
            results.append(
                f"{item['title']} [{item['snippet']}] : {web_content_summary}"
            )
    return results


async def parameter_query(input_text):
    val = await ainput(input_text)
    return val


async def ainput(string: str) -> str:
    await asyncio.get_event_loop().run_in_executor(
        None, lambda s=string: sys.stdout.write(s + ' '))
    return await asyncio.get_event_loop().run_in_executor(
        None, sys.stdin.readline)


known_actions = {
    "wikipedia": wikipedia,
    "calculate": calculate,
    "google_search": google_search,
    "get_stock_info": get_stock_info,
    "parameter_query": parameter_query
}

if __name__ == "__main__":
    try:
        asyncio.run(query("台北101和世界最高的大樓差幾公尺?"))
    except Exception as e:
        print(e)
```
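To reproduce the Example 2 trace above, run `python ReAct.py` inside the activated virtual environment. A minimal sketch, assuming the code above is saved as ReAct.py as stated earlier, of asking a different question:
```python
import asyncio

from ReAct import query  # assumes the main code above lives in ReAct.py

# Ask the Agent a different question; this one is borrowed from the
# google_search example in the tool prompt, purely for illustration.
asyncio.run(query("What is the address of the Shin Kong Securities Taipei Head Office?"))
```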
# Debug
Colleagues studying this document are encouraged to set breakpoints and step through the program to observe how the data flows at run time; this is the fastest way to grasp the logic of the whole system.
# Reference
There are many ways to write a ReAct prompt. After looking through numerous ReAct prompt variants online, I finally found a general-purpose one with which GPT more reliably produces text in the format we specify,
and is less likely to run into the mismatch with expectations that occurred in the Friday demo.
The demo examples above were adapted from the reference below.
Reference link:
[A simple Python implementation of the ReAct pattern for LLMs](https://til.simonwillison.net/llms/python-react-pattern)
#### All code in this document was adapted from the reference link above