# [LangChain] Example: Graph QA

Date: 2023/09/22

## Description
- Topic: GraphIndexCreator(), GraphQAChain()
- Use case: This example quickly converts a passage of text into a list of triples that explicitly represent the relationships between entities. The triple list can then be queried to implement a QA task.
- Input:
    1. The passage to convert into a list of triples.
    2. The questions (Q), which are answered from the triple list.
- Output: the list of triples extracted from the passage, and an answer to each question.

Background:
A passage or article may contain multiple entities, and those entities are related to one another through relations. A knowledge graph is built from its basic unit, the "entity-relation-entity" triple, which describes real-world concepts and the relationships between them.

## Table of contents
- [1. Triple conversion](#1-Triple-conversion)
- [2. QA queries](#2-QA-queries)
- [3. Visualization](#3-Visualization)
____________________

## 1. Triple conversion
Import the packages.
```python=
from dotenv import load_dotenv
load_dotenv()  # load environment variables (e.g. OPENAI_API_KEY) from a local .env file
from langchain.llms import OpenAI
```
```python=
from langchain.indexes import GraphIndexCreator
from langchain.chains import GraphQAChain
import networkx as nx
import matplotlib.pyplot as plt
```
`text` is an excerpt from a news article. GraphIndexCreator() parses the entities in the text and the relationships between them, then converts them into triples so that a graph can be built.
```python=
text = '''
Russia, the United States and China have all built new facilities and dug new tunnels at their nuclear test sites in recent years, satellite images obtained exclusively by CNN show, at a time when tensions between the three major nuclear powers have risen to their highest in decades. While there is no evidence to suggest that Russia, the US or China is preparing for an imminent nuclear test, the images, obtained and provided by a prominent analyst in military nonproliferation studies, illustrate recent expansions at three nuclear test sites compared with just a few years ago. One is operated by China in the far western region of Xinjiang, one by Russia in an Arctic Ocean archipelago, and another in the US in the Nevada desert. The satellite images from the past three to five years show new tunnels under mountains, new roads and storage facilities, as well as increased vehicle traffic coming in and out of the sites, said Jeffrey Lewis, an adjunct professor at the James Martin Center for Nonproliferation Studies at the Middlebury Institute of International Studies.
“There are really a lot of hints that we’re seeing that suggest Russia, China and the United States might resume nuclear testing,” he said, something none of those countries have done since underground nuclear testing was banned by the 1996 Comprehensive Nuclear Test Ban Treaty. China and the US signed the treaty, but they haven’t ratified it. Retired US Air Force Col. Cedric Leighton, a former intelligence analyst, reviewed the images of the three powers’ nuclear sites and came to a similar conclusion. “It’s very clear that all three countries, Russia, China and the United States have invested a great deal of time, effort and money in not only modernizing their nuclear arsenals, but also in preparing the types of activities that would be required for a test,” he said. Moscow has ratified the treaty, but Russian President Vladimir Putin said in February he would order a test, if the US moves first, adding that “no one should have dangerous illusions that global strategic parity can be destroyed.” The expansions risk sparking a race to modernize nuclear weapons testing infrastructure at a time of deep mistrust between Washington and the two authoritarian governments, analysts said, though the idea of actual armed conflict is not considered imminent. “The threat from nuclear testing is from the degree to which it accelerates the growing arms race between the United States on one hand, and Russia and China on the other,” Lewis said. “The consequences of that are that we spend vast sums of money, even though we don’t get any safer.
'''
```
Notes:
1. In practice, `index_creator.from_text(text)` does not return the same result on every run, which affects query performance.
2. `text` can also be in Chinese, but the output of `index_creator.from_text(text)` is still expressed in English.

```python=
# Create the GraphIndexCreator() object.
index_creator = GraphIndexCreator(llm=OpenAI(temperature=0))
graph = index_creator.from_text(text)
# Save the graph to a file in GML format.
graph.write_to_gml("graph.gml")
graph.get_triples()
```
:::success
[('Russia', 'new facilities', 'has built'),
('Russia', 'new tunnels', 'has dug'),
('Russia', 'vehicle traffic', 'has increased'),
('Russia', 'Comprehensive Nuclear Test Ban Treaty', 'has ratified'),
('United States', 'new facilities', 'has built'),
('United States', 'new tunnels', 'has dug'),
('United States', 'vehicle traffic', 'has increased'),
('United States', 'Comprehensive Nuclear Test Ban Treaty', 'has not ratified'),
('China', 'new facilities', 'has built'),
('China', 'new tunnels', 'has dug'),
('China', 'vehicle traffic', 'has increased'),
('China', 'Comprehensive Nuclear Test Ban Treaty', 'has not ratified'),
('Vladimir Putin', 'he would order a test', 'has said'),
('Vladimir Putin', 'if the US moves first', 'has said'),
('Vladimir Putin', 'no one should have dangerous illusions', 'has said')]
:::

As the output shows, the text has been converted into a graph. Each tuple is ordered as (source entity, target entity, relation). The next step is to query this graph.

## 2. QA queries
Query the graph with questions.

Notes:
1. The language of the entities in the triples matters. For example, the conversion above produced an entity named United States, so a query must refer to it by the same name in the same language.
2. Apart from entity names, the rest of the question can be written in either Chinese or English and will still be understood.

```python=
# Input questions. Entity names stay in English; the rest of each question is in Chinese.
input_list = [
    'United States做了甚麼?',  # "What did the United States do?"
    'United States建了甚麼?',  # "What did the United States build?"
    'Vladimir Putin說了甚麼?',  # "What did Vladimir Putin say?"
    '誰還沒批准Comprehensive Nuclear Test Ban Treaty?',  # "Who has not yet ratified the Comprehensive Nuclear Test Ban Treaty?"
]
# Create the GraphQAChain() and run each question through it.
chain = GraphQAChain.from_llm(OpenAI(temperature=0), graph=graph, verbose=True)
for n in input_list:
    print('-'*45)
    output = chain.run(n)
    print('Q: ',n)
    print('A: ',output)
```
:::success
\---------------------------------------------
\> Entering new GraphQAChain chain...
Entities Extracted:
United States
Full Context:
United States has built new facilities
United States has dug new tunnels
United States has increased vehicle traffic
United States has not ratified Comprehensive Nuclear Test Ban Treaty
\> Finished chain.
Q: United States做了甚麼?
A: The United States has built new facilities, dug new tunnels, and increased vehicle traffic, but has not ratified the Comprehensive Nuclear Test Ban Treaty.
\---------------------------------------------
\> Entering new GraphQAChain chain...
Entities Extracted:
United States
Full Context:
United States has built new facilities
United States has dug new tunnels
United States has increased vehicle traffic
United States has not ratified Comprehensive Nuclear Test Ban Treaty
\> Finished chain.
Q: United States建了甚麼?
A: The United States has built new facilities and dug new tunnels.
\---------------------------------------------
\> Entering new GraphQAChain chain...
Entities Extracted:
Vladimir Putin
Full Context:
Vladimir Putin has said he would order a test
Vladimir Putin has said if the US moves first
Vladimir Putin has said no one should have dangerous illusions
\> Finished chain.
Q: Vladimir Putin說了甚麼?
A: Vladimir Putin has said he would order a test if the US moves first, and that no one should have dangerous illusions.
\---------------------------------------------
\> Entering new GraphQAChain chain...
Entities Extracted:
Comprehensive Nuclear Test Ban Treaty
Full Context:
\> Finished chain.
Q: 誰還沒批准Comprehensive Nuclear Test Ban Treaty?
A: 不知道。
:::

For the last question, the Full Context retrieved for the extracted entity is empty, and the chain answers 不知道 ("I don't know").

## 3. Visualization
Since it is called a graph, it is natural to want to see it drawn as one. A flat list of triples like the one above does not make it easy to see which entities and relations exist.

The following code visualizes the triple list obtained earlier.
```python=
# Visualize graph.gml to make it easier to understand.
# Read the .gml file.
H = nx.read_gml("graph.gml")
layout = nx.spring_layout(H, k=5, seed=2)
nx.draw(H, pos=layout, with_labels=True, node_size=500, node_color='skyblue', font_size=10)
edge_labels = nx.get_edge_attributes(H, 'relation')
nx.draw_networkx_edge_labels(H, layout, edge_labels=edge_labels, font_size=8)
plt.show()
```
![](https://hackmd.io/_uploads/SkTzVYC1T.png)

The figure contains nodes and edges, which correspond to the entities and relations in the triples. It makes clear how the entities are related to one another.

## Additional notes
The model object used in this example is OpenAI(). In this test, OpenAI() appeared to handle individual words and terms better and to follow the supplied prompt and input more closely, so it tends to perform better on parsing tasks like this one.

For comparison, below are the triples produced when ChatOpenAI() is used instead.

For this article, the focus should be on the three countries: the United States, China, and Russia. The article also mentions Putin, who, as a head of state, is one of its focal points as well.

The result below contains many entities that are not central, which may not help much in answering questions about the article's main points. Seen from another angle, this suggests that it can capture minor details outside the main points. If the goal is to capture as much information as possible, ChatOpenAI() may be a reasonable choice. Be aware, though, that the extracted entities can be somewhat odd, as the output below shows.
:::success
('tensions', 'highest in decades', 'have risen to'),
('satellite images', 'new roads and storage facilities', 'show'),
('Jeffrey Lewis', 'James Martin Center for Nonproliferation Studies', 'is an adjunct professor at'),
('Jeffrey Lewis', 'Middlebury Institute of International Studies', 'is an adjunct professor at'),
('underground nuclear testing', '1996 Comprehensive Nuclear Test Ban Treaty', 'was banned by'),
('China and the US', 'treaty', "haven't ratified"),
('US Air Force Col. Cedric Leighton', 'conclusion', 'came to'),
('Moscow', 'treaty', 'has ratified'),
('Washington', 'two authoritarian governments', 'has deep mistrust with'),
('idea of actual armed conflict', 'imminent', 'is not considered'),
('consequences of that', 'we spend vast sums of money', 'are that')]
:::
____________

## References
- https://sourcezones.net/2023/03/17/07/
- https://zh.wikipedia.org/zh-tw/%E7%9F%A5%E8%AD%98%E5%9C%96%E8%AD%9C
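
As a supplement to section 2: the last query there extracted the entity Comprehensive Nuclear Test Ban Treaty but retrieved an empty Full Context, even though three triples mention the treaty. One plausible reading, stated here as an assumption rather than a description of LangChain's internals, is that context is gathered only from edges that start at the extracted entity, and the treaty only ever appears as a target. The sketch below reproduces that behavior in plain Python with a few of the triples from section 1. The helper names `outgoing_context` and `incoming_context` are hypothetical and are not part of LangChain.

```python
from collections import defaultdict

# A few triples copied from the output in section 1,
# in the same (source, target, relation) order.
triples = [
    ('Russia', 'Comprehensive Nuclear Test Ban Treaty', 'has ratified'),
    ('United States', 'new facilities', 'has built'),
    ('United States', 'Comprehensive Nuclear Test Ban Treaty', 'has not ratified'),
    ('China', 'Comprehensive Nuclear Test Ban Treaty', 'has not ratified'),
]

# Index every edge in both directions.
outgoing = defaultdict(list)  # source -> [(relation, target), ...]
incoming = defaultdict(list)  # target -> [(source, relation), ...]
for source, target, relation in triples:
    outgoing[source].append((relation, target))
    incoming[target].append((source, relation))


def outgoing_context(entity):
    """Statements in which `entity` is the source (hypothetical helper)."""
    return [f"{entity} {relation} {target}"
            for relation, target in outgoing.get(entity, [])]


def incoming_context(entity):
    """Statements in which `entity` is the target (hypothetical helper)."""
    return [f"{source} {relation} {entity}"
            for source, relation in incoming.get(entity, [])]


treaty = 'Comprehensive Nuclear Test Ban Treaty'
print(outgoing_context('United States'))  # two statements about the United States
print(outgoing_context(treaty))           # empty: the treaty is never a source
print(incoming_context(treaty))           # three statements that mention the treaty
```

If this reading is right, questions phrased around a source entity (for example, asking what China has done about the treaty) should retrieve context, while questions phrased around an entity that only appears as a target will not unless incoming edges are also consulted.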