# Brain Natural Language Engine (BNLE)

## Approach 1 : Template based QA system

### Elements of BNLE

1. Identifying the different elements of the question and tagging them as nouns, verbs, etc.
2. Converting nouns into Brain Tokens
3. Disambiguation using context
4. Using auxiliary verbs to identify the template
5. Combining adpositions and Brain Tokens
6. Combining with the template to create a Knowledge Question
7. Using the Knowledge Question to create a dependency graph

### Example

1. What is the dob of India's Prime Minister?

**Step 1:** What|PRON is|AUX the|DET dob|NOUN of|ADP India|PROPN 's Prime Minister|PROPN?

**Step 2:** Nouns: dob, India, Prime Minister

Converting to brain-tokens:

* dob --> common/attribute/person/date_of_birth
* India --> common/entity/country/1
* Prime Minister --> common/attribute/person/prime_minister_of, common/attribute/country/prime_minister

**Step 3:** Prime Minister --> common/attribute/country/prime_minister

**Step 4:** Auxiliary verb (AUX): is

* Template from sentence: ___ is ___ ?
* Elements before AUX: 'What'
* Template: What is ___ ?

**Step 5:** Adposition: 'of'

* Template from sentence: ___ of ___
* Combining with Brain Tokens: 'common/attribute/person/date_of_birth' of 'common/entity/country/1' 's 'common/attribute/country/prime_minister'

**Step 6:** Combining with template: What is 'common/attribute/person/date_of_birth' of 'common/entity/country/1' 's 'common/attribute/country/prime_minister'?

**Step 7:** Create dependency graph

![](https://i.imgur.com/1uWQGMY.png)

2. What is the height and weight of Sachin Tendulkar ?
3. What is the length of yamuna and ganga?
4. Who is the prime minister of India?
5. Who was the actor of Don Movie?
6. Who is the youngest CEO in India?
7. Which is longest river in Brazil ?
8. Who are the players of Indian cricket team?
9. Who is the first female prime minister of India?
10. What is the population of India?

---

Types of QA systems:

1. Open domain QA
2. Restricted domain QA

We are working on creating an open domain QA system.

Algo in general QA systems:

1. Morphological processing
    * 1.1 Tokenization
    * 1.2 Stemming or lemmatization
2. Syntactic analysis
    * 2.1 Here, the structure of the sentence is worked out based on correct grammar.
3. Semantic analysis
    * 3.1 Rule-based classification algorithms
    * 3.2 Machine learning based algorithms…
4. Pragmatic analysis

## Approach 2 : Template-less QA system

Steps for a typical KGQA system:

1. Entity/Relation detection
    * 1.1 Detecting named/common entities
    * 1.2 Data augmentation using a dictionary based on the KG
2. Entity/Relation linking
    * 2.1 Disambiguation
    * 2.2 Linking detected entities and relations to the KG
3. Answer matching (by query construction or embedding-based methods)
4. Subgraph selection

### Approach 2.1 Using reasoning

Our approach to creating a KGQA system capable of answering complex questions is based on the idea that the answer to a complex query can be represented as a **state**. Initially the state is empty. The complex query gives us a list of actions that we need to perform to reach the **answer state**. To explain this approach clearly, we will use the following example:

> Example: Which film starred by Amitabh Bachchan and directed by Ramesh Sippy?
> Let the answer be 'x' [color=#0000ff]

We will use the following knowledge graph:

```graphviz
digraph{
    "Rajkumar Hirani"-> "3 Idiots" [label=" directed"]
    "Rajkumar Hirani"-> "PK" [label=" directed"]
    "Ramesh Sippy"-> "Sholay" [label=" directed"]
    "Ramesh Sippy"-> "Zamana Deewana" [label=" directed"]
    "Ramesh Sippy"-> "Andaaz" [label=" directed"]
    "Aamir Khan"-> "3 Idiots" [label=" starred"]
    "Farah Khan"-> "Happy New Year" [label=" directed"]
    "Aamir Khan"-> "PK" [label=" starred"]
    "Amitabh Bachchan"-> "Agneepath" [label=" starred"]
    "Amitabh Bachchan"-> "Sholay" [label=" starred"]
    "Amitabh Bachchan"-> "Zamana Deewana" [label=" starred"]
    "Deepika Padukone"-> "Happy New Year" [label=" starred"]
    "3 Idiots"-> "film" [label=" type"]
    "PK"-> "film" [label=" type"]
    "Agneepath"-> "film" [label=" type"]
    "Sholay"-> "film" [label=" type"]
    "Andaaz"-> "film" [label=" type"]
    "Happy New Year"-> "film" [label=" type"]
    "Zamana Deewana"-> "play" [label=" type"]
    "Rajkumar Hirani"-> "director" [label=" type"]
    "Ramesh Sippy"-> "director" [label=" type"]
    "Farah Khan"-> "director" [label=" type"]
    "Aamir Khan"-> "actor" [label=" type"]
    "Amitabh Bachchan"-> "actor" [label=" type"]
    "Deepika Padukone"-> "actor" [label=" type"]
}
```

#### Step 1: Entity detection

In any question, entities will always be either proper or common nouns. Using this knowledge, we extract all the nouns in the question.

> In our example, the nouns are 'film', 'Amitabh Bachchan', 'Ramesh Sippy' [color=#0000ff]

The above step can be achieved using POS tagging.

#### Step 2: Relation Detection

Now that we have a list of entities, we need to detect how each of them is connected to the others and to our assumed answer. To do this, we first detect what 'type' of answer is required.

> In our example, the type of output required is 'film'.
> Hence, we have a relation: ['x', 'type', 'film'] [color=#0000ff]

Notice that in the triplet, the subject is a variable node while the object is a known entity.
For our approach, we call relations of this type **object relations**. Similarly, if the object is a variable node and the subject is a known entity, we call them **subject relations**.

> In our example, the remaining two entities have a subject relation with our assumed answer.
> ['Amitabh Bachchan', 'starred', 'x'] and ['Ramesh Sippy', 'directed', 'x'] [color=#0000ff]

All these relations together are known as an **action pool**.

> For our example, the action pool will look something like this: [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    subgraph cluster_0 {
        node [style=filled];
        "Amitabh Bachchan";
        label = "Action";
        color=red;
    }
    subgraph cluster_1 {
        node [style=filled];
        "Ramesh Sippy";
        label = "Action";
        color=red;
    }
    subgraph cluster_2 {
        node [style=filled];
        "film";
        label = "Action";
        color=red;
    }
    subgraph cluster_3 {
        node [style=filled];
        "x";
        label = "Initial State";
        color=blue;
    }
    "Amitabh Bachchan" -> "x" [label="starred"]
    "Ramesh Sippy" -> "x" [label="directed"]
    "x" -> "film" [label="type"]
}
```

This step can be achieved using dependency parsing.

#### Step 3: Reasoning

Before we execute each of our actions, we first need to understand the reasoning algorithm.

1. Start by choosing a random action from the action pool.
2. Check whether the action is a subject relation or an object relation.
    * If the action is an object relation, it is deferred until all subject relations connected to the variable node have been executed.
    * If the action is a subject relation, it can be executed to change the state of the variable node.
3. Execution of an action takes place as follows:
    * The first execution populates the state with the data agreeing with the action.
    * Later executions remove the data that does not agree with the action.
4. Once an action is completed, we take it out of the action pool.
5. Repeat from step 1 until all actions are done.
6. The data in the state is our answer.
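
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the knowledge graph is hard-coded as a set of triples copied from the example, and the names `KG`, `VAR`, `candidates` and `answer` are hypothetical.

```python
import random

# Example knowledge graph as (subject, predicate, object) triples.
KG = {
    ("Rajkumar Hirani", "directed", "3 Idiots"),
    ("Rajkumar Hirani", "directed", "PK"),
    ("Ramesh Sippy", "directed", "Sholay"),
    ("Ramesh Sippy", "directed", "Zamana Deewana"),
    ("Ramesh Sippy", "directed", "Andaaz"),
    ("Farah Khan", "directed", "Happy New Year"),
    ("Aamir Khan", "starred", "3 Idiots"),
    ("Aamir Khan", "starred", "PK"),
    ("Amitabh Bachchan", "starred", "Agneepath"),
    ("Amitabh Bachchan", "starred", "Sholay"),
    ("Amitabh Bachchan", "starred", "Zamana Deewana"),
    ("Deepika Padukone", "starred", "Happy New Year"),
    ("3 Idiots", "type", "film"),
    ("PK", "type", "film"),
    ("Agneepath", "type", "film"),
    ("Sholay", "type", "film"),
    ("Andaaz", "type", "film"),
    ("Happy New Year", "type", "film"),
    ("Zamana Deewana", "type", "play"),
}

VAR = "x"  # variable node standing for the unknown answer


def candidates(action):
    """Entities that can replace VAR in the given action."""
    subj, pred, obj = action
    if obj == VAR:  # subject relation: known subject, variable object
        return {o for s, p, o in KG if s == subj and p == pred}
    # object relation: variable subject, known object
    return {s for s, p, o in KG if p == pred and o == obj}


def answer(actions, rng=random):
    """Run the reasoning loop and return the final state as a set."""
    pool = list(actions)
    state = None  # None marks the empty initial state
    while pool:
        action = rng.choice(pool)
        is_object_relation = action[0] == VAR
        subject_relations_left = any(a[2] == VAR for a in pool)
        if is_object_relation and subject_relations_left:
            continue  # defer until every subject relation has run
        found = candidates(action)
        # First execution populates the state; later ones filter it.
        state = found if state is None else state & found
        pool.remove(action)
    return state or set()


ACTIONS = [
    ("Amitabh Bachchan", "starred", VAR),
    ("Ramesh Sippy", "directed", VAR),
    (VAR, "type", "film"),
]
print(answer(ACTIONS))
```

Because each later execution is a set intersection, the final state does not depend on which action the random choice picks first.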
Let's see this algorithm in action.

> For our example, we choose ['Amitabh Bachchan', 'starred', 'x'] as our first action. Since this is a subject relation, this action will be executed. As this is also our first execution, it will populate our state from the knowledge graph. If we look at our knowledge graph, we see that "Amitabh Bachchan" has a "starred" relation with "Agneepath", "Sholay" and "Zamana Deewana". Therefore after action 1 our state looks like this:
> x = ["Agneepath", "Sholay", "Zamana Deewana"] [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    compound=true;
    subgraph cluster_0 {
        node [style=filled];
        "Amitabh Bachchan" "x";
        label = "Action 1";
        color=red;
    }
    "Amitabh Bachchan" -> "x" [label="starred"]
    subgraph cluster_1 {
        node [style=filled];
        "Agneepath, Sholay, Zamana Deewana";
        label = "State changed to State 1";
        color=red;
    }
    "x" -> "Agneepath, Sholay, Zamana Deewana" [ltail=cluster_0,lhead=cluster_1];
}
```

> We remove ['Amitabh Bachchan', 'starred', 'x'] from our action pool, which now looks something like this: [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    subgraph cluster_1 {
        node [style=filled];
        "Ramesh Sippy";
        label = "Action";
        color=red;
    }
    subgraph cluster_2 {
        node [style=filled];
        "film";
        label = "Action";
        color=red;
    }
    subgraph cluster_3 {
        node [style=filled];
        "x";
        label = "State 1";
        color=blue;
    }
    "Ramesh Sippy" -> "x"[label="directed"];
    "x" -> "film" [label="type"]
}
```

> Next let's choose ['x', 'type', 'film'] as our second action. (Here 'x' contains "Agneepath, Sholay, Zamana Deewana", i.e. State 1. We use 'x' for visual clarity.) [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    compound=true;
    subgraph cluster_0 {
        node [style=filled];
        "film" "x";
        label = "Action 2";
        color=red;
    }
    "x" -> "film" [label="type"];
}
```

> Since this action is an object relation, we must execute all the subject relations on the variable node before executing this object relation.
> As we can see from our action pool, one subject relation remains, so we execute it before executing ['x', 'type', 'film']. [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    compound=true;
    subgraph cluster_1 {
        node [style=filled];
        "Ramesh Sippy" "x";
        label = "Action 2.1";
        color=red;
    }
    subgraph cluster_0 {
        node [style=filled];
        "film" "x";
        label = "Action 2.2";
        color=red;
    }
    "Ramesh Sippy" -> "x"[label="directed"];
    "x" -> "film" [label="type"]
}
```

> Since this is not our first execution, we remove the entities that do not satisfy the relation. From our graph we can see that, of the entities in the state, "Ramesh Sippy" has a "directed" relation with only "Sholay" and "Zamana Deewana". Hence "Agneepath" is removed from the state. Therefore after action 2.1 our state looks like this:
> x = ["Sholay", "Zamana Deewana"] [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    compound=true;
    subgraph cluster_0 {
        node [style=filled];
        "Ramesh Sippy" "x";
        label = "Action 2.1";
        color=red;
    }
    "Ramesh Sippy" -> "x" [label="directed"]
    subgraph cluster_1 {
        node [style=filled];
        "Sholay, Zamana Deewana";
        label = "State changed to State 2.1";
        color=red;
    }
    "x" -> "Sholay, Zamana Deewana" [ltail=cluster_0,lhead=cluster_1];
}
```

> We remove ['Ramesh Sippy', 'directed', 'x'] from our action pool, which now looks something like this: [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    subgraph cluster_2 {
        node [style=filled];
        "film";
        label = "Action";
        color=red;
    }
    subgraph cluster_3 {
        node [style=filled];
        "x";
        label = "State 2.1";
        color=blue;
    }
    "x" -> "film" [label="type"]
}
```

> Since no other subject relation is left to execute, we can finally execute our object relation. Again, since this is not the first execution, we remove the entities that do not agree with our action. From the knowledge graph, we can see that, of the entities in the state, only "Sholay" has a "type" relation with "film". Therefore, only "Sholay" will be left in the state.
> Hence, after action 2.2 our state looks like this:
> x = ["Sholay"] [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    compound=true;
    subgraph cluster_0 {
        node [style=filled];
        "x" "film";
        label = "Action 2.2";
        color=red;
    }
    "x" -> "film" [label="type"]
    subgraph cluster_1 {
        node [style=filled];
        "Sholay";
        label = "State changed to State 2.2";
        color=red;
    }
    "film" -> "Sholay" [ltail=cluster_0,lhead=cluster_1];
}
```

> We remove ['x', 'type', 'film'] from our action pool. Since our action pool is now empty, the entities in the state are our answer. [color=#0000ff]

```graphviz
digraph{
    rankdir = "LR"
    compound=true;
    subgraph cluster_1 {
        node [style=filled];
        "Sholay";
        label = "Answer";
        color=red;
    }
}
```

---

Rough Work:

**Step 1: Entity extraction using dependency parsing**

Nouns and proper nouns will be our entities. However, to extract complex entities consisting of compound words and modifiers, POS tagging won't be enough. We need to parse the dependency tree of the sentence.

**Example:**

* Who ... attr
* is ... ROOT
* the ... det
* first ... amod
* female ... amod
* Prime ... compound
* -- ... punct
* Minister ... attr
* of ... prep
* India ... pobj
* ? ... punct

**Note:** amod (adjectival modifier) marks a modifier of the attribute.

![](https://i.imgur.com/78ththY.png)

In our dependency tree we will only keep objects, attributes and modifiers.

Using dependency parsing, we combine the compound words to form the whole attribute (predicate), which here is 'Prime-Minister'. The qualifiers of this attribute are stored separately: qualifiers = [first, female]

Similarly, 'India' is also an object (pobj), but without any modifiers.

From the above example we can make a general rule:

**Extract the subject/object along with its compound words/modifiers/punctuation marks, and the attributes along with their compound words/modifiers/punctuation marks**

Problem to solve: In the above example, if 'Prime' is replaced with 'prime', it is treated as a modifier rather than a compound word.
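
The extraction rule from Step 1 can be illustrated with a small Python sketch that works on a hand-written parse instead of calling a parser. The dependency labels come from the example above; the head indices, the single-hyphen token, and the names `Token`, `PARSE`, `KEEP` and `extract` are assumptions made for illustration only.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    deprel: str
    head: int  # index of the head token; the root points to itself


# "Who is the first female Prime-Minister of India?"
# Labels follow the example; head indices are assumed, and the hyphen
# (listed as "--" in the example) is written as "-" here.
PARSE = [
    Token("Who", "attr", 1),
    Token("is", "ROOT", 1),
    Token("the", "det", 7),
    Token("first", "amod", 7),
    Token("female", "amod", 7),
    Token("Prime", "compound", 7),
    Token("-", "punct", 7),
    Token("Minister", "attr", 1),
    Token("of", "prep", 7),
    Token("India", "pobj", 8),
    Token("?", "punct", 1),
]

KEEP = {"attr", "pobj"}            # attributes and objects (assumed label set)
NAME_PARTS = {"compound", "punct"}  # children merged into the name


def extract(tokens):
    """Collect kept tokens with their compound name and amod qualifiers."""
    results = []
    for i, tok in enumerate(tokens):
        if tok.deprel not in KEEP:
            continue
        children = [(j, t) for j, t in enumerate(tokens) if t.head == i and j != i]
        # Compound words and punctuation before the token join its name.
        prefix = "".join(t.text for j, t in children
                         if t.deprel in NAME_PARTS and j < i)
        qualifiers = [t.text for _, t in children if t.deprel == "amod"]
        results.append({"name": prefix + tok.text,
                        "role": tok.deprel,
                        "qualifiers": qualifiers})
    return results


for item in extract(PARSE):
    print(item)
```

Note that 'Who' is also labelled attr in this parse, so the sketch keeps it as an entry with no qualifiers; a real pipeline would need to decide how to treat question words.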
**Step 2: Named Entity Recognition**

Now that we have attributes (predicates) and objects (entities), we need to convert them into brain tokens.

**Example:**

* India ---> common/entity/country/1
* Prime-Minister ---> /common/predicate/common_person_is_prime_minister_of_common_country

**Step 3: Creating a brain query**

Since we have our dependency tree, we can use its relations to create a brain query.

**Example:** Since our object (entity) is 'India' (or common/entity/country/1), we look for its modifiers. Then we look for its attribute (or /common/predicate/common_person_is_prime_minister_of_common_country) and its modifiers. Using these we can make a brain query (written as an ArangoDB query, since that is the database we currently use):

    FOR entity IN /common/predicate/common_person_is_prime_minister_of_common_country
        FILTER entity._to == common/entity/country/1
        COLLECT x = entity._from INTO result
        FOR entity IN result
            FILTER entity.gender == 'Female'
            SORT result.start_date
            LIMIT 1
            RETURN entity

The relation between subject and object is defined by verbs. To find these relations we need to extract the 'ROOT' of the sentence. Then we need to check whether the object has any attributes. If it does, we combine them with the ROOT to find the relation.

Example: In the above example, 'is' is the ROOT. By combining it with the attribute of the object, we find the relation between 'who' and 'India', i.e., 'is the first female prime-minister of'.

Our knowledge graph will look something like this:

![](https://i.imgur.com/0gBdsjE.png)

Problem to solve: Find a more streamlined method to do this.

We need to convert every element of the knowledge graph we have made so far into a Brain Knowledge Graph.

Example:

* Who ---> common/entity/person/?
* is the first female prime-minister ---> /common/predicate/common_person/is_prime_minister_of/common_country where FILTER gender == female SORT start_date ASC LIMIT 1
* India ---> common/entity/country/1

Problem to solve: We need to figure out a way to convert an attribute (predicate) into a brain_predicate along with its conditions.

Our knowledge graph should look something like this:

![](https://i.imgur.com/zD61qlg.png)

## Rough Observation

Using Stanza natural language parsing we can get upos (universal POS), xpos (language-specific POS), deprel (dependency relation), head (head of the current word) and feats (morphological features). From these we can get the dependency graph, which is a set of head-deprel pairs.
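
As an illustration of how head and deprel values relate to a dependency graph, the sketch below turns per-word records into labelled edges. It does not call Stanza: `Word` is a hypothetical stand-in that mirrors the `id`, `text`, `head` and `deprel` fields of a parsed word (with `head == 0` marking the root), and `SAMPLE` is hand-written rather than real parser output.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """Stand-in for a parsed word: 1-based id, head id (0 = root), deprel."""
    id: int
    text: str
    head: int
    deprel: str


def dependency_edges(words):
    """Return (head_text, deprel, dependent_text) triples for one sentence."""
    text_by_id = {w.id: w.text for w in words}
    # A head id of 0 has no matching word, so it is reported as "ROOT".
    return [(text_by_id.get(w.head, "ROOT"), w.deprel, w.text) for w in words]


# Hand-written sample for "Sachin plays cricket" (illustrative values).
SAMPLE = [
    Word(1, "Sachin", 2, "nsubj"),
    Word(2, "plays", 0, "root"),
    Word(3, "cricket", 2, "obj"),
]

for edge in dependency_edges(SAMPLE):
    print(edge)
```

With Stanza installed and an English model downloaded, the words of a sentence produced by `stanza.Pipeline("en")` expose the same four attributes, so `sentence.words` could be passed to `dependency_edges` directly.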