# Event Generation and Management Processes <small> Jack Crowley, 03 May 2019 </small>

### Overview

There are four main steps involved in producing Forge events:

1. Thematic role (TR) extraction from natural language
2. Event generation from TRs
3. Quality control (QC)
4. Distribution of event information to customers

The purpose of this document is to describe the states an extracted event moves through during the QC process. A description of each of the four main steps is provided for context. This document is intended for a general audience; however, experience with software engineering, database systems, natural language processing, ontological reasoning and machine learning would be beneficial.

### Forge Event Extraction Process

#### Thematic Role Extraction

In classical linguistics, thematic roles describe the roles that a noun phrase may play with respect to an action. Classically, these roles are defined relative to the action expressed by a governing verb. A component of the Forge.AI NLP processing pipeline identifies thematic roles and incorporates them into a structured representation of the document. At Forge, we call this structured representation a "PDoc."

A PDoc contains source text, translated text if the source text is in a foreign language, extracted entities, concepts and relationships, extracted thematic roles, and derived information such as document categorization, sentiment, topic vector models, and document summary information. This extracted information is simultaneously transmitted as a real-time intelligence stream to customers and persisted in an enterprise data store. Multiple processes run continuously on the enterprise data store to produce high-confidence, cross-document, knowledge-based products and services.

#### Event Generation

Sets of thematic roles in the enterprise data store are associated with an anonymous event. The Forge event processor service (called GEM, short for general event model) monitors the data store for new event types.
The database schema associated with event production is provided below for reference. The thematic role and anonymous event tables are shown in brown in this database entity relationship diagram (ERD).

![](https://i.imgur.com/fKr3VAm.png)

The schema is color-coded as follows:

* Yellow tables represent PDoc schema tables that are referenced by the event tables.
* Brown tables represent thematic role information extracted during the runtime processing of a document. This information is intended to be populated by the program that stores processed PDocs in the designated relational database.
* Green tables contain reference data. This reference data includes event type information, possible lexical units for the different event types, the mapping of events to product lines, and the association of product lines with meta information (e.g. BASEL category).
* Blue tables are populated by the complex event processor discussed below and represent the mapping of thematic roles to specific event types.
* Red tables are used to store the history of events as they are updated, either automatically or during human-related processes such as QC.

When GEM identifies a new anonymous event, it is responsible for translating the anonymous event into one or more "instantiated" events. Instantiated events, such as a fraud event, a natural disaster event, or a terrorist action, are defined in an event ontology. The event ontology contains the necessary logic (axioms) for translating the thematic roles into the specific event type elements. For example, a fraud event has the *person or company committing fraud* and a terrorism-related event has a *terrorist*. Both the fraudster and the terrorist are initially associated with the same thematic role class (Agent role) during the thematic role extraction process.
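
The role translation described here can be illustrated with a minimal Python sketch. It is not GEM's implementation: `ROLE_AXIOMS`, `ThematicRole` and `to_semantic_roles` are hypothetical names, and a plain dictionary stands in for the ontology's axioms. Only the Agent role and the `defrauding-agent` and `terrorist-agent` semantic roles come from this document. The classification step that decides the event type is assumed to have already happened.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the event ontology's axioms: for each event type,
# the event-specific semantic role that a generic thematic role maps to.
ROLE_AXIOMS = {
    "Fraud": {"Agent": "defrauding-agent"},
    "Terrorism": {"Agent": "terrorist-agent"},
}


@dataclass(frozen=True)
class ThematicRole:
    role_class: str  # generic class assigned during extraction, e.g. "Agent"
    text: str        # the noun phrase filling the role


def to_semantic_roles(event_type, roles):
    """Map generic thematic roles to the semantic roles of one event type.

    Roles with no axiom for the given event type are skipped.
    """
    axioms = ROLE_AXIOMS[event_type]
    return {axioms[r.role_class]: r.text for r in roles if r.role_class in axioms}


# The same extracted Agent role resolves to a different semantic role
# depending on the event type the anonymous event was classified as.
roles = [ThematicRole("Agent", "John Doe")]
fraud_roles = to_semantic_roles("Fraud", roles)
terrorism_roles = to_semantic_roles("Terrorism", roles)
```

The point of the sketch is that the thematic role itself carries no event-specific meaning; the event type selected by the ontology determines which semantic role it becomes.
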
GEM, using the event ontology, classifies the event appropriately as a fraud event or terrorism event and associates the Agent thematic role with the appropriate event semantic role, defrauding-agent and terrorist-agent respectively.

> A note on nomenclature: In this document we will use the terms "*semantic role*" and "*lexical unit*" synonymously.

GEM updates the database with event information as summarized below:

* A record is inserted in the **EventInsts** table, shown in blue, to represent the semantic event. The eventState field in the table will be set to "New".
* One record for each semantic role (lexical unit) is inserted into the **EventInstLUs** table, shown in blue in the ERD. The LUStateId field of each record will be set to "New".
* A record is inserted in the **EventInstDocs** table, shown in blue, to associate the event with the source document.

Future versions of GEM will be responsible for updating existing events with new information as it is discovered. This may include associating new lexical units with an existing event based on new reporting, and grouping events together into larger structures (e.g. modeling events as narratives).

At this point in processing, the initial thematic role based (anonymous) events have been mapped to specific event types (e.g. Fraud Events, Money Laundering Events, System Outage Events, etc.), but they still need to be mapped to product line (PL) event groups. A product line event group is an association of semantic event types with specific product lines. A given semantic event may be associated with multiple product lines (PLs). For example, an internal fraud event may be associated with both the Operational Risk product line and the Credit Risk product line simultaneously.

Product lines, represented as the green PLs table in the ERD, include:

* Operational Risk Events
* Credit Risk Events
* etc.

Each product line may also have additional specific information associated with it. As an example, consider a fraud event.
This event may be associated with the Operational Risk product line and the Credit Risk product line. When it is associated with the Operational Risk product line, it needs additional, product-specific information associated with it, namely BASEL categorization and BUSINESS-LINE data. When this same fraud event is associated with the Credit Risk product line, it does not have these extra fields associated with it. The key takeaway here is that the same physical event may be associated with different product lines, such as Operational Risk PL, Credit Risk PL, National Security, etc., and depending on the associated product line, additional data may need to be associated with the event.

The associations between event types and product lines are maintained in the database in the "lookup" tables: **PLs**, **PLEventTypes**, **PLMetaData**, **MetaDataCategories**, and **MetaDataCategoryValues**. GEM will use these tables to map the semantic events to the appropriate product lines. For each event that is mapped to a product line, GEM will insert a record in the **PLEventInsts** table, represented in blue in the ERD. If the additional data required by the product line (e.g. BASEL category) can be determined deterministically, GEM will also insert this information into the **PLEventInstsMetadata** table, also represented in blue in the ERD.

#### Quality Control (QC)

Event extraction is one of the most difficult language processing tasks. A capable QC process is required to ensure quality prior to transmitting the data to customers. The QC process and interface are designed to allow for maximum flexibility to manage events, associated lexical units (semantic roles), and their associations with product lines. This includes the ability to correct errors on events and LUs, completely reject an event or LU, and add new events and LUs (future?).
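
As a rough illustration of the QC actions listed here, the following Python sketch models an event whose lexical units can be corrected or rejected, with each action recorded in a history list (standing in for the red history tables). This is a sketch under stated assumptions, not the QC implementation: the class and method names are hypothetical, and `Rejected` is a placeholder state name. Only the `New` state is taken from this document; the authoritative states and transitions are those in the state transition diagram in this section.

```python
from dataclasses import dataclass, field

NEW = "New"            # initial state named in this document
REJECTED = "Rejected"  # hypothetical state name, for illustration only


@dataclass
class LexicalUnit:
    role: str          # semantic role, e.g. "defrauding-agent"
    value: str         # extracted text filling the role
    state: str = NEW


@dataclass
class Event:
    event_type: str
    lus: list
    state: str = NEW
    history: list = field(default_factory=list)  # stand-in for history tables

    def _find(self, role):
        for lu in self.lus:
            if lu.role == role:
                return lu
        raise KeyError(role)

    def correct_lu(self, role, new_value):
        """Fix the value of one lexical unit, keeping the old value in history."""
        lu = self._find(role)
        self.history.append(("correct_lu", role, lu.value, new_value))
        lu.value = new_value

    def reject_lu(self, role):
        """Reject a single lexical unit without rejecting the event."""
        self._find(role).state = REJECTED
        self.history.append(("reject_lu", role))

    def reject(self):
        """Reject the whole event."""
        self.state = REJECTED
        self.history.append(("reject_event", self.event_type))


# A QC reviewer fixes a misspelled agent, then rejects the event outright.
event = Event("Fraud", [LexicalUnit("defrauding-agent", "Acme Crop")])
event.correct_lu("defrauding-agent", "Acme Corp")
event.reject()
```

Keeping the prior value in each history entry means a correction can be audited or reversed later, which is the role the history tables play in the schema.
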
The event state and the state of the semantic roles (lexical units) define the actions that can occur on the event at any given time. There are three principal actors involved in processing events: GEM, the quality control staff, and the solution that transmits events to the customer and the customer-facing database. The state transition diagram below illustrates the event states and the actions taken by actors to transition the event between states. The criteria for what qualifies as a good or rejected event or lexical unit are defined in the QC documentation developed in conjunction with the product team.

![](https://i.imgur.com/Tf5MvaF.png)

### Event Distribution and Customer Access

Events are transmitted to the customer as a distinct product. They are not part of the customary "external pdoc" schema. Requirements for the serialization of events will be defined in coordination with the product management and engineering teams.