# Event Extraction Process
### Overview and Forge.AI Event Processing Primer
This document describes the event extraction and quality control process. It additionally describes some of the engineering of the event extraction pipeline.
Forge collects and processes open source information in real time from domestic and foreign web sites, news aggregators, social media, etc. The processed data is stored and transmitted to customers as it is processed. The collected information is transformed from its source representation, designed for human consumption, into a representation designed for ready analysis and use by machine learning and other AI environments. Multiple AI models operate on the collected information during this transformation. Below is a graphical representation of these processes.

> [name=Nick] Actually, machine translation is performed by the collection environment.
This publication describes the event extraction process and production process.
## Events
Real world events are the things that happen. Linguistic events are the representations of the things that happen in language. Events may have a number of possible constituents. We will call these components thematic roles. The set of thematic roles we identify at Forge is illustrated below.
<img src="https://i.imgur.com/bazFaQD.png" alt="Thematic Roles" width="60%"/>
Specific events, such as a merger event, a lawsuit event, or a power outage, are a mapping from the thematic roles to the event-specific semantic roles associated with the event. The mapping is performed by an event reasoner, a specific software capability that uses the event ontologies (OWL axioms) at runtime to reason over the extracted thematic roles, concepts and resolved entities in a specific document to create the specific event.
The picture of the utility disruption event axiom illustrates this mapping.

There are two things to note:
1. The specific event axioms are used for run-time reasoning.
2. The thematic role labeling is done by a trained AI model, a conditional random field model.
*This pairing of runtime semantic reasoning with the trained thematic labeling model allows Forge.AI to introduce new event types without having to train a new model, which is a long and expensive process. Adding new event classes is mostly a function of creating the appropriate OWL axioms that define the mapping between the thematic roles and the event-specific semantic roles.*
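The role mapping described above can be pictured with a minimal Python sketch. It is only an illustration: a plain dictionary stands in for the OWL axioms, no actual ontology reasoning is performed, and the thematic role names (`agent`, `patient`, `location`, `time`) and event-specific role names are hypothetical, not Forge's actual ontology.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the OWL axioms:
# event type -> {thematic role -> event-specific semantic role}.
# All role names below are illustrative assumptions.
EVENT_AXIOMS: Dict[str, Dict[str, str]] = {
    "Utility-Outage-Event": {
        "agent": "utility_provider",
        "patient": "affected_party",
        "location": "outage_location",
        "time": "outage_time",
    },
    "Lawsuit-Event": {
        "agent": "plaintiff",
        "patient": "defendant",
        "location": "jurisdiction",
    },
}


def map_roles(event_type: str, thematic_roles: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Translate extracted thematic roles into event-specific roles.

    Returns None when no axiom is defined for the event type.
    Thematic roles that the axiom does not mention are dropped.
    """
    axiom = EVENT_AXIOMS.get(event_type)
    if axiom is None:
        return None
    return {axiom[role]: value for role, value in thematic_roles.items() if role in axiom}


lawsuit = map_roles("Lawsuit-Event", {"agent": "Acme Corp", "patient": "Globex Inc"})
```

In this sketch, supporting a new event type means adding an entry to `EVENT_AXIOMS`; the role-labeling step is untouched, which mirrors the point that new event classes do not require retraining the model.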
Events rarely occur at a single point in time; most have a time span associated with them. Consider the utility disruption event described above. There was a time when the factory or town lost power, and presumably there will be a time when power is restored. If information is processed in real time, the disruption start and end dates will be reported separately.
This raises the topic of event groups.
Extracted events are stored in a database.
### Event Use Cases
1) Event Thematic Role Labeling.
[This is a technical use case that does not involve human interaction.]
All extraction activities occur at document scope. The Forge.AI Core NLP services sequentially process a document through Sentence, POS, Chunk, NER, Complex Entity, Relationship and Thematic Role models. Extracted information is then resolved using the Forge.AI knowledge base. After the extraction and reasoning processes, an event reasoner (software) processes the labeled data and creates one or more events. <p>Each thematic event is associated with a document, a location/sentence in the document, an event type, the identified concepts and phrases that are used to classify the event, and a set of one or more thematic roles (TR). We will use the terms thematic role and semantic role interchangeably in this document.</p>
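As a rough sketch of what a thematic event carries, the associations listed above could be modeled as follows. The class and field names are hypothetical and chosen only for illustration; they are not the PDoc format or the database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThematicRole:
    role: str                        # thematic role label (hypothetical, e.g. "agent")
    text: str                        # surface text labeled with the role
    entity_id: Optional[str] = None  # knowledge base ID, if the text was resolved


@dataclass
class ThematicEvent:
    doc_id: str                      # document the event was extracted from
    sentence_index: int              # location of the event within the document
    event_type: str                  # e.g. "Lawsuit-Event"
    concepts: List[str] = field(default_factory=list)   # concepts/phrases used to classify the event
    roles: List[ThematicRole] = field(default_factory=list)

    def values_for(self, role: str) -> List[str]:
        """Return the text of every role instance with the given label."""
        return [r.text for r in self.roles if r.role == role]


event = ThematicEvent(
    doc_id="doc-001",
    sentence_index=3,
    event_type="Lawsuit-Event",
    concepts=["filed suit"],
    roles=[ThematicRole("agent", "Acme Corp"), ThematicRole("patient", "Globex Inc")],
)
```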
2) Event Thematic Role Storage
[This is a technical use case that does not involve human interaction.]
Extracted thematic roles are contained in a PDoc. This event related information is stored in the DocEvents, DocEventSentences and the DocEventThematicRoles tables in the extracted document database according to the schema shown below.

The schema is color coded as follows:
* Yellow tables represent pdoc schema tables that are referenced by the event tables.
* Brown tables represent thematic role information extracted during the runtime processing of a document. This information is intended to be populated by the program that stores processed pdocs in the designated relational database.
* Green tables contain reference data. This reference data includes event type information, possible lexical units for the different event types, the mapping of events to product lines, and the association of product lines with meta information (e.g. Basel category).
* Blue tables are populated by the complex event processor discussed below and represent the mapping of thematic roles to specific event types.
* Red tables are used to store the history of events as they are updated either automatically or during human related processes such as QC processes.
3) Complex Event Processing (CEP)
[This is a technical use case that does not involve human interaction.]
Thematic role based events need to be mapped to specific (ontological) event classes prior to use by a customer. This mapping is performed by a complex event processing service. This service uses the Forge event ontologies to perform the following tasks:
- Mapping each newly extracted thematic/semantic role to its respective event's lexical unit. This is done using the event axioms as illustrated above.
- Updating existing events as necessary by adding newly discovered roles to the event. For example, in a lawsuit event we may discover, when processing a new document, what jurisdiction the lawsuit is in.
- Mapping duplicate events to the same event group.
- Associating events into event groups, where an event group represents a series of related activities. For example, a lawsuit may be represented by an event signaling that a lawsuit was filed, multiple events related to litigation and offers of settlement, and an event reflecting a final settlement.
After the event processing engine has mapped the thematic roles to their specific event types and roles, the events are ready for quality control and for association with specific customers.
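The "update existing events" and "duplicate" tasks above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the CEP service: events are plain dictionaries, two events count as the same event when their type and a chosen set of key roles match exactly, and the role names are hypothetical. A real service would rely on resolved entities and ontology reasoning rather than exact string matching.

```python
from typing import Dict, Tuple

Event = Dict[str, str]   # "event_type" plus event-specific role -> value
Key = Tuple[str, ...]


def event_key(event: Event, key_roles: Tuple[str, ...]) -> Key:
    """Build an identity key from the event type and the chosen key roles."""
    return tuple(event.get(name, "") for name in ("event_type",) + key_roles)


def merge_event(store: Dict[Key, Event], new_event: Event, key_roles: Tuple[str, ...]) -> Event:
    """Insert a new event, or add newly discovered roles to a matching one.

    Roles already present on the stored event are left unchanged.
    """
    key = event_key(new_event, key_roles)
    existing = store.get(key)
    if existing is None:
        store[key] = dict(new_event)
        return store[key]
    for role, value in new_event.items():
        existing.setdefault(role, value)
    return existing


store: Dict[Key, Event] = {}
key_roles = ("plaintiff", "defendant")
merge_event(store, {"event_type": "Lawsuit-Event", "plaintiff": "Acme Corp",
                    "defendant": "Globex Inc"}, key_roles)
# A later document reports the same lawsuit and adds the jurisdiction.
merge_event(store, {"event_type": "Lawsuit-Event", "plaintiff": "Acme Corp",
                    "defendant": "Globex Inc", "jurisdiction": "Delaware"}, key_roles)
```

After both calls the store holds a single lawsuit event that now carries the jurisdiction.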
4) Quality Control <p>***For discussion***
- Edit an incorrect LU mapping
- Add an LU mapping
- Mark a document as misassigned (nix the doc)
- "Disconnect" events that are associated together as part of a larger, complex event
- Mark a complex event as closed (add end date)
- Add or edit the association of the event with 0...n categories. For example, aligning the event with a Basel category of a specific line of business.
- other?
5) Mapping to Customer Requirements
Associating extracted events with specific customers involves filtering on the event type and the specific event roles. For example, a "Utility-Outage-Event" may involve Ford, or a "Lawsuit-Event" may involve another specific company or person that a customer is concerned with.
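A minimal sketch of this filtering, assuming events are dictionaries with an `event_type` and a `roles` mapping (a hypothetical shape, not the stored schema), and that a customer's interest is expressed as a set of event types plus a set of watched entity names:

```python
from typing import Dict, Iterable, List, Set


def filter_for_customer(events: Iterable[Dict], event_types: Set[str],
                        watched_entities: Set[str]) -> List[Dict]:
    """Keep events of a requested type in which any role names a watched entity."""
    selected = []
    for event in events:
        if event["event_type"] not in event_types:
            continue
        if watched_entities & set(event["roles"].values()):
            selected.append(event)
    return selected


events = [
    {"event_type": "Utility-Outage-Event", "roles": {"affected_party": "Ford"}},
    {"event_type": "Lawsuit-Event", "roles": {"plaintiff": "Acme Corp", "defendant": "Globex Inc"}},
    {"event_type": "Lawsuit-Event", "roles": {"plaintiff": "Initech", "defendant": "Ford"}},
]
ford_outages = filter_for_customer(events, {"Utility-Outage-Event"}, {"Ford"})
```

Matching on raw names is the simplest possible choice; matching on resolved knowledge base identifiers would avoid misses caused by name variants.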
---
*There is a further consideration that needs to be discussed: the association of the event with specific Forge product lines. For example, an Operational Risk Feed consists of events of a certain set of categories, optionally filtered by customer or some other criteria. The important point is that "operational risk" is not an event category; rather, it is a collection of event categories. We can create collections of event categories to represent many things: credit risk, political actions, corporate actions, crime, etc. It is uncertain what service manages this mapping of events to these categories and how that will be managed.
This could be (should be?) done in the CEP engine. We would need to add a table to the Event DB that contains a 1-to-many mapping of an event to these "productized" event streams. Thoughts on this are welcome.
*
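To make the open question above concrete, one possible shape for the 1-to-many mapping is sketched below. The product line names echo the examples above, but which event categories belong to which product line is purely illustrative, and `Bankruptcy-Event` is a hypothetical event type.

```python
from typing import Dict, List, Set

# Hypothetical product lines, each defined as a collection of event categories.
PRODUCT_LINES: Dict[str, Set[str]] = {
    "Operational Risk": {"Utility-Outage-Event", "Lawsuit-Event"},
    "Credit Risk": {"Lawsuit-Event", "Bankruptcy-Event"},
}


def product_lines_for(event_type: str) -> List[str]:
    """Return every product line whose category collection contains the event type."""
    return sorted(name for name, categories in PRODUCT_LINES.items()
                  if event_type in categories)


lawsuit_streams = product_lines_for("Lawsuit-Event")
```

Here a single lawsuit event feeds two product lines, which is the 1-to-many relationship the proposed table would have to capture.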