---
tags: OpenEMIT
---

# {OpenEMIT|Architecture}
{%hackmd p6TgTK-8SlKQzxLCsSEH9g %}

* Topic
    * **topic.pub(datastruct)** *// publish a msg, payload, event, dataTree, context, blob, etc.*
    * **topic.sub(topicMask, agent, actorInvocation)** *// Agents work on behalf of Actors to provide "fast reject" capability and standardized filtering. Agents invoke Actors.*
* Ledger
    * **ledger.host(topic, parentTopic)** *// Ledgers host Topics*
    * **ledger=Ledger::new(scope, name)** *// Host a Ledger within a broadcast Scope*
* Event
    * **Timestamp{}** *// For causal consistency*
    * **SeqID{}** *// For integrity and optimal ordering*
    * **enum LifeCycle { Submitted, Committed, Published }**
    * **Topics[]** *// a set of Topics to publish on*
    * **MetaData{}** *// aka Head in HTML*
    * **PrivateData{}** *// Stored in the Secrets Repo with **anonymity capability***
    * **EventData{}** *// the "event"*
    * **event=Event::new(metaData, privateData, eventData)**

## The Solution

* Ability to measure stress and performance using a standardized test suite

> *A "write once, run anywhere"
> ++Domain Specific Open API++ that
> **simplifies development**,
> **solves application portability** and
> **maximizes infrastructure density**
> for Streaming and DLT Applications.*

## The Problems

### Every Streaming and Distributed Storage Technology has its own API.

> If we want to migrate our applications to a different streaming technology or cloud, we must brace for a rewrite. This makes teams skittish about coding in a proprietary streaming API, knowing that vendor lock-in looms. It requires making a commitment, since changing horses has a steep price.
>
> What we need is a saddle that fits lots of different horses, so we can saddle up whatever technology we want.

### Coding streaming apps and DLT is hard and expensive.

Given that it is "fundamentally simple" to Ingest -> Transform -> Emit, why is this so?

> Events have simple lifecycles.
> You create, store, and publish a "New Employee" event; Subscription Agents "listen" for events they care about and invoke a subscribed process. That's it. It's basic Pub/Sub.
>
> Once the subscribed process has the Event, it has the Event Data as input, and can create a result Event from either a simple ETL-style transformation or a more complex process.
>
> I have to admit I don't have a good answer as to why an Open API for an Event Pub/Sub over Ledger does not already exist, only that we desperately need it.

### 90% of Network traffic is repeated data.

> Compared to CPU speeds, memory is slower, drives are slower still, and networks are the slowest. That's just the physics of transmission distances. We need to [**DRY**](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) up our networks and understand the difference between bandwidth and latency. Latency is a distributed processing wall, so we need to use our network backbones more conservatively.
>
> Service brokers seek to DRY up networks through execution caching, but they are an afterthought bolted onto a design that is devoid of them.
>
> Event Ledgers "emit" an event one time to all subscribers, and can take full advantage of network broadcast protocols.
>
> Send it one time over the network while it's in the highest CPU cache level. That way you don't suffer "cache thrash" reloading it later at each and every request.

### We keep moving the data to the code, and not the other way around.

> Every experienced developer knows that, comparing the application code to the application data, the data is a giant. So why do we keep moving data to the process instead of moving the process to the data?
>
> This is where Data Science plays a key role in storing the data redundantly for safety, yet partitioned for scaling. And how do processes get their "Physical Locality" to data? Containers! Containers deploy code swiftly, and code does not change a lot, so DRY networking is promoted.
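The "New Employee" lifecycle above can be sketched in a few lines. This is a minimal illustration, not the real OpenEMIT API: the `Event` struct and `LifeCycle` enum follow the outline at the top of this note, but the fields chosen, the `publish` step, and the `transform` stand-in for a subscribed process are assumptions for demonstration only.

```rust
// Hypothetical sketch of the Event lifecycle: create -> publish -> transform.
// Names follow the outline (Event, LifeCycle, SeqID, EventData); everything
// else is an illustrative assumption.

#[derive(Debug, Clone, PartialEq)]
pub enum LifeCycle {
    Submitted,
    Committed,
    Published,
}

#[derive(Debug, Clone)]
pub struct Event {
    pub seq_id: u64,           // SeqID{}: integrity and optimal ordering
    pub life_cycle: LifeCycle, // where the entry is in its lifecycle
    pub event_data: String,    // EventData{}: the "event" payload
}

/// Ledger commit stand-in: mark the entry Published so agents can see it.
pub fn publish(mut event: Event) -> Event {
    event.life_cycle = LifeCycle::Published;
    event
}

/// A subscribed process: takes the Event Data as input and yields a
/// result Event via a toy ETL-style transformation.
pub fn transform(input: &Event) -> Event {
    Event {
        seq_id: input.seq_id + 1,
        life_cycle: LifeCycle::Submitted,
        event_data: input.event_data.to_uppercase(),
    }
}
```

The point is the shape of the flow, not the payload: each subscribed process is a pure-ish function from one Event to the next, which is what makes the pipeline composable.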
### We keep yanking the microphone from our processes.

Just when we get the process into cache, we interrupt it and hand the mic to another process. We call this "Preemptive Multitasking". The result is CPU "cache thrashing".

> We have more speed limits than network latency. CPUs can only go so fast, so when we hit that wall, we all got more cores (CPUs) inside the same chip.
>
> But none of that made memory faster. L1 cache clocks in at half the CPU speed. L2 is half the speed of L1, but you get more of it. So it goes with L3 and DRAM -- slower, but more storage.
>
> So when the process is in the inner sanctum of the active process on the core, it's not a good time to interrupt its cached state and tear it all down, only to rebuild it on the next cycle. "Run the dang ball!" The trick is that each "cycle" is a small transformation that yields one or more new events.

#### Quick Definitions

* A ***++Distributed Ledger Technology (DLT)++*** is a technology that enables a shared host or network level resource known as a **++Ledger++** (aka log).
* A ***++Ledger++*** is a write-once, append-only, immutable-entry **ordered journal of record** that is shared within a **++Scope++**. A ***++Ledger++*** is shared through a **++Publish/Subscribe (Pub/Sub)++** service API.
* A ***++Scope++*** is a processing boundary or ***"range of publication"*** concept. Defined scopes are:
    * **Process** (thread-shared)
    * **Host** (sockets)
    * **Cluster** (subnet or group of hosts)
    * **Datacenter** (geographical premise)
    * **Cloud** (IaaS provider)
    * **Enterprise**
    * **Federation** (shared across legal entities and ***DLT SmartContracts***)
* A ***++Publish/Subscribe (Pub/Sub)++*** service provides a ***Serverless*** or ***Function-as-a-Service (FaaS)*** **[Implicit Invocation](https://en.wikipedia.org/wiki/Implicit_invocation)** service that is at the heart of true **Digital Transformation (DX)**.

> Meet ***"Emit"*** (pronounced em-mit), our wicked fast Pub/Sub broadcaster.
> **Emit** recognizes that a topic with a single subscriber is by definition a "direct call" (DAG optimization or "short circuiting").

* An ***Event Ledger*** is a **Pub/Sub Ledger of Event Entries** (see Event Sourcing and Event-Driven Architecture). **An Event Ledger is a "++foundational architecture++" for DX** that provides:
    * Service Location Independence via **[Implicit Invocation](https://en.wikipedia.org/wiki/Implicit_invocation)**
    * **Density:** CPU cache stabilization/optimization through [non-preemptive threading](https://stackoverflow.com/questions/4147221/preemptive-threads-vs-non-preemptive-threads) of transformations
    * ***"Low Code" Development*** via [***"Xformation Driven Development (XDD)"***](https://www.tutorialkart.com/wp-content/uploads/2018/02/Mapping-3-1024x447.png)
    * **Evolutionary Development** of "Application Pipelines"
        * High reuse of existing pipelines
        * Theoretically, zero impact to existing pipelines (Regression Testing)
    * **Density:** [Embarrassingly Parallel Processing](https://en.wikipedia.org/wiki/Embarrassingly_parallel) of transformations over multicore CPUs
    * **Streaming *Kappa Architecture***
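The Quick Definitions can be tied back to the API outline at the top of this note with a small sketch. The names `Scope`, `Ledger::new(scope, name)`, and `ledger.host(topic, …)` come from the outline; the struct fields, signatures, and in-memory `Vec` storage are assumptions made purely for illustration.

```rust
// Illustrative sketch of a Scope-bounded, append-only Ledger hosting Topics.
// Only the names come from the outline; the implementation is assumed.

#[derive(Debug, Clone, PartialEq)]
pub enum Scope {
    Process,    // thread-shared
    Host,       // sockets
    Cluster,    // subnet or group of hosts
    Datacenter, // geographical premise
    Cloud,      // IaaS provider
    Enterprise,
    Federation, // shared across legal entities
}

pub struct Ledger {
    pub scope: Scope,         // the "range of publication"
    pub name: String,
    pub topics: Vec<String>,  // Ledgers host Topics
    pub entries: Vec<String>, // write-once, append-only journal of record
}

impl Ledger {
    /// Host a Ledger within a broadcast Scope.
    pub fn new(scope: Scope, name: &str) -> Ledger {
        Ledger {
            scope,
            name: name.to_string(),
            topics: Vec::new(),
            entries: Vec::new(),
        }
    }

    /// Ledgers host Topics.
    pub fn host(&mut self, topic: &str) {
        self.topics.push(topic.to_string());
    }

    /// Append-only publish: entries are never rewritten, only appended.
    pub fn publish(&mut self, payload: &str) {
        self.entries.push(payload.to_string());
    }
}
```

The design choice worth noting is that the journal is only ever appended to, which is what makes a Ledger an immutable "journal of record" shareable within its Scope.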
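The single-subscriber "short circuit" mentioned above can also be sketched. This is a hypothetical dispatch routine, not Emit's actual implementation: when a topic's subscriber list has exactly one entry, publication degenerates to a direct call with no broadcast machinery.

```rust
// Sketch of the single-subscriber "direct call" optimization.
// The Topic type and dispatch logic are illustrative assumptions.

pub type Subscriber = fn(&str) -> String;

pub struct Topic {
    pub subscribers: Vec<Subscriber>,
}

impl Topic {
    /// Publish a message. With exactly one subscriber this is just a
    /// direct call ("short circuiting"); otherwise it fans out to all.
    pub fn publish(&self, msg: &str) -> Vec<String> {
        match self.subscribers.as_slice() {
            [only] => vec![only(msg)], // direct call: no broadcast needed
            many => many.iter().map(|s| s(msg)).collect(),
        }
    }
}
```

In a real broadcaster the two branches would differ in cost, not just shape: the single-subscriber path can skip serialization and queueing entirely.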