---
title: Why EDA?
---
# Why use Event-Driven Architecture (EDA)?
> ###### tags: `OpenEMIT` `EDA`
> [color=blue]
We have a dangerous misconception in distributed computing. We started doing something that was fundamentally wrong, and we haven't stopped -- like lemmings off a cliff. Let me explain.
All modern processors require RAM (and its caches) to execute code. We divide or "map" RAM into three regions:
1. Static or Global Variables that live during the lifetime of a process
2. The "Heap", where relatively large things of indeterminate size are temporarily stored
3. The "Stack", which is like a stack of plates, used by functions to "call" other functions and receive their results.
Let's focus on the (call) stack.
## The Stack
Functions have a "scope" of memory -- input parameters, output results, and variables used during processing. It's all data, and it's all required during the function's lifetime to "execute". Let's term this the function's "context".
But what happens when functions call functions? Well, the same thing that happens when your boss calls you on Monday at 7:30am. Whatever you were doing gets "tucked away", and you get busy with the new task -- the called function.
After the called function completes, its plate is removed from the top of the stack, leaving the caller's context at the top. We term this "pushing and popping from the stack". It denotes a "last in, first out" (LIFO) data structure.
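The push/pop behavior above can be sketched with a plain list standing in for the call stack. This is a simplified illustration of the LIFO discipline, not how any real interpreter or CPU actually manages frames; the frame fields and function names are invented for the example:

```python
# A toy call stack: each "frame" holds a function's context.
# Illustrative only -- real stacks are managed by the runtime/CPU.

call_stack = []

def push_frame(name, **context):
    """The caller's work is tucked away; the new frame goes on top."""
    call_stack.append({"function": name, "context": context})

def pop_frame():
    """The called function completes; its plate comes off the top."""
    return call_stack.pop()

push_frame("main", task="start workday")
push_frame("handle_boss_call", task="new assignment")  # Monday, 7:30am

finished = pop_frame()             # last in, first out
print(finished["function"])        # the called function is done...
print(call_stack[-1]["function"])  # ...and the caller is back on top
```

The list's `append`/`pop` pair is exactly the LIFO behavior the plate analogy describes.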
Stacks live on each host (networked computer) to support a) function calls and b) "time-slicing" a CPU core, when the Operating System (OS) scheduler wants to "set aside" one process context to let another process share the core. We have significant investment in this host-level model.
But what happens when we make a remote function call to distribute processing to other hosts in a cluster?
## The Distributed Call Stack
It turns out, this model starts to break down when functions call functions on other processors over the network.
We end up with a "Distributed Call Stack", where processors are waiting for other processors to finish and return before they can proceed (aka "blocked").
We term this waiting "Synchronous Invocation": a process holds the CPU, essentially "idling" on a core that other processes are dying to use, while it waits for a called function to complete. In distributed processing, the network is "snail mail" compared to RAM and cache, making the wait longer -- and the waste bigger.
This is a huge obstacle to "density", or the ability of the hosts on a network to process more code. Clusters have to be larger to handle the workload, driving up the cost of software, hardware, networking -- all clustered resources.
Remote function calls increase the likelihood of waste, as data moves slowly across the network, and context stacks up quickly.
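The blocking chain described above can be sketched by simulating each network hop with a small delay. The host names and latency figure are hypothetical; the point is that every caller in the chain sits idle until everything beneath it returns:

```python
import time

# Sketch of a distributed call stack. Each remote_call simulates a
# synchronous invocation over a slow network: the caller pays the
# latency both ways and is blocked in between.

NETWORK_LATENCY = 0.01  # pretend seconds per network traversal

def remote_call(fn, *args):
    """Simulate a synchronous remote invocation."""
    time.sleep(NETWORK_LATENCY)   # request crosses the network
    result = fn(*args)            # caller is blocked here
    time.sleep(NETWORK_LATENCY)   # response crosses the network
    return result

def host_c(x):
    return x + 1

def host_b(x):
    return remote_call(host_c, x) * 2   # B idles while C works

def host_a(x):
    return remote_call(host_b, x)       # A idles while B and C work

start = time.monotonic()
result = host_a(10)
elapsed = time.monotonic() - start

print(result)  # (10 + 1) * 2 = 22
# Four network traversals of pure waiting, while A held its core:
print(elapsed >= 4 * NETWORK_LATENCY)  # True
```

Every level of the chain adds its own blocked wait, which is exactly why context "stacks up quickly" across the cluster.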
## Availability
Whether you call it fault tolerance, reliability, recovery, resilience, 5-nines or self-healing, it's all about uptime or "availability".
If enough functions are stacked up on a processor, you can "blow the stack", resulting in a host restart. All other hosts waiting on a remote function call to that host are "blocked" as well, so after multiple retries over the network, they finally give up. It's the worst waste of all: timeouts are much larger when expecting results over the network, and this happens several times before each caller either recovers from the block gracefully or crashes in a cascade failure, like a rolling power outage.
## Analogies
Imagine that everyone in your home tied one end of a reel of yarn to their foot on a Friday night before bed. The reel just spools out yarn, and is mounted on the footboard.
Saturday comes, and everyone starts walking around. They move from room to room to room. Two hours pass, and nobody went back to bed. Now imagine how your home looks.
That is what a distributed call stack looks like on a slow thing called a network. Now ask yourself -- how many programs are running on a typical cluster? That's a lot of yarn.
If a stack blows, then cleaning up the yarn in your house becomes an application issue. In essence, every remote function call must contain error handling for this kind of misbehavior, and that's a lot to ask of an application developer.
Here's another analogy. In the old westerns, when a fire occurred in town, people would form a line from the well to the burning building. They would start with a pile of buckets, fill each with water, and each person would pass the buckets along to the next, with the kids running empty buckets back to the well. That's teamwork, with every resource working.
Now imagine the same scenario, but with only one bucket. People watch idly as this single bucket gets handed down by forty townsfolk, and then returned in like manner. That is exactly what happens when functions call functions call functions.
You just can't have an efficient bucket brigade with a single bucket. You can do it, but it's dysfunctional.
In processing, we can make and destroy buckets of memory whenever we like. We can make them at the well, and we can destroy them when they are empty. Buckets should only go one way -- towards the fire with water in them, and you can make sure there are enough buckets that everyone is working. ***That's density -- putting all the available resources to task!***
## Minimize Network Communications
What water do we put in these buckets? Well, that would be the multi-function or workflow "context" (data) needed to perform the process, **and the best data to put in the buckets is -- as little as possible, since the network is a bottleneck.**
We want to be concise, and [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) up network communication. Say as little as you can. ***Clusters will run much quicker if we talk less and do more.***
## Publish/Subscribe
A better model is to "Publish" the context from the last function to one or more functions that "Subscribe" to the publication because they want to continue the workflow/process.
Let's consider a basic Enterprise Business Process -- "Onboard New Employee".
When you start a new job, you need a computer, a work location, a badge, a phone, 3 HR meetings, an email account, rights to certain data, a profile, software, an expense account, a mobile phone, IT Security training, other compliance training, etc, etc, etc.
How many of these are dependent on each other? Look closely. If I had the leaders of each team in a room and said "We have a new CEO starting on Monday", how fast would the leaders divide and conquer on parallel tasks?
Businesses inherently have processes or "workflows", and workflows inherently have certain tasks that can be done at the same time. Synchronous Processing is the opposite of Parallel Processing, so if you are trying to model a workflow on a cluster -- choosing an inherently non-parallel approach to invocation is a step in the wrong direction.
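The "divide and conquer" fan-out above can be sketched with a thread pool: independent onboarding tasks run concurrently instead of each one waiting on the last. The task names and the `provision` helper are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical onboarding tasks with no dependencies on one another;
# each "team" can work in parallel, like the leaders in the room.
def provision(task, employee):
    return f"{task} ready for {employee}"

tasks = ["laptop", "badge", "email account", "expense account"]

with ThreadPoolExecutor() as pool:
    # Fan out all tasks at once instead of invoking them one by one.
    results = list(pool.map(lambda t: provision(t, "new CEO"), tasks))

for line in results:
    print(line)
```

A synchronous chain would force these four independent tasks into a strict sequence; the pool lets them proceed side by side, which is the parallelism the workflow already has by nature.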
What data do I need to share with each leader? Well, it would vary by leader, but cumulatively it would include a name, contact info, corporate title, start date, Form I-9, Form W-4, beneficiaries, etc. And if I want to get back to other tasks, I could put it all in a folder with a sticky note on it that says "Onboard New Employee". I don't want to watch and wait for them to work; that would be wasted time. They can go forward independently and notify me as things progress.
The folder and sticky note constitute an "event". We subscribe to events on the Employee.New topic, and we "react" when we get a new event. Each publisher effectively "broadcasts" an event to one or more subscribers on a topic, and then is free to do more work.
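A minimal in-process sketch of that topic/event mechanism follows. The topic name comes from the text; the subscriber handlers and registry are invented, and a real EDA system would hand publication off to a broker so the publisher returns immediately rather than invoking handlers inline:

```python
from collections import defaultdict

# Toy publish/subscribe registry: topic name -> list of handlers.
subscribers = defaultdict(list)

def subscribe(topic, handler):
    """Register a handler that reacts to events on a topic."""
    subscribers[topic].append(handler)

def publish(topic, event):
    """Broadcast the event to every subscriber on the topic."""
    for handler in subscribers[topic]:
        handler(event)

received = []

# Two teams react independently to the same publication.
subscribe("Employee.New", lambda e: received.append(("IT", e["name"])))
subscribe("Employee.New", lambda e: received.append(("HR", e["name"])))

# The "folder with a sticky note": context published once, many reactions.
publish("Employee.New", {"name": "Jane Doe", "start_date": "Monday"})

print(received)
```

One event, two independent reactions, and the publisher keeps none of the subscribers on its call stack once the broadcast is handed off.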
This is Event-Driven Architecture (EDA), and the density and native business process support is at the heart of Transformation.
## Summary
If we are going to maximize the ROI of IT, we must "sweat the assets" and get more benefit out of what we already have. Event-Driven Architecture (EDA) can transform IT to achieve infrastructure density through efficiency and development velocity through simplicity. Finally, a Cost Reduction strategy that does not mean achieving less!
This is the value of EDA -- the "why". What remains is the "how".
To achieve the simplicity of development and cost reduction goals of EDA, interoperability amongst technology vendors needs to be achieved utilizing a simple, open API. This creates a competitive marketplace for EDA.
**This is the purpose of [OpenEMIT](https://) -- Open Event Machinery for Information Transformation.**
When you are ready to begin true transformation -- [contact Reconcev](https://), and [join the OpenEMIT Alliance](https://).