# Interview with an Event-Driven Architect
>###### tags: `EDA`
>
>###### (c) _All Rights Reserved_
>
>[name=George Willis] [time=July 25, 2018]
---
### What is Event-Driven Architecture (EDA)?
Well, Gartner refers to it as a [Copernican Shift](https://www.forbes.com/sites/gartnergroup/2017/08/08/why-business-moments-are-different-in-a-digital-world/), analogous to the historical paradigm shift that occurred when we began to think of the Sun as the center of our solar system. I find this an apt description.
For 50 years, developers have been taught to execute code using request/response messaging patterns of synchronous invocation. Call stacks exist because a function calls a function which calls another function -- ad nauseam. The stack is not cleared of such recursive context until the "ball of mud" gets unravelled, which becomes challenging when remote invocation is used.
With remote invocation, the call stack gets spread over many servers, and in the face of network partitions and/or latency, this can lead to slow, brittle systems. This is why many companies now face issues around what I call Quality@Scale. Like security, you have to design with reliability, resilience, availability, and consistency -- @Scale. The [Reactive Manifesto](https://www.reactivemanifesto.org/) does a good job defining the problem space.
Said another way, synchronous invocation creates a **transactional model** by ensuring all recursive functions return before the outer call is cleared from the call stack. This made it simple for developers to ensure valid application states (preserve invariants) -- an "all or nothing" approach.
But this solution to the problem of invariant violations (unexpected states) does not scale well beyond a single host due to the environmental uncertainties of networked clusters. As scale increases, the probability of failures and network partitions (in the CAP sense) approaches certainty, and issues of Quality@Scale emerge. What we need is a way to ensure transactional processing without the need to "lock" recursive, distributed call stacks. We need to **"embrace failure"**, not ignore it. This is the historical failure of IT.
Part of the Netflix success story is based on "Chaos Monkey" -- automated failure simulations that drive the creation of automated recovery. The idea of creating a "perfect" system where nothing goes wrong is a fantasy, and not the way organic systems behave. We can learn a lot from biological systems. Nature heals wounds.
### So how is the EDA approach fundamentally different?
EDA doesn't try to improve this imperative model of perfection; it replaces explicit invocation with the "Hollywood Principle" -- **"Don't call us. We'll call you."**
In EDA, functions are not "called". Rather, functions "listen" for business events to occur on a data stream, and execute autonomously. Function calls cease to be the center of the universe; events become the center of our IT solar system, and that has immense ramifications.
I like to contrast the two approaches with metaphors of archery and fishing. In archery, the archer directs the arrow to the target, just like a service gets called. When the service moves, the aim must be revised to hit the new target. In fishing, it's the service that places "hooks" into a stream. It's "loosely coupled", and when you decompose macroservices into many more microservices, loose coupling becomes key to inter-service integration. That's why you keep hearing about MicroService EDA architectures. EDA is becoming synonymous with Microservices.
### So what exactly are "events"?
Great question, because without a clear understanding of what events are, you can never achieve digital transformation, and Gartner and I agree on this as well.
In the simplest terms, an event is "a publication of change". There's an old saying about a choir singing from the same hymnal. If the choir director changes the song, you have to update the hymnals that each singer is using to reflect this "event". In this way, each member of the choir can independently "sing" based on a common record. It scales to very large choirs.
Business events are no different. Every organization that has employees hires and fires. When you let somebody go, many independent systems must be updated to reflect this business event, such as payroll, facilities, email, etc. This creates an integration requirement.
Rather than drive the order of updates from a central "conductor" (which is known as "orchestration"), each system listens for such events and executes autonomously, often in parallel (which is known as "choreography"). Choreography removes the dependency on a central conductor, which increases throughput and reliability, since conductors can get in the way of performance and become a single point of failure in centralized designs.
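To make choreography concrete, here is a minimal in-process sketch in Python -- the event names and handlers are hypothetical: several independent systems subscribe to the same "employee terminated" event, and each reacts autonomously with no central conductor sequencing the work.

```python
# A minimal sketch of choreography: independent handlers subscribe to a
# business event and react autonomously -- no orchestrator orders the work.
from collections import defaultdict

subscribers = defaultdict(list)  # event type -> list of handler functions

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    # Every subscriber sees the same event; each decides what to do with it.
    for handler in subscribers[event_type]:
        handler(payload)

# Independent systems "hook" the same event (hypothetical names).
subscribe("employee.terminated", lambda e: print(f"payroll: stop pay for {e['id']}"))
subscribe("employee.terminated", lambda e: print(f"facilities: revoke badge {e['id']}"))
subscribe("employee.terminated", lambda e: print(f"email: disable account {e['id']}"))

publish("employee.terminated", {"id": "E-1042"})
```

Note that adding a fourth system is just one more `subscribe` call -- the publisher and the existing listeners never change.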
Perhaps a thought experiment will help. Imagine you are an admiral who has been stranded on a desert island, and one day you see a ship in your fleet on the horizon. You know Morse code, so you run to get a mirror out of your shelter to signal the ship. When you get back to the beach, you find the ship is gone -- it has moved. How can you signal a ship you can't see? You need a Copernican Shift in thinking about communication.
So you write 10 duplicate notes that outline your current state of affairs -- location in terms of lat/long, days of remaining food, who you are -- and stick the notes in bottles that are cast into the ocean.
Within a week, 3 ships from your fleet arrive, each with the same note. One has food, while another has medical supplies, and the third has the best quarters. All three got the same event, but each ship aids in the overall solution in a different way. **The choreography of independent actors from the same event is the natural byproduct of the Hollywood Principle.**
### But if you could have signalled the ship directly with the mirror, they would have come straightaway. Wouldn't that be faster than waiting a week?
Don't get lost in the metaphor -- we have much better event delivery systems in IT than the Admiral's floating bottles, and that is actually what is new about EDA today -- **the performant distributed ledger.**
EDA has been around a long time -- all the way back to Xerox PARC and Smalltalk MVC, but the use of EDA was mainly confined within a server where events could be published into shared memory, and quickly ingested by one or more listening processes.
To scale beyond a single server into a cluster of servers involves network communication, durable storage of events on each server, recovery from failures, etc. -- things that require some "enlightened engineering" given the lack of shared memory between hosts.
This was pretty much the "State of the Union" in 2000, when [Roy Fielding introduced REST](https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm) and wrote about EDA ([P2P](https://www.ics.uci.edu/~fielding/pubs/dissertation/net_arch_styles.htm#sec_3_6)) style architectures. Roy points out the issues around event publication that were constraining this "architectural style" 18 years ago -- but a lot has changed in 18 years.
Modern internet-based companies need to handle streams of events at high volumes, and cannot sacrifice quality. Netflix is one such company that exhibits Quality@Scale -- it's like a dialtone in my home -- it just works. Uber has realtime maps showing the location of vehicles as GPS events stream from each car. Uber is the largest transportation service on the planet -- without owning a single car in its fleet.
Another such company is LinkedIn, which open sourced Apache Kafka in 2011. Kafka is optimized for throughput and linear scalability with one mission in mind -- distribute events to a cluster of hosts. It is one of several solutions in the "streaming" space that have matured and are being implemented as the central nervous system for publishing events.
The central concept of these streaming data platforms is a distributed, common ledger -- like the duplicate notes in our thought experiment. Each host appends new, "immutable" events to its local copy of this common, distributed ledger to synchronize state at a cluster level. And that is a very direct solution to the problem of clusters of computers that don't share RAM -- communicate the changes so that all hosts can sing from the same hymnal. It's fundamental.
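As a sketch of what appending to that ledger looks like in practice -- assuming a local Kafka broker and the `kafka-python` client, with a hypothetical topic name and event shape -- a producer simply publishes immutable events to the log:

```python
# A minimal sketch of appending immutable events to the shared ledger.
# Assumes a broker on localhost and kafka-python (pip install kafka-python).
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Events are appended, never updated in place -- the log is the record.
producer.send("employee-events", {"type": "employee.terminated", "id": "E-1042"})
producer.flush()
```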
When you add in the other benefits of minimum communications (DRY networking), deterministic publication load, parallel execution of choreographed workflows, and separation of state replication concerns to achieve optimal clustering, you can start to see why **streaming applications set the new benchmark in transaction processing speed.** When you understand that recovery from failure is all about recovery of operational state, then having "backups" as a stream of realtime events that are milliseconds old enables **immediate recovery.** This is the core of **Quality@Scale**, and why some companies are cashing in while others are scratching their heads.
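Here is a hedged sketch of that recovery story under the same assumptions (local broker, `kafka-python`, hypothetical topic and event types): a fresh consumer replays the stream from the earliest offset to reconstruct its operational state, rather than restoring from a backup.

```python
# A sketch of "immediate recovery": rebuild operational state by replaying
# the event stream from the beginning of the retained log.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "employee-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # start from the oldest retained event
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=5000,       # stop iterating once caught up (for the sketch)
)

active_employees = set()
for record in consumer:
    event = record.value
    if event["type"] == "employee.hired":        # hypothetical event types
        active_employees.add(event["id"])
    elif event["type"] == "employee.terminated":
        active_employees.discard(event["id"])
# State is now reconstructed purely from the ledger -- no restore step.
```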
### So besides Kafka, where else is EDA used?
Lots of places you don't even notice -- like a Scrum daily standup. In fact, "Event Thinking" aligns with both Scrum and BPM.
At a Scrum standup, each team member broadcasts their updates since the last standup meeting. The updates are events; and when done, you have synchronized the state of the team, and enabled independent processing until the next standup event exchange.
And what better way to model business processes than to decompose a large workflow into business events that a) capture the state of the workflow continuously, b) allow visibility into the progress of work, and c) enable recovery from exceptions?
At a more technical level, ETL is EDA, where streams of atomic events are replaced with batches of events that undergo transformations into a new batch of events. In this case, most Enterprises are engaged in EDA without ever realizing it! But the benefits of ETL are clear.
Databases have used EDA "transaction logs" (aka a distributed ledger) to replicate data changes. Just about every database on the planet replicates servers this way, and many describe EDA as ["The Database turned inside-out"](https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/).
And let's not forget Data Science and systems like Apache Spark or MapR -- they're EDA too! They employ "Data Localization" to reduce the network load of broadcasting events by moving the process to the event (data). Considering the relative size of a typical binary executable compared to an application database, you can see why it is better to bring the processes to the mountain of data. Data processing is all about getting data and process local at runtime, and it's simply less work to transport process "containers" to the data than the other way around. Compared to data, processes only change when code changes, so the need to communicate changes to code is minuscule compared to ever-growing mountains and lakes of data. In this case, the mountain should not come to Mohammed. Data Localization is only one of the many performance advantages offered in modern IT architectures.
And finally, you may have heard of Bitcoin and other cryptocurrencies. Economically speaking, trust is a big part of any currency. We have to be able to trust that the wealth contained in our account will be there tomorrow to trade with. Who wants to buy $1000 of Bitcoin only to find out somebody hacked your savings and you now have $50?
Bitcoin employs Blockchain, a distributed ledger just like we've been discussing and just like you learned in Accounting 101, with one technical advantage -- the ledger is cryptographically secured, with each entry chained to the hash of the one before it, to make tampering evident. I mentioned earlier that events are immutable -- just like accounting ledger entries. In internal systems amongst trusted collaborators, the ledger is immutable by convention -- you trust the collaborating hosts under your control and governance.
But in B2B, no such control and trust exists, so you need a way to enforce that the ledger can't be tampered with "after the fact". Cryptography offers such guarantees. I see a future of incremental protection, where simple schemes are strengthened over time with nested layers to provide both immediate and long-term tamper protection.
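To illustrate the tamper-evidence idea -- Bitcoin's actual protocol is far more involved, and the payloads here are hypothetical -- this minimal Python sketch chains each ledger entry to the hash of the previous entry, so any "after the fact" edit breaks every hash downstream:

```python
# A minimal tamper-evident ledger using a hash chain, the core idea behind
# blockchains: each entry commits to the hash of the entry before it.
import hashlib
import json

def entry_hash(prev_hash, payload):
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

ledger = []
prev = "0" * 64  # genesis hash
for payload in ["credit 100", "debit 25", "credit 50"]:
    prev = entry_hash(prev, payload)
    ledger.append({"payload": payload, "hash": prev})

def verify(ledger):
    prev = "0" * 64
    for entry in ledger:
        if entry_hash(prev, entry["payload"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

print(verify(ledger))                # True
ledger[1]["payload"] = "debit 2500"  # tamper "after the fact"
print(verify(ledger))                # False -- the chain exposes the edit
```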
### Gartner listed EDA and Blockchain as 2 of the [Top 10 Strategic Technology Trends for 2018](https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technology-trends-for-2018/). That's 20% of the list! Gartner also forecasts that in 2020, EDA will be 1 of the top 3 CIO initiatives. Is this stuff really that great?
In a word -- absolutely!
I've been through many changes over the last 30 years. From DOS to Windows, from standalone PCs to Networks, from procedural programming to OOP to functional programming, and more recently, from VMs to Docker. This is by far the biggest architectural triumph of Modern Computing, and I don't expect it to be eclipsed in my lifetime.
I can't say or do enough to get this point across. It's big, because it's so foundational and pervasive in impact. The Sun is finally at the center of our IT Solar System, and the proper alignment melts the engineering challenges around grid computing, creates optimal processing, and simplifies programming.
### How does EDA simplify things?
Some architects solve complexity through more complexity -- until the giant layer cake crumbles. Take SOA, for example. To solve the problem of moving services in SOA, we added an ESB; but that added roundtrip latencies and pushed the coupling of endpoint services into ESB configuration -- it did nothing to get rid of it.
I believe the solution to complexity is -- Simplicity. I also believe complexity is the true enemy of IT, so I like reducing a problem down to the simplest terms.
EDA relies on the Hollywood Principle, so you don't "target" endpoints like archers trying to hit moving targets -- you cast events into streams where listeners have cast their nets. Think "hooks" over "arrows".
If and when implemented correctly, **EDA provides the minimum data required to communicate state change -- and nothing more.** It forwards this atomic change **one time.** With network latency being the Achilles' heel of distributed computing, being lean in communications is a fundamental requirement of efficiency and performance in grid computing.
EDA allows each host to act independently from its local ledger -- in isolation from other hosts. Remember that the "I" in ACID stands for Isolation. It greatly simplifies development to scope work to an isolated server environment that automagically works in clustered environments -- seamless scaling.
We call this approach the [Actor Model](https://en.wikipedia.org/wiki/Actor_model), and it allows a very simple paradigm for MicroService development, which is the second reason you hear a lot about EDA and MicroServices.
Actors are simple. Actors listen for events. Actors ingest and process events. Actors emit new events reflecting the results of processing. Actors go back to the table for seconds.
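Here is a minimal sketch of that actor loop in Python -- queues stand in for event streams, and the actor and event names are hypothetical: the actor ingests an event, processes it, emits a new event, and goes back for more.

```python
# A minimal actor sketch: listen for events, process, emit new events.
import queue

inbox = queue.Queue()   # events this actor listens for
outbox = queue.Queue()  # events this actor emits

def order_pricing_actor():
    # Ingest -> process -> emit, then go back to the table for seconds.
    while True:
        event = inbox.get()
        if event is None:       # shutdown signal, for the sketch only
            break
        total = sum(item["price"] for item in event["items"])
        outbox.put({"type": "order.priced", "order_id": event["order_id"], "total": total})

inbox.put({"type": "order.placed", "order_id": "O-1", "items": [{"price": 9.5}, {"price": 3.0}]})
inbox.put(None)
order_pricing_actor()
print(outbox.get())  # {'type': 'order.priced', 'order_id': 'O-1', 'total': 12.5}
```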
Simple and Atomic -- it's Computer Science 101. If you've heard of Function-as-a-Service (FaaS), you might begin to envision how EDA can take this to its logical conclusion, and create a programming model that is easy enough for "Citizen Development".
EDA simplifies integration. Any authorized system can join the party and listen in on events.
EDA enables Polyglot Persistence. All collaborators write to the common ledger, while various databases ingest the events and serve queries. With varying data structures like graphs, trees, and tables, developers can choose the most performant source of query. **The system of record is the ledger. The systems of query are the databases -- "the Database turned inside-out".**
These systems of query represent different "projections" or views of the same, consistent truth, and **Command Query Responsibility Segregation (CQRS)** is the natural byproduct. CQRS may be the most powerful design pattern employed in the scaling model, and it is achieved effortlessly. This is one of the natural consequences of aligning orbits around the Sun.
If Actors don't get the data from the event topic, they can query "off-topic" sources to fill in the blanks, and use whatever data structures best support optimal query. If query sources get overworked, it's easy to partition the data into two sources that ingest events in parallel, or simply replicate full content to divide workload. Scaling queries doesn't get any simpler!
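A small sketch of that CQRS split, with hypothetical events: one append-only ledger as the system of record, and two independent projections replaying the same events to serve different query shapes -- a table-like count and a graph-like view.

```python
# CQRS from a single ledger: the event log is the system of record, and two
# independent "projections" ingest the same events for different queries.
events = [
    {"type": "follow", "who": "ann", "whom": "bob"},
    {"type": "follow", "who": "bob", "whom": "cal"},
    {"type": "follow", "who": "ann", "whom": "cal"},
]

follower_counts = {}  # projection 1: table-like, "how many followers?"
following = {}        # projection 2: graph-like, "whom does X follow?"

for e in events:  # both views are just replays of the same truth
    follower_counts[e["whom"]] = follower_counts.get(e["whom"], 0) + 1
    following.setdefault(e["who"], set()).add(e["whom"])

print(follower_counts["cal"])    # 2
print(sorted(following["ann"]))  # ['bob', 'cal']
```

If one projection gets overworked, you stand up another that ingests the same events in parallel -- neither the ledger nor the other projections change.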
Lastly, the hallmark simplicity feature of EDA is Software Evolution. Roy speaks to Evolvability and Extensibility in his landmark [dissertation](https://www.ics.uci.edu/~fielding/pubs/dissertation/net_app_arch.htm#sec_2_3).
### Why is software evolution important?
Computer Associates (CA) surveyed Product Owners and found some dirty laundry. Most testified that quality of and confidence in a system go down with each release (temporal) and as utilization grows (physical). In short, system quality "devolves" over time and load.
But in EDA, new functionality can "hook" existing events just like new ETL transforms can hook existing batches -- without the threat of impacting current processes. That's the isolation aspect. Extending and evolving systems that maintain current quality expectations is critical to building complex systems.
This is the simple part -- operating an EDA Software Factory. It's the same approach we use to build airplanes and pharmaceuticals -- first build the factory, then build the widgets. **The hard part is building the EDA Software Factory that an enterprise can use to Digitally Transform Business Automation holistically.**
### What are the challenges to EDA?
Well, I'll use Gartner again, not because they are always right, but because this time they truly are enlightened with regard to EDA. The biggest problem they identify is the knowledge gap around "Event Thinking".
>**Event Thinking** is:
>1. Completely contrary to 50 years of IT focused on guaranteed integrity of action, and
>2. a better model of real life, where all agreements involve some degree of a leap of faith.
>
>The first, at the core, is why it is so hard for event thinking to take hold. The second is why digital business success demands it.
>
>~ [Yefim Natis, Gartner](https://blogs.gartner.com/yefim_natis/2017/09/12/event-thinking-a-challenge-and-an-imperative/)
The pieces of the EDA Software Factory are all around us, but currently it's like building a bike with bad instructions. You know you need to complete an ["Event Sourcing"](https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/) architecture, but there's a lot that goes into the Factory, and it's easy to miss key components, which undermines the stability of the system. Many enterprises have home-grown MQ-based solutions that do not exhibit Quality@Scale characteristics due to these architectural deficiencies. The issues usually center around defects in Event Sourcing designs.
So it takes talented architects to build a Factory from these components, and that's why only a few companies can "Cross the Chasm" to achieve Digital Transformation founded on EDA. LinkedIn, Netflix, Uber, Airbnb -- talented architects.
### How are Enterprises supposed to achieve EDA by 2020 given the current "State of the Union"?
> ["By 2020, participation in 80% of new business ecosystems will require support for event processing."](https://www.linkedin.com/pulse/shift-from-data-centric-event-centric-thinking-underscores-holt)
That's the question keeping most CIOs up at night.
But perhaps the real question is -- "Who should build your EDA Software Factory?" The warning against putting "new wine into old wineskins" is just as true as when it was first uttered -- it's a recipe for failure.
Remember that building the factory is the rocket science (Infrastructure, Data Science, Security, Networking, DevOps, etc.), but operating the factory is Application Architecture. These are two separate concerns, two separate skillsets, and two vastly different levels of expertise. Would you expect a bus driver to build the bus? Even if they said they could, are you going for a ride at 60 mph? Most EDA applications are mission critical -- it's why Quality@Scale issues surface in realms like customer response.
The people that build the Factories at Uber, Netflix and LinkedIn are extremely talented! **If you are going to succeed in Digital Transformation, then you must create a culture for IT talent to thrive, and the first "IT transformation" should be an organizational one.**
### How do you align the organization and culture to build an EDA Software Factory?
Well, you can try "growing" mechanics from bus drivers, but that takes time, has varying success, and in the end, the factory does not require that level of expertise to operate. One company that tried training C programmers in C++ found many could not make the mental leap, while those that could left the firm for more lucrative opportunities. This is an issue many CIOs have experienced and will continue to experience.
There are two resource pools that can effectively build software factories -- those that know and those that can be taught.
Currently, the demand for Data Science expertise is so great, that [Masters and PhDs in IT can get 2 months of bootcamp training for free!](https://www.switchup.org/bootcamps/the-data-incubator) The industry has identified these as "those that can be taught", because of their demonstrated ability to "ingest knowledge". This is a small pool, and demand exceeds supply.
Even smaller is the pool of those that are "thought leaders in EDA". They are the true "Oppenheimers of IT".
But beware. The only thing worse than a bad architect is a bad architect with good communication skills -- a Pied Piper that leads IT over a cliff. It takes EDA talent to identify EDA talent, and to sort the wheat from the chaff.
### Well that sounds ominous. What can CIOs do to reduce risks?
I once heard the history around the testing of the first US H-bomb over water. Half the nuclear physicists informed POTUS that a chain reaction would occur with the hydrogen in water and incinerate the entire planet. The other half insisted it would be OK because the bomb would blow its fuel supply out of range of the reaction and "snuff out". POTUS did not have a PhD in Nuclear Physics.
I imagine this as the dilemma of current CIOs. Like POTUS, they must decide which rocket scientist(s) to follow, without the ability to discern the deep science. Who do you trust with the fate of the planet (or enterprise automation)?
Ultimately, POTUS asked what Einstein thought, and the test went forward. Trusting proven talent proved to be a good choice. Like Dr. Phil says -- "The best indicator of future behavior/results is past behavior/results."
With such sound reasoning, I find it counter-intuitive for the same architects who designed systems that are not reliable to be given free rein a second time. We need new wineskins in the form of proven talent.
> CIOs need to invest in vetting EDA leadership to identify proven talent. They need to leverage the best "Event Thinking" resources they can obtain -- both internal and external -- to establish durable leadership of an EDA practice. They then need to empower that talent to grow the EDA resource pool, and build the EDA Software Factory.
CIOs also need to understand that such talent is in high demand, and will continue to be. This requires a balance between those who know EDA and those who can learn it, to reduce labor costs and generate small, talented EDA practices.
> **It's not about big teams, it's about big talent!**
### I hear a lot about the "API Economy" and REST. What applications should be built with EDA, and which with REST?
This is a question constructed on false premises, but I hear it a lot. It's evidence of the lack of EDA thought leadership (what Gartner calls "Event Thinking").
It's a technical discussion, so I'll just give an overview. Simply stated, it's not a matter of "which application", but rather a matter of "which operation" and CQRS. It takes a grounded understanding of the CAP theorem to fully appreciate.
Commands change state -- they mutate data -- they write. Without EDA, ACID DBs ensure consistency with locks because reads and writes are happening concurrently. It's the wild west. In EDA, we write to a common ledger, and the write to the ledger "triggers" consumption (reading) of the event.
We term this "Causal Consistency". Events "drive" the process, so reads of events happen after writes -- reads are "caused" by the writes. This brings serialized order to processing, and avoids all that locking required in "disorderly concurrency".
Using Req/Res for Commands has the call stack issues I have noted -- requiring every Command to return results backwards is not efficient. Passing results forward in assembly line (bucket brigade) fashion is much leaner. That's streaming. Towns in old westerns would have burnt down if each pail of water had to go through the hands of the Fire Chief!
REST prescribes req/res for Commands -- all your CRUD operations that change data -- and that's the problem.
Queries are a different story. Req/Res is a natural fit for this messaging pattern. You don't know something, you make a request, you get what you needed. Queries just fill in missing pieces of the data fabric.
Of course, as long as information is being sent over a network, you might as well update all host caches that may soon need the same data, so modifying the response to be a broadcast has efficiencies as well.
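Putting the two patterns side by side, here is a minimal sketch (all names hypothetical): Commands are appended to the ledger as events and the write itself "causes" the view update, while Queries are plain request/response reads against the resulting projection.

```python
# Commands write to an append-only ledger; reads are "caused" by the writes.
# Queries are simple request/response reads against a projection.
ledger = []        # system of record (append-only)
balance_view = {}  # system of query (projection)

def apply_to_view(event):
    acct = event["account"]
    balance_view[acct] = balance_view.get(acct, 0) + event["amount"]

def handle_command(event):
    ledger.append(event)   # the write *is* the publication
    apply_to_view(event)   # consumption follows the write -- causal order

def handle_query(account):
    return balance_view.get(account, 0)   # plain req/res read

handle_command({"type": "deposit", "account": "A-7", "amount": 100})
handle_command({"type": "withdraw", "account": "A-7", "amount": -25})
print(handle_query("A-7"))  # 75
```

Because every read of an event happens after its write, processing is serialized per account -- no locks needed for this "orderly" flow.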
### How did we get where we are today? How does EDA impact REST and the API Economy?
As I said, Queries are naturally a request/response pattern. Since 80% of typical applications are query-centric, it's not surprising we overloaded the pattern and started performing Commands/writes with request/response patterns.
> Commands "push" topical data to listeners. Queries "pull" off-topic data.
In cases where REST APIs already exist, they can be leveraged in EDA through a listener that "dispatches" calls to the API, wrapping Req/Res APIs within an EDA framework quite easily. That said, you still face the scaling issues of tightly coupled, centralized invocation using Req/Res.
In short, it's easy, but with none of the EDA benefits. It is often the only choice when consuming APIs that are not eventful -- such as third party APIs.
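A sketch of such a dispatching listener, assuming the Python `requests` library -- the endpoint URL, payload fields, and follow-up event are all hypothetical: the handler consumes an event and translates it into a synchronous API call.

```python
# Bridge a non-eventful REST API into EDA: a listener consumes events and
# "dispatches" them as request/response calls to the third-party API.
import requests

def dispatch_to_rest(event):
    resp = requests.post(
        "https://api.example.com/v1/notifications",  # hypothetical endpoint
        json={"employee_id": event["id"], "action": event["type"]},
        timeout=5,
    )
    resp.raise_for_status()
    # Optionally emit a follow-up event recording the outcome.
    return {"type": "notification.sent", "id": event["id"]}
```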
Outside of Req/Res, the remaining principles of REST and the "API Economy" are fabulous. (I'm especially fond of the "R" in REST -- "Representational".) The challenge now is to lift those principles into our new understanding of the nature of the Solar System, and find homes for API specifications like Swagger that move us closer to Citizen Development and Declarative Programming.
Events form Evolutionary APIs with interfaces defined by the events ingested and emitted, just like typed parameters and returns. It's all inputs and outputs at a conceptual level. This means Swagger and OpenAPI standards can be adapted for these Actor APIs, and achieve their purpose.
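As a closing sketch of such an "Actor API" -- these dataclasses are hypothetical stand-ins for a real schema definition -- the interface is simply the event types ingested and emitted, typed like parameters and returns.

```python
# An "Actor API" expressed as event contracts: the interface is the set of
# events ingested and emitted, analogous to typed parameters and returns.
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderPlaced:        # ingested event (the "parameter")
    order_id: str
    amount: float

@dataclass(frozen=True)
class OrderPriced:        # emitted event (the "return")
    order_id: str
    total: float

def pricing_actor(event: OrderPlaced) -> OrderPriced:
    """Interface: ingests OrderPlaced, emits OrderPriced."""
    return OrderPriced(order_id=event.order_id, total=event.amount * 1.08)
```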