---
title: Advanced Distributed System Design
tags: learning
---

###### tags: `learning` `architecture` `DDD` `eventsourcing` `CQRS`

==TODO==
* [ ] :bulb: _follow a Docker kata?_ => https://github.com/praqma-training/docker-katas

# Advanced Distributed System Design

(https://learn.particular.net/courses/take/adsd-online-free)

## Systems are not applications

* An **application** has a single executable and runs on a single machine
  * e.g. MS Word, MS PowerPoint, storing data on your local drive
  * Usually has a single source of information
  * Applications don't know about **"connectivity"**
* A **system** can be made up of multiple executable elements, possibly running on multiple machines
  * Most of our systems are distributed systems
  * e.g. even a simple web app is made of at least 3 distributed elements:
    * a browser (one machine)
    * a server process (another machine)
    * a database server (yet another machine)
  * Usually has multiple sources of information
  * All systems must deal with **"connectivity"**:
    * what should I do when the other side does not respond?
    * what should I do when it takes too long to respond?
* Each executable within a system is not an application
* Each executable must deal with "connectivity"

A common model (in both OO and functional languages) is to wrap the remote elements we communicate with into a sort of proxy, **to make a remote call look like a regular call** (hiding the network... => abstraction leak). That's where a lot of the problems of distributed systems start: ==**trying to abstract away the network**==!

## "The 8 fallacies of distributed computing"

### 1. **The network is reliable** :relaxed:

```csharp=
var svc = new MyService();
var result = svc.Process(data);
```

A lot of stuff happens here: marshaling, serializing, dispatching, deserializing AND a remote call behind the scenes! How do I test it?

> It works on my machine... mmmm...

:thinking_face: what about *timeouts*?
* did the call arrive at the server? :thinking_face:
* is the server taking too long to respond? :thinking_face:

* *solution*: we throw the error in the face of the user... :palm_tree:
* *solution*: let's retry 3 times... :scream:

#### Synchronous vs Asynchronous

* async => there's no user waiting, I can store & forward
* sync => a user is waiting for a result (request/response model)

The **request/response model** assumes that for each request there will be a response, or else the system is not available. :thinking_face: what about transactionality?

**Message Queues** (MQ) to the rescue! They are designed to solve exactly these retry & ack, store & forward problems. But we lose the request/response model, which is part of the "synchronous" approach.

*"The network cannot be reliable"* is a fact! So the question is: *can we build a reliable system on top of an unreliable network?* Yes :heavy_exclamation_mark:, but we have to build it differently from the traditional request/response pattern. We will see a lot less request/response across the network; instead we'll redesign a lot of the boundaries of our system towards a **QUEUE-based delivery model** => this brings reliability.

Martin Fowler's first rule of distributed objects is ==**"don't distribute your objects"**==.

### 2. **Latency isn't a problem** :face_with_one_eyebrow_raised:

What is **latency**? It's the **time to cross the network in one direction**, from the client to the server. It's not the round-trip time: the round trip also includes serialization/deserialization and computation time on the server, which really have nothing to do with network latency...

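(Looping back to fallacy #1 for a moment: a minimal sketch of mine contrasting the blocking request/response call with a queue-based, store & forward send. `IMyService`, `IMessageQueue` and `ProcessDataCommand` are hypothetical names, not code from the course.)

```csharp
// Hypothetical abstractions, used only to contrast the two models.
public record ProcessDataCommand(string Data);
public record Result(bool Succeeded);

// The RPC-style proxy: behind this innocent interface hide marshaling,
// serialization and a remote call that can time out or fail mid-flight.
public interface IMyService
{
    Result Process(string data);
}

// The queue-style alternative: the caller only hands the message to the
// local queuing infrastructure (store & forward) and moves on.
public interface IMessageQueue
{
    void Send(ProcessDataCommand command);
}

public class Caller
{
    private readonly IMyService _service;
    private readonly IMessageQueue _queue;

    public Caller(IMyService service, IMessageQueue queue)
        => (_service, _queue) = (service, queue);

    // Request/response: if the network hiccups, the timeout/retry policy
    // has to live right here, in business code.
    public Result CallSynchronously(string data) => _service.Process(data);

    // Store & forward: the queue persists the message and delivers it when
    // the remote side is reachable; no response is awaited here.
    public void SendViaQueue(string data) => _queue.Send(new ProcessDataCommand(data));
}
```
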
Latency may seem almost zero when running things on your own machine, and it's small on a LAN, but on a WAN and over the internet it can be large!! => Many times slower than in-memory access...

Hiding remote invocations is nice but has drawbacks! **DI hides the latency** (will this call be a local call? a remote call? the difference is many orders of magnitude!). Lots and lots of remote calls are bad!

Message queuing is not really a "remote call": the metaphor is more like a post office where you drop off a letter for someone, vs picking up the phone and calling a person. There's less coupling in the former. ORMs and lazy loading are another form of *"hidden remote call"*.

**Solutions?**
* Don't cross the network if you don't have to.
* If you have to cross the network, take all the data you might need with you.

The bandwidth fallacy plays a balancing effect against the latency fallacy.

### 3. Bandwidth isn't a problem :face_with_head_bandage:

Latency is easy to measure; bandwidth is much harder to measure.

Current standard:
* Gigabit Ethernet = 1000 Mbps => 128 MByte per second.
* TCP then eats about half of this due to its protocol overhead => 68 MB/s.
* Then comes serialization (JSON, XML, protobuf) => it eats another half! => 32 MB/s.
* Then you have more servers competing for the same bandwidth => more TCP packet collisions => more retransmissions => more **latency** (#2) => more timeouts! => the network is unreliable... (#1).

Bandwidth did not evolve as much as CPU speed and data storage did.

> Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway
> – Andrew Tanenbaum, 1981

**Bandwidth vs Latency**
* to preserve bandwidth => load the minimum amount of data possible (lazy load)
* to reduce latency => prefetch everything you can (eager load)

*Solution:* these conflicting forces of bandwidth and latency are an incentive to decompose our model into multiple smaller, more specific submodules (subsets of scenarios) where you can eagerly fetch.

*Solution:* move time-critical / performance-critical data to separate, dedicated networks. A server can have more than one network (multiple network cards or virtual networks) => so create high-priority networks with more bandwidth allocated, and low-priority networks with less.

Example: with a single service layer, deployed monolithically (a single API), there is no network bandwidth negotiation: first come, first served. => Split the API in two (no big rewrite needed) and map the network ports to different virtual networks with different priorities. This logical business decomposition (the one that led you to split the API) then also becomes a physical network decomposition and benefits from it.

The amount of available network bandwidth is limited (25-40 MByte per second): how do we allocate it to give priority to the most important business tasks? **ENTITY-CENTRIC => PRIORITY-CENTRIC**

:thinking_face: Doubt: is this really worth it? With decomposition we create more moving parts (splitting one API into many APIs, potentially deployed to different machines...) => more work => more teams.

:smirk: Yes: we're not making the problem disappear, we're moving it (that's what happens when you deal with physics, e.g. see the leverage example)!

:smirk: Yes: we could even host the two APIs on the same machine, without moving them to separate networks yet. This is a small step that gives us options!

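A minimal sketch of mine of the entity-centric => priority-centric split described above (`IOrderService`, `IOrderProcessing`, `IReporting` and their operations are invented names, not from the course):

```csharp
using System;

// Before: one entity-centric API; every caller competes for the same bandwidth.
public interface IOrderService
{
    void SubmitOrder(Guid orderId);                     // time-critical
    void CheckFraud(Guid orderId);                      // time-critical
    byte[] GenerateMonthlyReport(int year, int month);  // bulk, low priority
}

// After: a priority-centric split. Each interface can be exposed on its own
// endpoint/port and mapped to a virtual network with its own priority, or
// simply kept on the same host for now: the split itself creates the option.
public interface IOrderProcessing   // -> high-priority network
{
    void SubmitOrder(Guid orderId);
    void CheckFraud(Guid orderId);
}

public interface IReporting         // -> low-priority network
{
    byte[] GenerateMonthlyReport(int year, int month);
}
```
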
We do not make the problem disappear; we move it to a place where we have more leverage (solving more of the problem at a lower cost).

#### Logical vs physical architecture

Doing the logical decomposition *now* does not require us to *ALSO* deploy it onto separate networks (physical decomposition). The question we have to ask ourselves is: _are we talking about logical architecture or physical architecture?_

:::info
**Logical and physical do not have to map one to one**: you could still do a "monolithic deployment" of your logically separated modules onto the same machine.
:::

A better strategy is to **focus first on a logical decoupling of your system**, even before physically separating deployments. And when you do decide to separate deployments physically, you can still start by hosting those different deployment units on the same machine before going "distributed".

So, **a logical decomposition does not also require a physical decomposition** (that one can come later), but it gives us the **option** of a physical decomposition *later* (or of buying more network :moneybag:).

:bulb: Essentially, **subdividing things logically gives us options, while designing things monolithically does not.** We start by creating options, and then, when the complexity grows, we have the ability to use them!

When you move from an **RPC-based model** to a **message-queue model**, you basically decompose your "monolithic" API into separate, concrete **messages** (e.g. `GenerateReportCommand`, `CheckFraudCommand`, ...); each command can then be **routed** to a different **queue**, and each queue can be hosted on a separate **network**.

* With an **RPC-based model** the unit of routing is the whole API => you need a higher-level decomposition
* With a **queue-based model** the unit of routing is the single message.

### 4. The network is secure :the_horns:

No machine connected to a network is secure :-)

### 5. The topology won't change :mouse:

What do you do when you can no longer access something you expected to be there (but it isn't anymore, because of a change in topology)?
* => timeout exceptions!
* => a synchronous, blocking invocation can hang a thread in our thread pool (for 30 seconds by default!) => waste of resources!!

*Solutions?*
* don't hard-code IP addresses :D (if you can afford it)
* discovery mechanisms (apps that configure themselves at startup) - fragile in a complex system.

Do a **performance test EARLY** in the project's lifetime: at 30% of a project you should have enough functionality and architecture in place to run a performance test!

### 6. The administrator will know what to do :adult:

The more complex the distributed system is, and the more moving parts it has, the more likely it is that there is NOT a single person who knows how it all works together => document the system and invest in configuration management.

"Startups: inventing the car while you're trying to drive down the road"

#### High availability: why does upgrade == downtime for the business?

dev: _"The administrator will take care of that..."_ me: _"NO"_

The business says to us:
* _"no, no, don't touch anything, Black Friday is coming"_
* _"no, no, don't touch anything, the CEO is going to speak at a conference"_
* _"no, no, don't touch anything, we've just published a marketing campaign"_

and what they're thinking is _"...any time the devs or ops people touch the system, something is going to break..."_.
This fear pushes towards _bigger upgrades_ => and this is a positive feedback loop => the more code we write, the more changes go into a single deploy, the higher the risk that something goes wrong, and around the loop again... => **continuous deployment** and **automation** to the rescue.

Having a "highly available" architecture is not enough to get high availability of the whole system: you can have a highly available architecture and still have downtime when you deploy a new version of your software.

_Solution:_ we have to make sure that the code we deploy is **backward compatible**: being able to run version `n` on one server and version `n+1` on another server in parallel, side by side, live! If you can do that, you can reach high availability. **Backward compatibility: you need it in order to have a truly highly available system!** Make smaller changes to more loosely coupled pieces of code.

_Solution:_ enable the admin to **take parts of the system down for maintenance without adversely affecting the rest**! => **Queues** help here: one component keeps putting messages on the queue even while the other component is not consuming them.

_Solution:_ consider how to pinpoint problems in production. A little bit of logging can be helpful, but too much can be harmful! When you cannot find the issue by looking at a log, what do you do? Add more logs!! :scream: Focus on finding the cause of the problem rather than on being able to generate and process more and more logs.

### 7. Transport cost isn't a problem :money_mouth_face:

Serialization and deserialization take a lot of CPU time!! And it doesn't happen in code that we measure (it happens in libraries we usually don't profile).

The cloud is cool because at the end of the month its bill tells you "this is how stupid you were in designing your system". A bad architecture may work and yet not be affordable on the cost side => transport cost may hit you here!

### 8. The network is homogeneous :restroom:

Many moving parts today... but... 10 years ago Java and .NET ruled the world, and they could basically interoperate. Then: DECENTRALIZATION => Ruby, NoSQL (aka "none of your reporting systems will work on top of it"), REST, HTTP and JSON, and yadda yadda... => the integration problems between these different techs will give us work for the next 20 years!

**Semantic interoperability** and the story of the hospital patient-care system and NULL => the semantics of the two systems are not aligned => and it fails in production (it only pops up under real use).

#### Summary

* Fred Brooks' book -> "The Mythical Man-Month"
* In a distributed system we attempt to model the necessary (intrinsic) complexity of the domain and to make it explicit, to minimize the accidental complexity.
* In a monolithic system there may be no explicit model of the complexity, so it's more difficult to know when I'm crossing the boundary between necessary and accidental complexity => a lot more internal complexity => hard to understand => because there isn't a model, a picture.

## Plus 3 more fallacies of enterprise computing

### 9. The system is atomic/monolithic :monorail:

**Physical distribution vs logical distribution**

We tend to think of a big ball of mud as a monolithic application deployed on a single machine. But that's not the only kind of "monolith"!
A common "big ball of mud" is a "physically distributed" monolith (aka **a distributed monolith**): still logically tightly coupled, but deployed to multiple machines, with a "physically distributed deployment" => it's still a logical "monolith"!

**The logical elements of the big ball of mud** are the ones that hurt us, regardless of whether the system is **"distributed"** across multiple machines or **"centralized"** on a single machine. Here we're talking about a "logical" big ball of mud.

Maintenance is hard in "big balls of mud": a change here breaks something there. Unit tests pass, integration tests pass, but something breaks somewhere eventually.

#### Integration through the DB creates coupling

One of the forces that pushes a system towards becoming a monolith is **integration through the DATABASE** => depending on a common structure in a database creates an **implicit logical coupling** between different pieces of code. We go live and something, somewhere, depending on your database structure breaks!

#### If the system wasn't designed to scale out to multiple machines, doing so may actually hurt performance

More machines all hammering on the same database. With a stateless-service model, the database becomes the bottleneck (and you cannot scale out the database).

Solutions:
* internal loose coupling
* modularize

...yes, but how? Where are the boundaries of the system? The rest of the course will spend a lot of time on this, because it's difficult.

### 10. The system is finished :chart_with_downwards_trend: (from Project to Product)

What is the typical lifecycle of a "project"?

![](https://i.imgur.com/XoVmMLu.png)

At some point in time someone says "_yes! the system is finished!_" (and this is where the consultants say "_so long!_" and fly away). And the system goes into _"maintenance mode"_. NO! It's not actually in maintenance mode! There are
* still bugs
* still architecture issues
* still new features that the business wants
* still capability problems

It's **not really DONE**! => This is when the actual work STARTS!

Maintenance or original development:
* where is it more difficult and skill-demanding to *fix bugs*?
* where is it more difficult to *add new features*?
* where is it more difficult to *deploy new versions*?

A system in "maintenance mode" requires a great deal of skill compared to what is needed in the original development phase. So why are the "senior" people put in the initial phase while maintenance is left to juniors? And we cannot simply do the inverse either... => That's because this is a FALSE dichotomy! The whole idea of **designing a system so that "it will be maintainable by somebody else (maybe less skilled)" has never worked!!**

![The system is never finished](https://i.imgur.com/VWXaYw3.png)

The system is **never "finished"**. The software is meant to **always evolve**.

The *"construction + maintenance"* metaphor taken from building and manufacturing is BAD! Construction is very different from maintenance when building a house, whereas **building software and maintaining software are fundamentally the same activity** (they require the same kind of activities from the team and the stakeholders).

Projects are a poor model for software development. We need a better metaphor!

**Project ===> Product**

The product is a better metaphor. Creating a product is a long-term initiative (e.g. _"the first iPhone will be so perfect that we won't need to work on it after the first release, we'll just go into maintenance mode..."_ naaah!)
The product can live almost forever, while a single feature can be handled as a project. => **We need to design for long-term viability.**

Adopt a "product" mentality:
* we need a team that is able to work together in the long term, so that the knowledge is retained by the team (even if some members change)
* no quick and dirty

A bad experience with projects is needed to understand that we need a product mentality (because people and orgs change only because of pain).

#### A better development process?

The business does not actually give us requirements; they're usually much better at coming up with **WORKAROUNDS**. So we need a strong IT **Business Analyst** profile, with enough "power" and political connections to push back on the business and say _"I understand that this could work for you, but let's talk about your needs so that we can find a good solution for your problem..."_.

* => focus on *what is the actual problem?*
* understand what problem the business is trying to solve

**Rapid prototyping** to see whether the problem would actually be solved. We don't need "stenographers" who just write down what the business is asking and then bring that hard-coded decision to the team. We need a real profession here.

**It's important to understand the WHYs!**

![It's important to understand the WHYs!](https://i.imgur.com/rnzd4wM.png)

Once we have the WHYs and "building the right thing" is taken care of by the Business Analyst, another role comes in: the **IT Architect**. The IT Architect should help estimate the "right thing to build" and help tell which prototypes are feasible. The estimation process should also tell us roughly **HOW** we're going to build the thing.

A correct estimation process:

> **Given a well-formed team of a certain size S** _(a group of people that has worked together in the past, not a bunch of people assembled somehow...)_
> **that is not working on anything else** _(otherwise the estimate is meaningless...)_
> _then we can start to have a conversation about the estimate:_
> **I'm C% confident** _(not a promise)_ **the work will take between T1 and T2**

promise != possibility (`%` and ranges). This allows the business stakeholders to make decisions.

If the estimate is too broad: let me do a POC for, say, a week... at the end of the POC I'll give a new estimate with higher confidence. Is that enough confidence for the business? If not, let's iterate on the POC to reach a higher degree of confidence and a narrower time range.

![](https://i.imgur.com/QMktEBG.png)

At the end of this process of communication between IT and the business, you have a "requirement" and an "estimate", meaning that the business has an idea of the value they're going to get from this and an idea of the costs they will have to pay. A new role comes in now, the **Project Manager**, who will negotiate with the business which requirements in the team backlog will be re-prioritized and pushed back, so that our feature is scheduled for development when the business needs it.

![](https://i.imgur.com/MzwBJFX.png)

Doesn't sound agile? These are just tools that help the communication between business and IT, towards a joint and explicit decision-making process. What about Product Owners?
The Product Owner in SCRUM should decide "what needs to be done", but often the PO is just a glorified business analyst:
* without the necessary backbone to stand up to the business part of the organization, and
* without the "rigor" of a process that goes from a business idea to a schedule of what should be built

NoEstimates and other "no-methodologies": what matters is not so much what they say NO to, but what they say YES to, i.e. what they propose as an alternative to solve the same issue addressed by the tool/process they say NO to. NoEstimates needs a mature organization to work well.

> ==**_Personal considerations_**==: to me, the "IT Architect" role should be played by the whole team, not by an external person. The team should participate in the process of evaluating the prototypes and estimating their costs. I know this means the team has to stop working on the current items in their TODO list, but that is better than having a list of things to estimate thrown at the team.

### 11. Business logic can and should be centralized :snow_capped_mountain:

aka _"Reuse is good!"_

What happens when the business rules we codified in our software change? If the code of a piece of functionality is **centralized** in just one place, we make the change in that place and we're done. If the code is scattered across multiple places,
* we have to retype that change in all the places
* ...and we might forget one!

So, as an industry, we strove to create ==**reusable patterns**== (e.g. DRY); that was our solution:
* *Entities*, with all the core business logic
* *Application Services*, a function layer built on top of Entities
* *Use Cases*, which operate on top of those functions to implement a full flow of a service.

![](https://i.imgur.com/qyPZAgl.png)

Good or bad design?
* too many lines!
* too many dependencies

The problem is that ==**centralization and reuse beget coupling**==: the more generic and reusable a piece of code is, the more usages it will have, all depending on it. Arguably this generic (and more complex) reusable code will need to change, and when it changes it may break many things around it.

:::success
**Reusability => high amount of use => high amount of dependencies => tighter coupling**
:::

So we create more coupling, driven by this idea of _"what happens when the rules change?!"_ It's almost impossible to design code to be flexible enough for any type of rule change. How do we resolve this issue?
* some logic will be physically distributed, installed on multiple machines, e.g. frontend client, backend server, database;
* but we can still use a "centralized" development view of our software.

An architecture requires more than one view to describe it. One of these views is **the "development view" in our source control system** (like git): this view allows us to "tag" source code by feature implementation (e.g. each feature has its Trello card id and the commit message refers to that id, or as in GitHub, where issues refer to changes in code). If we can easily find which places need to change in order to change a feature, maybe we don't need to reuse so much (because reusability was the answer to the question: how can we be sure we changed the code in all the places where it is duplicated?).

:::info
The one place where you find which files need to be changed is the source control repository, so we get the benefit of reuse _(go to one place and see what needs to be changed)_ without creating coupling in our design.
:::

So essentially ==we create centralization in the development view of our architecture rather than in our class diagrams.==

When we have a lot of "single-purpose code" => we have some amount of duplication => the solution in the past was REUSE.
* What if we did not create that REUSABLE code?
* What if we did not create objects to be used in different scenarios?
* What if every piece of code were SINGLE PURPOSE?

Duplication is not the devil.

:::info
Using this "development view" model we don't have to reduce replication and **duplication**, so we don't have to create **reusable code**, so the **coupling** of the overall solution should decrease
:::

DRY vs WET (Write Everything Twice)

#### Source control architecture

requirement => GitHub issue => pull request => code

:::success
Duplication is not the devil: we now have tools that enable us to navigate code-level duplication using development-view centralization (e.g. GitHub)
:::

#### My personal summary

**Reusability** and **"code centralization"** (*DRY*) were the answers we as an industry gave 20 years ago to the problem of *duplication* and *replication* of code, and to the question: *"which places do I have to change in our software when a feature changes?"*. Now, with modern version control (like git) and tools (like GitHub), and with a bit of discipline when committing your code, you can let go of duplication and **centralize not the code but the usages of the code**. This way, a change in the behavior of our software doesn't require changing a generic class used by many other modules, but rather looking at the code changes made when that behavior was implemented in the first place and changing those places.

## Coupling in Distributed Systems

### What do we mean by coupling?

"Coupling" is too often subjective, a proxy for "I like this design / I hate this design"... More formally, ==**coupling is a measure of dependencies**==: if `X` depends on `Y` (`X --> Y`), there's coupling between them.

There are two very different kinds of coupling:
1. ==**Incoming Coupling**==: coupling from the perspective of `Y` => the things that use `Y` (its `usages`, _"who depends on me..."_): **Afferent Coupling** (`Ca`)
2. ==**Outgoing Coupling**==: coupling from the perspective of `X` => the things that `X` depends on (its `dependencies`, _"whom do I depend on..."_): **Efferent Coupling** (`Ce`)

Let's see an example:

| Which kind is worse? | Afferent (Incoming) | Efferent (Outgoing) |
|:--------------------:|:-------------------:|:-------------------:|
| `A` | 5 | 0 |
| `B` | 0 | 5 |
| `C` | 2 | 2 |
| `D` | 0 | 0 |

:::info
:information_source: If you want to answer with _"it depends"_ (classic consultant answer), you should give a fully formed sentence: _"...if <X\>, this option is better; if <Y\>, that option is preferable..."_ => you have to provide the conditions under which an option is better or preferable.
:::

* Nothing depends on `D`, and `D` doesn't depend on anything else => `D` is so *loosely coupled* that it's quite **useless** and could be removed... :satisfied:
* `A` seems to be self-contained (it doesn't depend on anything) and is used (so it's useful). Changes in `A` may break its clients.
  - It may be a case of **language primitives**, or frameworks, or serialization, or **logging**, or **[Data Transfer Objects](https://en.wikipedia.org/wiki/Data_transfer_object)**...
* `B` has nothing depending on it, and calls other classes. It's SAFE TO CHANGE, then.
  - It may be a **presentation layer** or a **user interface**, closer to the top of the app: `B` is called back by some UI event it's registered to, and in turn calls out to some "controller" or "service layer".
* `C` sounds like an integration layer: it takes something from somewhere and transforms it into some other form for others to consume.
  - It may be a **"service layer"**: it's called by other things (e.g. clients) and in turn calls other classes (e.g. domain entities).
  - **Business logic and service layers tend to have this kind of coupling.**

#### Incoming vs Outgoing Coupling

It's reasonable to have a lot of incoming coupling (e.g. logging or language primitives have a lot of incoming coupling), while it's almost never good to have a lot of outgoing coupling. When you have a lot of **outgoing coupling** you are probably doing too many things (`SRP` violation?).

:::warning
The question to ask ourselves is: **are we coupled to STABLE things or to VOLATILE things?** That's what matters!
:::

* Incoming coupling is typically on very generic, general-purpose things: e.g. logging is logging, regardless of the domain you operate in => more **STABLE** things.
* Outgoing coupling very often points at domain-specific things => more **VOLATILE** things.

:::info
Having high outgoing coupling is worse than having high incoming coupling, because outgoing coupling typically means depending on things that may change more frequently (e.g. depending on logging (stable) vs depending on domain logic (more volatile)).
:::

For example, depending on a logging framework is not nearly as bad as taking a dependency on, say, a calculation function in our domain, because the latter may change more frequently!

* If IncomingCoupling(`A`) ~= 50: OK...
* If OutgoingCoupling(`B`) ~= 50: BAD!
* If OutgoingCoupling(`C`) gets higher, say 4, 5, ...: that is BAD too, because it will destabilize its incoming side as well...
  - so, try to keep both outgoing and incoming coupling as low as possible for business logic classes

#### How do you count and measure the coupling?

For incoming coupling, what counts is the number of classes that depend on you (regardless of how many different calls the same class makes on you). For outgoing coupling, what counts is the number of different ways you depend on other classes (methods, properties, etc.), even on the same class.

![](https://i.imgur.com/8s16g1I.png)

What's the difference between `X` and `A` in terms of coupling? Both `X` and `A` depend on another class, but:
* IncomingCoupling(`B`) == IncomingCoupling(`Y`) == 1
  - if I make a change to `B`, how many classes could I break? Answer: 1
* OutgoingCoupling(`A`) = 5 >> OutgoingCoupling(`X`) = 1
  - is `A` fulfilling the single responsibility principle (SRP)?
  - this influences how self-contained a class is
  - there are more things that `A` calls out to

With ==**incoming coupling**== the concern is: _"if I make a change to this class, how many other classes might I break?"_ (for both `B` and `Y` it's 1). With ==**outgoing coupling**== the concern is: _"is this class fulfilling SRP?"_

:::info
The more outgoing calls a class makes (to different classes or to the same one, it makes no difference), the less self-contained it is.
:::

(`A` seems to be less self-contained than `X`)

#### Measuring coupling in codebases

How do we keep our codebase in good shape and well-designed in the long term? Have some metrics in place to spot places where the code is going bad... (e.g.
if it's a [DTO](https://en.wikipedia.org/wiki/Data_transfer_object) and it's calling out to other classes, maybe raise an alarm). For example, you could have a CI build rule that fails the build when some coupling metric exceeds a threshold value.

**A nice rule: FIX IT UP or TALK to another developer to SIGN OFF on it.**

When a code quality rule / code design convention is violated, let the dev choose whether to fix it up or to have another team member pair with them to "review" the code that "breaks" the coupling rule we measure, and decide together if this is a case where the rule can be ignored => **continuously communicating our expectations as a team about the design rules we want to enforce in our code**. This allows us to:
* => **keep our codebase well-designed in the long term**, and
* => help the team's **"design rules" grow and be shared among the team members**

Don't INFLICT these rules on the team; **HAVE A CONVERSATION** within the team about why some types of coupling make sense in e.g. the presentation layer or a DTO, and why others do not.

#### To recap...

**Minimize afferent and efferent coupling**
- but not mechanically or blindly! (_"I'll use a single `doIt` method instead of many smaller methods to reduce the coupling between `A` and `B`"_)
- use tools to help have conversations in your team
- the goal is not to hide the coupling, but to tackle it from a design perspective
- zero coupling is NOT really possible
- when you have a zero, that zero should mean something (e.g. the UI should not be called by the domain, a DTO should not depend on other domain stuff)

### Platform, temporal and spatial coupling + hidden coupling: loose coupling at the system level

When we move from coupling at the application level to coupling at the system level, we discover new types of coupling...

#### 1. ==Hidden Coupling== (aka _"beware of shared resources!"_)

Happens via some kind of **shared resource**.

![](https://i.imgur.com/u2YkGTK.png)

E.g. `A` writes something to the DB and `B` consumes that data => `A` and `B` are coupled! It's an *implicit coupling*: you don't see it, but it's there.

A **shared resource** can be anything shared, not just **a database** (which is THE classic shared resource), but also **a REST service**. REST or any other form of abstraction cannot save you in this context! Even if a REST service exposes a different representation of a domain concept hidden behind the REST API, the underlying business concept may be the same (e.g. I call it `shipping address` and you call it `delivery address`: it's still the same address, and we need to agree on how an address is structured). This means that, when the business rules change, those changes tend to ripple out and affect all the different representations! => it's still a form of coupling!

1. :bulb: Number 1 tip: **make the coupling explicit, make it visible!** Some amount of coupling is **inevitable**.
2. :bulb: Try to keep the shared data as narrow as possible! If two pieces of code depend on each other through some columns of a table, keep that shared data **as narrow as possible**.

#### 2. ==Platform Coupling==

Given a logical link between two different pieces of code: how do they talk to each other? Are they bound to a specific platform?
* If we choose a binary serialization / RMI protocol to let them talk => we get a higher degree of coupling to the runtime platform
* If we choose an HTTP-centric approach (aka REST / WSDL / ...)
  => we get a lower degree of platform coupling

![](https://i.imgur.com/CeBgigc.png)

Service orientation came out of this need to loosen platform coupling: share schema, not classes or types!

#### 3. ==Temporal Coupling==

Deals with coupling in the dimension of time. Two pieces of code, one calling the other over the network:
* **Synchronous call**: if `A` is not able to do anything meaningful until it gets a response from `B` => **high degree of temporal coupling** between `A` and `B`
* **Asynchronous call**: if `A` is able to keep doing meaningful things while waiting for `B`'s response => **low degree of temporal coupling** between `A` and `B`

![](https://i.imgur.com/bCcmMC9.png)

**Temporal coupling** and **platform coupling** are independent of each other. E.g. binary serialization of events into a queue (low temporal coupling, high platform coupling).

#### 4. ==Spatial Coupling== (~> topology fallacy)

How much is my code bound to a specific machine and its IP? Load balancing or DNS can help loosen this kind of coupling.

![](https://i.imgur.com/W86gSBK.png)

### Solutions to Platform Coupling

There are many options for **interoperability** (besides `HTTP` and `JSON`):
- representation on the wire (text-based): how you represent a payload on the wire
- transfer protocols (standards-based): how you move the payload from one place to another
- abstractions over the text representations: SOAP, REST, WSDL

Text-based representations on the wire: XML, JSON, ... Each textual representation has its own **schema** language (XSD for XML, JSON Schema, ...) and its IDL (_interface definition language_).

**Schema** is not there to provide interoperability and decoupling; it's there to **give developers more productivity**: an IDE can generate proxy classes from a schema so devs have ready-to-go classes to talk to a remote service (saving developer time). **Schema validation** also saves developer time and spares them from writing their own custom validation mechanism. Schema validation, too, is not about interoperability; it's just a productivity tool.

Standards-based transfer protocols: HTTP, SMTP, UDP, ...
* **HTTP** is widely used and too often abused
* **SMTP** and **UDP** are interoperable and provide one-to-many communication (which **HTTP** does not)!
* **SMTP** has many reliability properties built in and works great if you need one-to-many communication.

Always try to **use the right tool for the job**! Don't write your own one-to-many communication on top of HTTP! Just don't do it.

**SOAP**, **REST** and **WSDL** operate above the transport level (e.g. HTTP) and above the text representation of the payload (XML, JSON):
* SOAP vs REST: SOAP provides the definition of an envelope.
* SOAP provides a standard way of handling HTTP headers.
* REST: many people "SOAP-ize" REST, trying to give REST the properties of SOAP (e.g. with custom headers or MIME types).
* WSDL is all about: for a given XML request structure, which XML response structure you are going to get back.
* REST: Hypermedia As The Engine Of Application State (_HATEOAS_) is a great idea if you want to design the web, but it doesn't work if you want to solve business problems simpler than designing the internet.

[JNBridge](https://jnbridge.com/): interop between Java and .NET:
* running Java code in-process on the CLR
* running .NET code in-process on the JVM

=> the advantage is that everything happens in-process, not via remote calls (first rule of distributed objects... avoid distributed calls).

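To make "share schema, not classes or types" concrete, here is a minimal sketch of mine (not course code) using JSON as the text representation on the wire; `AddressChanged` is a made-up contract. What the two sides really share is the JSON structure (plus an agreed schema), not the .NET type:

```csharp
using System;
using System.Text.Json;

// A plain message contract: only its JSON shape travels over the wire.
public record AddressChanged(Guid CustomerId, string Street, string City);

public static class WireExample
{
    // Text on the wire: any platform that can parse JSON and agrees on the
    // schema can consume this; no CLR types are needed on the other side.
    public static string ToWire(AddressChanged message) =>
        JsonSerializer.Serialize(message);

    public static AddressChanged? FromWire(string json) =>
        JsonSerializer.Deserialize<AddressChanged>(json);
}
```
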
**When two things talk to each other, there's a ==logical coupling== between the two sides**, and that is inevitable (a schema does not remove that coupling, it just improves dev productivity): if I make a change on the invoked side, it can break things on the invoking side, even with a schema and schema validation in place.

When humans consume internet data, they have a tool that compensates for ambiguity: the brain. When machines talk to other machines over the internet, that ambiguity in the messages doesn't work (until an AI is invented).

**Statically-typed languages vs dynamically-typed languages**
* static typing works better in larger teams with a greater variation in skill levels
* dynamic typing works better in smaller teams with a higher level of skill and lower variation in skills

==TODO== resume at: `17-Coupling_solutions_platform`, min 9:00

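Going back to the _"How do you count and measure the coupling?"_ question and the CI-rule idea above: a rough, minimal sketch of mine (not from the course) of how a team might approximate efferent coupling with reflection and fail a build over an agreed threshold. `CouplingMetrics` and the `OrderService` type in the usage comment are hypothetical; real tools analyse the compiled IL and would also count dependencies inside method bodies.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class CouplingMetrics
{
    // Rough approximation of a type's efferent coupling (Ce): the number of
    // distinct non-System types referenced from its public surface (fields,
    // properties and method signatures).
    public static int ApproximateEfferentCoupling(Type type)
    {
        var referenced = type.GetProperties().Select(p => p.PropertyType)
            .Concat(type.GetFields().Select(f => f.FieldType))
            .Concat(type
                .GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
                .SelectMany(m => m.GetParameters()
                    .Select(p => p.ParameterType)
                    .Append(m.ReturnType)));

        return referenced
            .Where(t => t != type && t.Namespace != null && !t.Namespace.StartsWith("System"))
            .Distinct()
            .Count();
    }
}

// Usage idea, e.g. in a CI test (OrderService is a hypothetical class under test):
//   if (CouplingMetrics.ApproximateEfferentCoupling(typeof(OrderService)) > 5)
//       throw new Exception("Efferent coupling too high: fix it up or get a team sign-off.");
```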