learning
architecture
DDD
eventsourcing
CQRS
TODO
(https://learn.particular.net/courses/take/adsd-online-free)
An application has a single executable and runs on a single machine
A system can be made up of multiple executable elements, possibly running on multiple machines
A common model (in both OO and functional languages) is to wrap the remote elements we communicate with into a sort of proxy, to make a remote call look like a regular call (hiding the network… => abstraction leak).
That's part of where the problems of distributed systems start: trying to abstract away the network!
```csharp
var svc = new MyService();
var result = svc.Process(data);
```
a lot of stuff happens here: marshaling, serializing, dispatching, deserializing AND a remote call behind the scenes!
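To make the hidden work concrete, here's a minimal sketch of what a proxy like `MyService` might actually be doing under the hood (the class name, endpoint and types are hypothetical, purely for illustration):

```csharp
// Hypothetical sketch of a "transparent" remote proxy: the innocent-looking
// Process(data) call hides all of this.
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public record Data(string Payload);
public record Result(bool Ok);

public class MyServiceProxy
{
    private readonly HttpClient http = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30) // and when this expires? throw? retry 3 times?
    };

    public async Task<Result> ProcessAsync(Data data)
    {
        var payload = JsonSerializer.Serialize(data);          // marshaling / serializing
        var response = await http.PostAsync(                   // the actual remote call
            "https://some-server/process",                     // hypothetical endpoint
            new StringContent(payload));
        response.EnsureSuccessStatusCode();                    // network errors surface here
        var body = await response.Content.ReadAsStringAsync();
        return JsonSerializer.Deserialize<Result>(body)!;      // deserializing / dispatching
    }
}
```

Every one of those steps can fail in ways a local call never does, which is exactly where the questions below come from.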
How do I test it?
It works on my machine…
mmmm… what about timeouts?
solution: we throw the error in the face of the user…
solution: let's retry 3 times…
The request/response model assumes that for each request there will be a response, or else the system is not available.
what about transactionality?
Message Queues (MQ) to the rescue! Designed to solve these types of retry & ack, store & forward problems.
But we lose the request/response model, which is part of the "synchronous" approach.
"Network cannot be reliable" is a fact!
So, the question is: can we build a reliable system on top of an unreliable network?
Yes, but we have to build it differently than the traditional request/response pattern. We will see a lot less request/response across the network; instead we'll redesign a lot of the boundaries of our system in order to have a more QUEUE-based delivery model => this brings reliability.
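A sketch of that shift (the `IMessageQueue` abstraction here is hypothetical; MSMQ, RabbitMQ, NServiceBus etc. each provide their own equivalent):

```csharp
using System;

// Hypothetical queue abstraction: store & forward semantics, so a send
// succeeds even while the receiver is down, and delivery is retried for us.
public interface IMessageQueue
{
    void Send(string queueName, object message);
}

public record SubmitOrderCommand(Guid OrderId);

public class OrderSubmission
{
    private readonly IMessageQueue queue;
    public OrderSubmission(IMessageQueue queue) => this.queue = queue;

    public void Submit(Guid orderId)
    {
        // No blocking request/response: we drop the "letter" at the post office
        // and carry on. Reliability comes from the queue, not from the network.
        queue.Send("orders", new SubmitOrderCommand(orderId));
    }
}
```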
Martin Fowler's first rule of distributed objects is: "don't distribute your objects".
What is latency? it's the time to cross the network in one direction, from the client to the server.
It's not round-trip time: round-trip time also involves serialize / deserialize / computation time on the server, which really has nothing to do with network latency…
Latency may seem almost zero when running things on your own machine, and small on a LAN, but on a WAN & the internet it can be large!! => Many times slower than in-memory access…
Hiding the remote invocations is nice but has drawbacks!
DI hides the latency (will this call be a local call? a remote call? the difference is many orders of magnitude!)
Lots and lots of remote calls are bad!
Message Queuing is not really a "remote call"; the metaphor is more like a post office where you go and drop a letter for someone, vs. picking up a phone and calling a person. There's less coupling in the former.
ORM and lazy-loading is another form of "hidden remote call".
Solutions? Don't cross the network if you don't have to.
If you have to cross the network, take all the data you might need with you.
The bandwidth fallacy has a balancing effect against the latency fallacy.
Latency is easy to measure, bandwidth is much harder to measure.
The actual state of the art: bandwidth did not evolve as much as CPU speed and DATA storage did.
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway
–Andrew Tanenbaum, 1981
Bandwidth vs Latency
Solution:
These conflicting forces of bandwidth and latency are an incentive that pushes us to decompose our model into multiple smaller, more specific submodules (subsets of scenarios) where you can eagerly fetch.
Solution:
Move time-critical / performance-critical data to separate dedicated networks.
You can have more than one network for a server (more network cards or virtual networks) => so create high-priority networks with more network bandwidth allocated, and low-priority networks with less.
Example: when you have a single service layer, deployed monolithically (it has just a single API) => no network bandwidth negotiation: first come, first served.
=> split the APIs in two (no big rewrite needed for this) and map the network ports to different virtual networks with different priorities
So, this logical business decomposition (the one that brought you to split the APIs) also becomes a physical network decomposition and gets benefits from it.
The amount of available network bandwidth is limited (25-40 MBytes per second): how do we allocate it to give priority to the most important business tasks?
ENTITY CENTRIC => PRIORITY CENTRIC
Doubt: is this really convenient? With decomposition, we create more moving parts (splitting APIs into many APIs potentially deployed to different machines…) => more work => more teams.
Yes: we're not making the problem disappear, we're moving it (that's what happens when you deal with physics, e.g. see the leverage example)!
Yes: we could host the two APIs on the same host, without even moving them to separate networks. This is a small step that gives us options!
We do not make the problem disappear; we instead move it to a place where we have more leverage (solving more parts of the problem at a lower cost).
Also, doing the logical decomposition now does not require us to ALSO deploy it onto separate networks (physical decomposition).
The question we have to ask ourselves is: are we talking about logical architecture or physical architecture?
Logical and physical do not have to be one to one: you could still be doing a "monolithic deployment" of your logically-separated modules into the same machine.
A better strategy would be to focus first on a logical decoupling of your system, even in advance of physically separated deployments.
And when you decide to separate deployments physically, you could still start hosting those different deployment units on the same machine before going "distributed".
So, a logical decomposition does not require a physical decomposition too (that one can come later), but it gives us the option of a physical decomposition later (or of buying more network capacity).
So, essentially, subdividing things logically gives us options, while designing things monolithically does not.
We start by creating options, and then, when the complexity demands it, we have the ability to use them!
When you move from an "RPC-based model" to a messaging-queue model, you basically decompose your "monolithic" API into separate concrete messages (e.g. `GenerateReportCommand`, `CheckFraudCommand`, …) and each command can then be routed to a different queue, and each queue can be hosted on a separate network.
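A hedged sketch of that decomposition (the queue names and the routing helper are invented for illustration):

```csharp
using System;

// Each former API method becomes its own concrete message type...
public record GenerateReportCommand(int ReportId);
public record CheckFraudCommand(Guid TransactionId);

// ...and a routing map (illustrative) sends each type to its own queue,
// which could be bound to its own virtual network with its own priority.
public static class Routing
{
    public static string QueueFor(object command) => command switch
    {
        GenerateReportCommand => "reports.low-priority", // bulk work, cheap bandwidth
        CheckFraudCommand     => "fraud.high-priority",  // time-critical traffic
        _ => throw new ArgumentException($"Unmapped command type: {command.GetType()}")
    };
}
```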
No machine connected to a network is secure :-)
What do you do when you can no longer access something that you expected to be there (but it's not anymore, because of a change in topology)?
Solutions?
Do a performance test EARLY in the project lifetime: at 30% of a project you should have enough functionality and architecture in place to run a performance test!
The more complex the distributed system, the more moving parts you have, the more likely there's NOT a single person that knows how it all works together => document the system and invest in configuration management.
"Startups: Inventing the car while you're trying to drive down the road"
dev: "Administrator will take care of that…"
me: "NO!"
The business says to us
and what they're thinking is "…any time the devs or ops person touch the system, something is going to break…".
this fear pushes toward bigger upgrades => and this is a positive feedback loop => the more code we write, the more changes go into a single deploy, the higher the risk that something goes wrong, and around the loop again…
=> continuous deployment and automation to the rescue.
Having an "high available" architecture is not enough to have high availability of the whole system, you could have a high available architecture but have downtime when you deploy a new version of your software.
Solution:
=> We have to make sure that the code we deploy is backward compatible: being able to run version `n` on one server and version `n+1` on another server in parallel, side by side, live!
If you can do that, you can reach high availability!
Backward compatibility is hard, but you need it in order to have a truly highly available system!
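One way to get there, sketched under the assumption of message-based contracts (the type names are invented): make contract changes purely additive, so version `n` and `n+1` endpoints can process each other's messages.

```csharp
using System;

// Version n of the contract:
public class OrderPlaced
{
    public Guid OrderId { get; set; }
    public decimal Total { get; set; }
}

// Version n+1: only ADDITIVE, optional changes. Old endpoints ignore the new
// field; new endpoints fall back to the default when it's missing. Both
// versions can now run side by side during a rolling deploy.
public class OrderPlacedV2
{
    public Guid OrderId { get; set; }
    public decimal Total { get; set; }
    public string CurrencyCode { get; set; } = "USD"; // new, optional, defaulted
}
```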
Make smaller changes to more loosely coupled pieces of code.
Solution:
Enable the admin to take parts of the system down for maintenance without adversely affecting the rest!
=> Queues help here: one component keeps putting messages on the queue even while the other component is down and not consuming them.
Solution:
Consider how to pinpoint problems in production.
A little bit of logging can be helpful, but too much can be harmful!
When you cannot find the issue looking at a log, what do you do? Add more logs!!
Focus more on the problem of finding the cause of the problem instead of being able to generate and process more and more logs.
Serialization and deserialization take a lot of CPU time!!
It's not happening in code that we measure (it happens in libraries we don't usually profile).
Cloud is cool because at the end of the month its bill tells you: "this is how stupid you were in designing your system".
A bad architecture may still work yet not be affordable on the cost side => transport costs may affect you here!
Many moving parts today… but…
10 years ago: Java and DotNet ruled the world. And they basically could inter-operate.
Then: DECENTRALIZATION => ruby, nosql (aka "none of your reporting systems would work on top of it"), REST, HTTP and JSON, and yadda yadda… => the integration problems between these different techs will give us work for the next 20 years!
Semantic interoperability and the story of a hospital patient care system:
NULL => the semantics of the two systems are not aligned => and it fails in production (it pops out under real use).
Physical distribution vs logical distribution
We tend to think of a big ball of mud as a monolithic application deployed on a single machine. But that's not the only "monolith"!
A common "big ball of mud" is a "physically-distributed" monolith (aka "a distributed monolith"): still logically tightly coupled, but we choose to deploy it to multiple machines, with a "physically distributed deployment" => it's still a logical "monolith"!
The logical elements of the big ball of mud are what hurt us, regardless of the system being "distributed" across more machines or "centralized" on a single machine.
Here we're talking about a "logical" big ball of mud.
Maintenance is hard in "big balls of mud": a change here breaks something there.
Unit tests pass, integration tests pass, but something breaks somewhere eventually.
One of the forces that pushes a system towards becoming a monolith is integration through the DATABASE => depending on a common structure in a database creates an implicit logical coupling between different pieces of code.
We go live and something somewhere depending on your database structure breaks!
More machines all hammering on the same database.
Stateless-service model => the database becomes the bottleneck (and you cannot scale out the database).
solutions:
…yes, but how?
where are the boundaries of the system?
the rest of the course will spend much time on these, because it's difficult.
What is the typical lifecycle of a "project"?
At a point in time someone says "yes! the system is finished!" (and this is where the consultants say "so long!" and fly away). And the system goes into "maintenance mode".
NO! It's not actually in maintenance mode!
There are
It's not really DONE!
=> this is when the actual works STARTS!
Maintenance or original development:
A system in "maintenance-mode" requires a great deal of skills compared to the one needed in the original development phase.
Then, why the "senior" people is put in the initial phase and then the maintenance is left to junior?
And we cannot do the inverse either…
=> That's because this is a FALSE dichotomy!
And the whole idea of designing a system so that "it will be maintainable by somebody else (maybe less skilled)" never worked!!
The system is never "finished".
The software is meant to always evolve.
The "construction+maintenance" metaphor taken from the building and manufacturing is BAD!
Construction is very different from maintenance when building a house.
While building software and maintaining software are fundamentally the same activity! (they require the same kind of activities from the team and the stakeholders).
Projects are a poor model for software development. We need a better metaphor!
Project ===> Product
The product is a better metaphor.
Creating a product is a long-term initiative (e.g. "the first iPhone will be so perfect that we won't need to work on it after the first release, we'll just go into maintenance mode"… naaah!)
The product could live almost forever, while a single feature could be handled as a project.
=> we need to design for long-term viability
Adopt a "product" mentality.
A bad experience with projects is needed to understand that we need a product mentality (because people and orgs change only out of pain).
The business does not actually give us requirements; they're usually much better at coming up with WORKAROUNDS.
So we need a strong IT Business Analyst profile, with enough "power" and political connections to push back on the business and say "I understand that this could work for you, but let's talk about your needs so that we'll find a good solution for your problem…".
Rapid prototyping to see if the problem would be solved.
We don't need "stenographers" who just write down what the business is asking to then bring that hard-coded decision to the team. We need a profession here.
It's important to understand the WHYs!
Once we have the WHYs and "build the right thing" taken care of by the Business Analyst, there's another role: the IT Architect.
The IT Architect should help in estimating the "right thing to build", and in telling which prototype is doable.
Also, the estimate process should tell roughly HOW we're going to build the thing.
A correct estimate process:
Given a well-formed team of a certain size S (this is a group of people that has worked together in the past, not a bunch of people assembled somehow…)
that is not working on anything else (otherwise the estimate is meaningless…)
then we can start to have a conversation about the estimate:
I'm C% confident (not a promise) work will take between T1 and T2
promise != possibility (percentages and ranges)
This allows the business stakeholders to make decisions.
If the estimate is too broad, let me do a POC for say a week…
at the end of the POC I'll give a new estimate with a higher confidence.
Is this enough confidence for the business? If it's not, let's iterate over the POC to come with a higher degree of confidence and a narrower range of time.
At the end of this process of communication between IT and business, you have a "requirement" and an "estimate", meaning that the business has an idea of the value that they're going to take from this and an idea of the costs they will have to pay.
A new role comes in now: the Project Manager, who will negotiate with the business which requirements in the team backlog will be re-prioritized and pushed later, so that our feature is scheduled for development when the business needs it.
Doesn't sound agile?
These are just tools that help the communication between business and IT, for a joint and explicit decision-making process.
What about the Product Owners?
The Product Owner in SCRUM should decide "what needs to be done", but often the PO is just a glorified business analyst:
NoEstimates and other "no-methodologies": it's not that important what they say NO to, but what they say YES to, what they propose as an alternative to solve the same issue addressed by the tool/process they say NO to.
NoEstimates needs a mature organization to work well.
Personal considerations: to me, the "IT Architect" role should be played by the whole team, not by an external person. The team should participate in the process of evaluating the prototypes and estimating their costs. I know this means the team has to stop working on the actual items in their TODO list, but that is better than having a list of things to estimate thrown at the team.
aka "Reuse is good!".
What happens when the business rules we codified in our software change?
If the code of a functionality is centralized just in one place, we make the change to that place and we're done.
If the code is scattered in multiple places, we have to find and change every one of them (and hope we didn't miss one).
So, as an industry, we strove to create reusable patterns (e.g. DRY); that was our solution.
good or bad design?
The problem is that centralization and reuse beget coupling: the more generic and reusable a piece of code is, the more usages it will have, all depending upon it.
Arguably this generic (and more complex) reusable code will need to change, and when it changes it may break many things around it.
Reusability => high amount of use => high amount of dependencies => more tight coupling
So we create more coupling under this idea of "what happens when the rules change?!"
It's almost impossible to design code to be so flexible for any type of rule changes.
How do we resolve this issue?
Architecture requires more than one view to describe it.
One of these views is the "development view" in our source control system (like git): this view allows us to "tag" source code by feature implementation (e.g. each feature has its Trello card id and the commit message may refer to that id, or like in GitHub where you have issues that refer to changes in code).
If we can easily find which places need to change in order to change a feature, maybe we do not need to reuse so much (because reusability was the response to the question: how can we be sure we have changed all the code in all the multiple places where it is duplicated?).
The one place to find which files need to be changed is the source control repository, so we get the benefit of reuse (go to one place and see what needs to be changed) without creating coupling in our design.
So essentially we create centralization in the development view of our architecture rather than in our class diagrams.
When we have a lot of "single-purpose code" => we have some amount of duplication => the solution in the past was REUSE.
Duplication is not the devil
Using this model of the "development view", we don't have to reduce replication and duplication, so we shouldn't have to create reusable code, so the coupling of the overall solution should decrease.
DRY vs WET (Write Everything Twice)
requirement => GitHub issue => pull request ==> code
Duplication is not the devil, we now have tools to enable us to navigate code-level duplication using development view centralization (e.g. GitHub)
Reusability and "code centralization" (DRY) were the responses we as an industry gave 20 years ago to the problem of duplication and replication of code, and to answer to the question: "what are the places I have to change in our sw when we have to change a feature?".
Now, with modern source control versions (like git) and tools (like GitHub), and with a bit of disciplined process when committing your code, you can let go of duplication and do centralize not the code but the usages of the code. This way, a change in the behavior of our sw needs not a change in the generic class used by many other modules but a look at the code changes done when implementing that behavior in the first place and change those places.
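For example, with a disciplined commit convention, that "development view" lookup is a single command (the issue number is of course illustrative):

```
# List every commit, and the files it touched, for issue #123:
git log --grep="#123" --name-only --oneline
```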
Coupling too often is subjective and is a proxy for "I like this design / I hate this design"…
More formally, coupling is a measure of dependencies: if `X` depends on `Y` (`X --> Y`), there's coupling between them.
There are two very different kinds of coupling:
`Y` => things that use `Y` (its usages, "who depends on me…"): Afferent Coupling (`Ca`)
`X` => things that `X` depends on (its dependencies, "whom I depend on…"): Efferent Coupling (`Ce`)
Let's see an example:
| Which kind is worse? | Afferent (Incoming) | Efferent (Outgoing) |
|---|---|---|
| A | 5 | 0 |
| B | 0 | 5 |
| C | 2 | 2 |
| D | 0 | 0 |
If you want to answer "it depends" (the classic consultant answer), you should give a fully-formed sentence: "…if <X>, this option is better; if <Y>, that option is preferable; …" => you have to provide the conditions under which an option is better or preferable.
Nothing depends on `D`, and `D` doesn't depend on anything else => `D` is so loosely coupled that it's quite useless and could be removed…
`A` seems to be self-contained (doesn't depend upon anything) and is used (so it's useful). Changes in `A` may break its clients.
`B` has no dependencies on it, and calls other classes. It's SAFE TO CHANGE, then. Perhaps `B` is called back by some kind of UI event to which it's registered, and in turn calls out to some "controller" or "service layer".
`C` sounds like an integration layer: it takes something from somewhere and transforms it into some other form for others to consume.
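A tiny sketch wiring `A`, `B` and `D` exactly to the table's numbers (the bodies are invented just to make the counts concrete; `C` is omitted for brevity):

```csharp
// D: afferent 0, efferent 0 — nothing uses it, it uses nothing => dead weight.
public class D { }

// A: afferent 5, efferent 0 — self-contained and widely used; a change here
// can break all five dependents below.
public class A { public int Compute(int x) => x * 2; }

public class Client1 { public int Run(A a) => a.Compute(1); }
public class Client2 { public int Run(A a) => a.Compute(2); }
public class Client3 { public int Run(A a) => a.Compute(3); }
public class Client4 { public int Run(A a) => a.Compute(4); }

// B: afferent 0, efferent 5 — nothing depends on it (safe to change!),
// but it calls out to five classes: is it doing too much (SRP)?
public class B
{
    public void Handle(A a, Client1 c1, Client2 c2, Client3 c3, Client4 c4)
    {
        c1.Run(a); c2.Run(a); c3.Run(a); c4.Run(a);
    }
}
```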
It's reasonable to have a lot of incoming coupling (e.g. logging or language primitives have many incoming couplings), while it's almost always not good to have a lot of outgoing coupling.
When you have a lot of outgoing coupling you are probably doing too many things (`SRP` violation?).
The question to ask ourselves is:
Are we coupled to STABLE things or to VOLATILE things?
That matters!
Having higher outgoing coupling is worse than having higher incoming coupling, because outgoing coupling typically means depending on things that may change more frequently (e.g. depending on logging (stable) vs depending on domain logic (more volatile)).
For example, depending on a logging framework is not nearly as bad as taking a dependency on, e.g., a calculation function in our domain, because that may change more frequently!
IncomingCoupling(`A`) ~= 50 => OK…
OutgoingCoupling(`B`) ~= 50 => BAD!
If OutgoingCoupling(`C`) gets higher, say 4, 5, …, that is BAD too, because it will destabilize its incoming couplings too…
For incoming coupling, what counts is the number of classes that depend upon you (regardless of how many different calls the same class makes on you).
For outgoing coupling, what counts is the number of different ways you depend on other classes (methods, properties, etc.), even on the same class.
What's the difference between `X` and `A` in terms of coupling?
Both `X` and `A` depend on another class (`X` on `Y`, `A` on `B`), but:
IncomingCoupling(`B`) == IncomingCoupling(`Y`) == 1
If I change `B`, how many classes could I break? Answer: 1
OutgoingCoupling(`A`) = 5 >> OutgoingCoupling(`X`) = 1
Is `A` fulfilling the single responsibility principle (SRP)? `A` calls out to `B` in 5 different ways.
With incoming coupling, the concern is: "if I'm making a change to this class, how many other classes may I break?" (for both `B` and `Y` the answer is 1).
With outgoing coupling, the concern is: "is this class fulfilling SRP?"
The more outgoing calls a class makes (to different classes or to the same one, it makes no difference), the less self-contained it is.
(`A` seems to be less self-contained than `X`)
How do we keep our codebase in good shape and well-designed in the long term?
Have some metrics in place to be able to spot places where the code is going bad… (e.g. if it's a DTO and it's calling out to other classes, maybe raise an alarm)
For example you could have a CI build rule to fail the build when some coupling metric exceeds a threshold value.
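A hedged sketch of such a check (a deliberately naive efferent-coupling count via reflection; real tools such as NDepend compute this properly from the IL):

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class CouplingCheck
{
    // Naive efferent coupling: distinct non-System types referenced by a
    // type's fields and declared method signatures. Calls made inside method
    // bodies are not counted here; a real metric would scan the IL too.
    public static int EfferentCoupling(Type type)
    {
        var fromFields = type
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Select(f => f.FieldType);

        var fromMethods = type
            .GetMethods(BindingFlags.Instance | BindingFlags.Public | BindingFlags.DeclaredOnly)
            .SelectMany(m => m.GetParameters().Select(p => p.ParameterType).Append(m.ReturnType));

        return fromFields.Concat(fromMethods)
            .Where(t => t.Namespace != null && !t.Namespace.StartsWith("System"))
            .Distinct()
            .Count();
    }

    // A CI step could call this and fail the build over the threshold.
    public static void AssertBelow(Type type, int threshold)
    {
        var ce = EfferentCoupling(type);
        if (ce > threshold)
            throw new InvalidOperationException(
                $"{type.Name}: efferent coupling {ce} exceeds threshold {threshold}");
    }
}
```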
A nice rule: FIX IT UP or TALK to another developer to SIGN-OFF on that
When a code quality rule / code design convention is violated, let the dev choose whether to fix it up or have another team member "review" the code that "breaks" the coupling rule that we measure, and decide together if this is a case where the rule can be ignored => continuously communicating our expectations as a team on the design rules that we want to enforce in our code.
This allows us to not INFLICT these rules on the team, but HAVE A CONVERSATION within the team on why some types of coupling make sense in e.g. the presentation layer (e.g. DTOs), and why others do not.
Minimize afferent and efferent coupling
(e.g. `B` could expose a single `doIt` method instead of many smaller methods, to reduce the coupling between `A` and `B`)
When we move from coupling at the application level to coupling at the system level, we discover new types of coupling…
Happens via some kind of shared resource.
E.g. `A` writes something to the DB and `B` consumes that data => `A` and `B` are coupled!
It's an implicit coupling: you don't see it but it's there.
A shared resource could be anything shared, not just a database (which is THE classic shared resource), but also a REST service.
REST or any other form of abstraction cannot save you in this context!
Even if a REST service exposes a different representation of a domain concept hidden behind the REST API, the underlying business concept may be the same (e.g. I call it `shipping address` and you call it `delivery address`, but it's still the same address, and we need to agree on how an address is structured). This means that, when the business rules change, those changes tend to ripple out and affect all the different representations! => it's still a form of coupling!
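A sketch of that hidden coupling (the type names are invented): two services each expose their own representation, yet a single business-rule change hits both.

```csharp
// Hypothetical contracts: two representations of the SAME business concept.
public record ShippingAddress(string Street, string City, string Zip); // service 1
public record DeliveryAddress(string Street, string City, string Zip); // service 2

// If the business now requires a country code on every address, BOTH records
// (and every consumer of each REST API) must change: the shared business
// concept couples them, no matter how well each API hides its internals.
```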
But some amount of coupling is inevitable.
If two pieces of code depend on each other through some columns on a table, try to keep that shared data as narrow as possible.
Given a logical link between two different pieces of code: how do they talk together? Are they bound to a specific platform?
Service-orientation came out of this need to loosen platform coupling: share schema, not classes or types!
Deals with the coupling on the dimension of time.
Two pieces of code, one calling the other over the network.
If `A` is not able to do anything meaningful until it gets a response from `B` => high degree of temporal coupling between `A` and `B`.
If `A` is able to continue doing meaningful things while waiting for `B`'s response => low degree of temporal coupling between `A` and `B`.
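A sketch of the two ends of the spectrum (the service and queue interfaces are hypothetical):

```csharp
// Hypothetical abstractions for the sketch:
public interface IPricingService { decimal GetPrice(int orderId); }
public interface IMessageQueue { void Send(string queueName, object message); }
public record PriceOrderCommand(int OrderId);

public class OrderHandler
{
    // HIGH temporal coupling: blocked until B answers; if B is slow or down,
    // A can do nothing meaningful in the meantime.
    public decimal PriceBlocking(IPricingService b, int orderId)
        => b.GetPrice(orderId);

    // LOW temporal coupling: send a message and keep working; B's answer
    // arrives later as another message, handled elsewhere.
    public void PriceViaQueue(IMessageQueue queue, int orderId)
    {
        queue.Send("pricing", new PriceOrderCommand(orderId));
        // ...continue doing meaningful work here, no waiting on B.
    }
}
```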
Temporal coupling and Platform coupling are independent from each other.
E.g. binary serialization of events into a queue (temporal coupling low, platform coupling high).
How much is my code bound to a specific machine and its IP?
Load balancing or DNS could help in loosening this kind of coupling.
Many options for interoperability (besides `HTTP` and `JSON`).
Text-based representations on the wire: XML, JSON, …
Each textual representation has its own schema (XSD as the XML schema, JSON schemas, …) and its IDL (interface definition language).
Schemas are not there to benefit interoperability and decoupling; they're there to give more productivity to developers: an IDE can generate proxy classes from a schema, giving devs ready-to-go classes to talk to a remote service (saving developer time).
Schema validation also saves developer time, letting devs avoid writing their own custom validation mechanism. Schema validation too is not related to interoperability; it's just a tool for productivity.
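For instance, .NET's built-in XML schema validation spares you a hand-written validator (the file paths here are illustrative):

```csharp
using System.Xml;
using System.Xml.Schema;

public static class SchemaValidation
{
    public static void Validate(string xmlPath, string xsdPath)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, xsdPath); // null = take the schema's target namespace
        settings.ValidationEventHandler += (sender, e) => throw e.Exception;

        using var reader = XmlReader.Create(xmlPath, settings);
        while (reader.Read()) { } // reading drives validation; no custom checks to write
    }
}
```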
Standard-based transfer protocols: HTTP, SMTP, UDP, …
Always try to use the right tool for the job! Don't write your own one-to-many communication on top of HTTP! Just don't do it.
SOAP, REST and WSDL operate above the transport level (e.g. HTTP) and above the text representation of the payload (XML, JSON):
JNBridge: inter-op between Java and .NET:
=> the advantage is that everything happens in-process, not via remote calls (first rule of distributed objects… avoid distributed calls).
When two things talk to each other, there's a logical coupling between the two sides, and that is inevitable (a schema does not remove that coupling, it just improves dev productivity): if I make a change on the invoked side, that could break things on the invoking side, even if I have a schema and schema validation in place.
When humans consume Internet data, they have a tool that compensates for ambiguity: our brain. When machines talk to other machines over the Internet, that ambiguity in the messages does not work (until an AI is invented).
Statically-typed languages vs dynamically-typed languages
17-Coupling_solutions_platform min: 9.00 min