learning
architecture
DDD
eventsourcing
CQRS
TODO
(https://learn.particular.net/courses/take/adsd-online-free)
An application has a single executable and runs on a single machine
A system can be made up of multiple executable elements, possibly running on multiple machines
A common model (in both OO and functional languages) is to wrap the remote elements we communicate with into a sort of proxy, to make a remote call look like a regular call (hiding the network… => abstraction leak).
That's part of where the problems of distributed systems start: trying to abstract away the network!
```csharp
var svc = new MyService();
var result = svc.Process(data);
```
a lot of stuff happens here: marshaling, serializing, dispatching, deserializing AND a remote call behind the scenes!
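To make the hidden work concrete, here's a minimal sketch of what a proxy like `MyService` might actually be doing under the hood (the class name, endpoint and types are hypothetical, purely for illustration):

```csharp
// Hypothetical sketch of a "transparent" remote proxy: the innocent-looking
// Process(data) call hides all of this.
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public record Data(string Payload);
public record Result(bool Ok);

public class MyServiceProxy
{
    private readonly HttpClient http = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30) // and when this expires? throw? retry 3 times?
    };

    public async Task<Result> ProcessAsync(Data data)
    {
        var payload = JsonSerializer.Serialize(data);          // marshaling / serializing
        var response = await http.PostAsync(                   // the actual remote call
            "https://some-server/process",                     // hypothetical endpoint
            new StringContent(payload));
        response.EnsureSuccessStatusCode();                    // network errors surface here
        var body = await response.Content.ReadAsStringAsync();
        return JsonSerializer.Deserialize<Result>(body)!;      // deserializing / dispatching
    }
}
```

Every one of those steps can fail in ways a local call never does, which is exactly where the questions below come from.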
How do I test it?
It works on my machine…
mmmm… what about timeouts?
solution: we throw the error in the face of the user…
solution: let's retry 3 times…
The request/response model assumes that for each request there will be a response, or else the system is not available.
what about transactionality?
Message Queues (MQ) to the rescue! Designed to solve these types of retry & ack, store & forward problems.
But we lose the request/response model, which is part of the "synchronous" approach.
"Network cannot be reliable" is a fact!
So, the question is: can we build a reliable system on top of an unreliable network?
Yes, but we have to build it differently than the traditional request/response pattern. We will see a lot less request/response across the network; instead we'll redesign a lot of the boundaries of our system in order to have a more QUEUE-based delivery model => this brings reliability.
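A sketch of that shift (the `IMessageQueue` abstraction here is hypothetical; MSMQ, RabbitMQ, NServiceBus etc. each provide their own equivalent):

```csharp
using System;

// Hypothetical queue abstraction: store & forward semantics, so a send
// succeeds even while the receiver is down, and delivery is retried for us.
public interface IMessageQueue
{
    void Send(string queueName, object message);
}

public record SubmitOrderCommand(Guid OrderId);

public class OrderSubmission
{
    private readonly IMessageQueue queue;
    public OrderSubmission(IMessageQueue queue) => this.queue = queue;

    public void Submit(Guid orderId)
    {
        // No blocking request/response: we drop the "letter" at the post office
        // and carry on. Reliability comes from the queue, not from the network.
        queue.Send("orders", new SubmitOrderCommand(orderId));
    }
}
```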
Martin Fowler's first rule of distributed objects is: "don't distribute your objects".
What is latency? it's the time to cross the network in one direction, from the client to the server.
It's not round-trip time: round-trip time also involves serialize / deserialize / computation time on the server, which really has nothing to do with network latency…
Latency may seem almost zero when running things on your own machine, and small on a LAN, but on a WAN & the internet it can be large!! => Many times slower than in-memory access…
Hiding the remote invocations is nice but has drawbacks!
DI hides the latency (will this call be a local call? a remote call? the difference is many orders of magnitude!)
Lots and lots of remote calls are bad!
Message Queuing is not really a "remote call"; the metaphor is more like a post office where you go and drop a letter for someone, vs. picking up a phone and calling a person. There's less coupling in the former.
ORM and lazy-loading is another form of "hidden remote call".
Solutions? Don't cross the network if you don't have to.
If you have to cross the network, take all the data you might need with you.
The bandwidth fallacy has a balancing effect against the latency fallacy.
Latency is easy to measure, bandwidth is much harder to measure.
The actual state of the art: bandwidth did not evolve as much as CPU speed and DATA storage did.
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway
–Andrew Tanenbaum, 1981
Bandwidth vs Latency
Solution:
These conflicting forces of bandwidth and latency are an incentive that pushes us to decompose our model into multiple smaller, more specific submodules (subsets of scenarios) where you can eagerly fetch.
Solution:
Move time-critical / performance-critical data to separate dedicated networks.
You can have more than one network for a server (more network cards or virtual networks) => so create high-priority networks with more network bandwidth allocated, and low-priority networks with less.
Example: when you have a single service layer, deployed monolithically (it has just a single API) => no network bandwidth negotiation: first come, first served.
=> split the APIs in two (no big rewrite needed for this) and map the network ports to different virtual networks with different priorities
So, this logical business decomposition (the one that brought you to split the APIs) also becomes a physical network decomposition and gets benefits from it.
The amount of available network bandwidth is limited (25-40 MBytes per second): how do we allocate it to give priority to the most important business tasks?
ENTITY CENTRIC => PRIORITY CENTRIC
Doubt: is this really convenient? With decomposition, we create more moving parts (splitting APIs into many APIs potentially deployed to different machines…) => more work => more teams.
Yes: we're not making the problem disappear, we're moving it (that's what happens when you deal with physics, e.g. see the leverage example)!
Yes: we could host the two APIs on the same host, without even moving them to separate networks. This is a small step that gives us options!
We do not make the problem disappear; we instead move it to a place where we have more leverage (solving more parts of the problem at a lower cost).
Also, doing the logical decomposition now does not require us to ALSO deploy it onto separate networks (physical decomposition).
The question we have to ask ourselves is: are we talking about logical architecture or physical architecture?
Logical and physical do not have to be one to one: you could still be doing a "monolithic deployment" of your logically-separated modules into the same machine.
A better strategy would be to focus first on a logical decoupling of your system, even in advance of physically separated deployments.
And when you decide to separate deployments physically, you could still start hosting those different deployment units on the same machine before going "distributed".
So, a logical decomposition does not require a physical decomposition too (that one can come later), but it gives us the option of a physical decomposition later (or of buying more network capacity).
So, essentially, subdividing things logically gives us options, while designing things monolithically does not.
We start by creating options, and then, when the complexity demands it, we have the ability to use them!
When you move from an "RPC-based model" to a messaging-queue model, you basically decompose your "monolithic" API into separate concrete messages (e.g. `GenerateReportCommand`, `CheckFraudCommand`, …) and each command can then be routed to a different queue, and each queue can be hosted on a separate network.
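A hedged sketch of that decomposition (the queue names and the routing helper are invented for illustration):

```csharp
using System;

// Each former API method becomes its own concrete message type...
public record GenerateReportCommand(int ReportId);
public record CheckFraudCommand(Guid TransactionId);

// ...and a routing map (illustrative) sends each type to its own queue,
// which could be bound to its own virtual network with its own priority.
public static class Routing
{
    public static string QueueFor(object command) => command switch
    {
        GenerateReportCommand => "reports.low-priority", // bulk work, cheap bandwidth
        CheckFraudCommand     => "fraud.high-priority",  // time-critical traffic
        _ => throw new ArgumentException($"Unmapped command type: {command.GetType()}")
    };
}
```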
No machine connected to a network is secure :-)
What do you do when you can no longer access something that you expected to be there (but it's not anymore, because of a change in topology)?
Solutions?
Do a performance test EARLY in the project lifetime: at 30% of a project you should have enough functionality and architecture in place to run a performance test!
The more complex the distributed system, the more moving parts you have, the more likely there's NOT a single person that knows how it all works together => document the system and invest in configuration management.
"Startups: Inventing the car while you're trying to drive down the road"
dev: "Administrator will take care of that…"
me: "NO!"
The business says to us
and what they're thinking is "…any time the devs or ops person touch the system, something is going to break…".
this fear pushes toward bigger upgrades => and this is a positive feedback loop => the more code we write, the more changes go into a single deploy, the higher the risk that something goes wrong, and around the loop again…
=> continuous deployment and automation to the rescue.
Having an "high available" architecture is not enough to have high availability of the whole system, you could have a high available architecture but have downtime when you deploy a new version of your software.
Solution:
=> We have to make sure that the code we deploy is backward compatible: being able to run version `n` on one server and version `n+1` on another server in parallel, side by side, live!
If you can do that, you can reach high availability!
Backward compatibility is hard, but you need it in order to have a truly highly available system!
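One way to get there, sketched under the assumption of message-based contracts (the type names are invented): make contract changes purely additive, so version `n` and `n+1` endpoints can process each other's messages.

```csharp
using System;

// Version n of the contract:
public class OrderPlaced
{
    public Guid OrderId { get; set; }
    public decimal Total { get; set; }
}

// Version n+1: only ADDITIVE, optional changes. Old endpoints ignore the new
// field; new endpoints fall back to the default when it's missing. Both
// versions can now run side by side during a rolling deploy.
public class OrderPlacedV2
{
    public Guid OrderId { get; set; }
    public decimal Total { get; set; }
    public string CurrencyCode { get; set; } = "USD"; // new, optional, defaulted
}
```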
Make smaller changes to more loosely coupled pieces of code.
Solution:
Enable the admin to take parts of the system down for maintenance without adversely affecting the rest!
=> Queues help here: one component keeps putting messages on the queue even while the other component is down and not consuming them.
Solution:
Consider how to pinpoint problems in production.
A little bit of logging can be helpful, but too much can be harmful!
When you cannot find the issue looking at a log, what do you do? Add more logs!!
Focus more on the problem of finding the cause of the problem instead of being able to generate and process more and more logs.
Serialization and deserialization take a lot of CPU time!!
It's not happening in code that we measure (it happens in libraries we don't usually profile).
Cloud is cool because at the end of the month its bill tells you: "this is how stupid you were in designing your system".
A bad architecture may still work yet not be affordable on the cost side => transport costs may affect you here!
Many moving parts today… but…
10 years ago: Java and DotNet ruled the world. And they basically could inter-operate.
Then: DECENTRALIZATION => ruby, nosql (aka "none of your reporting systems would work on top of it"), REST, HTTP and JSON, and yadda yadda… => the integration problems between these different techs will give us work for the next 20 years!
Semantic interoperability and the story of a hospital patient care system:
NULL => the semantics of the two systems are not aligned => and it fails in production (it pops out under real use).
Physical distribution vs logical distribution
We tend to think of a big ball of mud as a monolithic application deployed on a single machine. But that's not the only "monolith"!
A common "big ball of mud" is a "physically-distributed" monolith (aka "a distributed monolith"): still logically tightly coupled, but we choose to deploy it to multiple machines, with a "physically distributed deployment" => it's still a logical "monolith"!
The logical elements of the big ball of mud are what hurt us, regardless of the system being "distributed" across more machines or "centralized" on a single machine.
Here we're talking about a "logical" big ball of mud.
Maintenance is hard in "big balls of mud": a change here breaks something there.
Unit tests pass, integration tests pass, but something breaks somewhere eventually.
One of the forces that pushes a system towards becoming a monolith is integration through the DATABASE => depending on a common structure in a database creates an implicit logical coupling between different pieces of code.
We go live and something somewhere depending on your database structure breaks!
More machines all hammering on the same database.
Stateless-service model => the database becomes the bottleneck (and you cannot scale out the database).
solutions:
…yes, but how?
where are the boundaries of the system?
the rest of the course will spend much time on these, because it's difficult.
What is the typical lifecycle of a "project"?
At a point in time someone says "yes! the system is finished!" (and this is where the consultants say "so long!" and fly away). And the system goes into "maintenance mode".
NO! It's not actually in maintenance mode!
There are
It's not really DONE!
=> this is when the actual works STARTS!
Maintenance or original development:
A system in "maintenance-mode" requires a great deal of skills compared to the one needed in the original development phase.
Then, why the "senior" people is put in the initial phase and then the maintenance is left to junior?
And we cannot do the inverse either…
=> That's because this is a FALSE dichotomy!
And the whole idea of designing a system so that "it will be maintainable by somebody else (maybe less skilled)" never worked!!
The system is never "finished".
The software is meant to always evolve.
The "construction+maintenance" metaphor taken from the building and manufacturing is BAD!
Construction is very different from maintenance when building a house.
While building software and maintaining software are fundamentally the same activity! (they require the same kind of activities from the team and the stakeholders).
Projects are a poor model for software development. We need a better metaphor!
Project ===> Product
The product is a better metaphor.
Creating a product is a long-term initiative (e.g. "the first iPhone will be so perfect that we won't need to work on it after the first release, we'll just go into maintenance mode"… naaah!)
The product could live almost forever, while a single feature could be handled as a project.
=> we need to design for long-term viability
Adopt a "product" mentality.
A bad experience with projects is needed to understand that we need a product mentality (because people and orgs change only out of pain).
The business does not actually give us requirements; they're usually much better at coming up with WORKAROUNDS.
So we need a strong IT Business Analyst profile, with enough "power" and political connections to push back on the business and say "I understand that this could work for you, but let's talk about your needs so that we'll find a good solution for your problem…".
Rapid prototyping to see if the problem would be solved.
We don't need "stenographers" who just write down what the business is asking to then bring that hard-coded decision to the team. We need a profession here.
It's important to understand the WHYs!
Once we have the WHYs and "build the right thing" taken care of by the Business Analyst, there's another role: the IT Architect.
The IT Architect should help in estimating the "right thing to build", and in telling which prototype is doable.
Also, the estimate process should tell roughly HOW we're going to build the thing.
A correct estimate process:
Given a well-formed team of a certain size S (this is a group of people that has worked together in the past, not a bunch of people assembled somehow…)
that is not working on anything else (otherwise the estimate is meaningless…)
then we can start to have a conversation about the estimate:
I'm C% confident (not a promise) work will take between T1 and T2
promise != possibility (percentages and ranges)
This allows the business stakeholders to make decisions.
If the estimate is too broad, let me do a POC for say a week…
at the end of the POC I'll give a new estimate with a higher confidence.
Is this enough confidence for the business? If it's not, let's iterate over the POC to come with a higher degree of confidence and a narrower range of time.
At the end of this process of communication between IT and business, you have a "requirement" and an "estimate", meaning that the business has an idea of the value that they're going to take from this and an idea of the costs they will have to pay.
A new role comes in now: the Project Manager, who will negotiate with the business which requirements in the team backlog will be re-prioritized and pushed later, so that our feature is scheduled for development when the business needs it.
Doesn't sound agile?
These are just tools that help the communication between business and IT, for a joint and explicit decision-making process.
What about the Product Owners?
The Product Owner in SCRUM should decide "what needs to be done", but often the PO is just a glorified business analyst:
NoEstimates and other "no-methodologies": it's not that important what they say NO to, but what they say YES to, what they propose as an alternative to solve the same issue addressed by the tool/process they say NO to.
NoEstimates needs a mature organization to work well.
Personal considerations: to me, the "IT Architect" role should be played by the whole team, not by an external person. The team should participate in the process of evaluating the prototypes and estimating their costs. I know this means the team has to stop working on the actual items in their TODO list, but that is better than having a list of things to estimate thrown at the team.
aka "Reuse is good!".
What happens when the business rules we codified in our software change?
If the code of a functionality is centralized just in one place, we make the change to that place and we're done.
If the code is scattered in multiple places, we have to find and change every one of them (and hope we didn't miss one).
So, as an industry, we strove to create reusable patterns (e.g. DRY); that was our solution.
good or bad design?
The problem is that centralization and reuse beget coupling: the more generic and reusable a piece of code is, the more usages it will have, all depending upon it.
Arguably this generic (and more complex) reusable code will need to change, and when it changes it may break many things around it.
Reusability => high amount of use => high amount of dependencies => more tight coupling
So we create more coupling under this idea of "what happens when the rules change?!"
It's almost impossible to design code to be so flexible for any type of rule changes.
How do we resolve this issue?
Architecture requires more than one view to describe it.
One of these views is the "development view" in our source control system (like git): this view allows us to "tag" source code by feature implementation (e.g. each feature has its Trello card id and the commit message may refer to that id, or like in GitHub where you have issues that refer to changes in code).
If we can easily find which places need to change in order to change a feature, maybe we do not need to reuse so much (because reusability was the response to the question: how can we be sure we have changed all the code in all the multiple places where it is duplicated?).
The one place to find which files need to be changed is the source control repository, so we get the benefit of reuse (go to one place and see what needs to be changed) without creating coupling in our design.
So essentially we create centralization in the development view of our architecture rather than in our class diagrams.
When we have a lot of "single-purpose code" => we have some amount of duplication => the solution in the past was REUSE.
Duplication is not the devil
Using this model of the "development view", we don't have to reduce replication and duplication, so we shouldn't have to create reusable code, so the coupling of the overall solution should decrease.
DRY vs WET (Write Everything Twice)
requirement => GitHub issue => pull request ==> code
Duplication is not the devil, we now have tools to enable us to navigate code-level duplication using development view centralization (e.g. GitHub)
Reusability and "code centralization" (DRY) were the responses we as an industry gave 20 years ago to the problem of duplication and replication of code, and to answer to the question: "what are the places I have to change in our sw when we have to change a feature?".
Now, with modern source control versions (like git) and tools (like GitHub), and with a bit of disciplined process when committing your code, you can let go of duplication and do centralize not the code but the usages of the code. This way, a change in the behavior of our sw needs not a change in the generic class used by many other modules but a look at the code changes done when implementing that behavior in the first place and change those places.
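For example, with a disciplined commit convention, that "development view" lookup is a single command (the issue number is of course illustrative):

```
# List every commit, and the files it touched, for issue #123:
git log --grep="#123" --name-only --oneline
```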
Coupling too often is subjective and is a proxy for "I like this design / I hate this design"…
More formally, coupling is a measure of dependencies: if `X` depends on `Y` (`X --> Y`), there's coupling between them.
There are two very different kinds of coupling:
`Y` => things that use `Y` (its usages, "who depends on me…"): Afferent Coupling (`Ca`)
`X` => things that `X` depends on (its dependencies, "whom I depend on…"): Efferent Coupling (`Ce`)
Let's see an example:
| Which kind is worse? | Afferent (Incoming) | Efferent (Outgoing) |
|---|---|---|
| A | 5 | 0 |
| B | 0 | 5 |
| C | 2 | 2 |
| D | 0 | 0 |
If you want to answer "it depends" (the classic consultant answer), you should give a fully-formed sentence: "…if <X>, this option is better; if <Y>, that option is preferable; …" => you have to provide the conditions under which an option is better or preferable.
Nothing depends on `D`, and `D` doesn't depend on anything else => `D` is so loosely coupled that it's quite useless and could be removed…
`A` seems to be self-contained (doesn't depend upon anything) and is used (so it's useful). Changes in `A` may break its clients.
`B` has no dependencies on it, and calls other classes. It's SAFE TO CHANGE, then. Perhaps `B` is called back by some kind of UI event to which it's registered, and in turn calls out to some "controller" or "service layer".
`C` sounds like an integration layer: it takes something from somewhere and transforms it into some other form for others to consume.
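A tiny sketch wiring `A`, `B` and `D` exactly to the table's numbers (the bodies are invented just to make the counts concrete; `C` is omitted for brevity):

```csharp
// D: afferent 0, efferent 0 — nothing uses it, it uses nothing => dead weight.
public class D { }

// A: afferent 5, efferent 0 — self-contained and widely used; a change here
// can break all five dependents below.
public class A { public int Compute(int x) => x * 2; }

public class Client1 { public int Run(A a) => a.Compute(1); }
public class Client2 { public int Run(A a) => a.Compute(2); }
public class Client3 { public int Run(A a) => a.Compute(3); }
public class Client4 { public int Run(A a) => a.Compute(4); }

// B: afferent 0, efferent 5 — nothing depends on it (safe to change!),
// but it calls out to five classes: is it doing too much (SRP)?
public class B
{
    public void Handle(A a, Client1 c1, Client2 c2, Client3 c3, Client4 c4)
    {
        c1.Run(a); c2.Run(a); c3.Run(a); c4.Run(a);
    }
}
```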
It's reasonable to have a lot of incoming coupling (e.g. logging or language primitives have many incoming couplings), while it's almost always not good to have a lot of outgoing coupling.
When you have a lot of outgoing coupling you are probably doing too many things (`SRP` violation?).
The question to ask ourselves is:
Are we coupled to STABLE things or to VOLATILE things?
That matters!
Having higher outgoing coupling is worse than having higher incoming coupling, because outgoing coupling typically means depending on things that may change more frequently (e.g. depending on logging (stable) vs depending on domain logic (more volatile)).
For example, depending on a logging framework is not nearly as bad as taking a dependency on, e.g., a calculation function in our domain, because that may change more frequently!
IncomingCoupling(`A`) ~= 50 => OK…
OutgoingCoupling(`B`) ~= 50 => BAD!
If OutgoingCoupling(`C`) gets higher, say 4, 5, …, that is BAD too, because it will destabilize its incoming couplings too…
For incoming coupling, what counts is the number of classes that depend upon you (regardless of how many different calls the same class makes on you).
For outgoing coupling, what counts is the number of different ways you depend on other classes (methods, properties, etc.), even on the same class.
What's the difference between `X` and `A` in terms of coupling?
Both `X` and `A` depend on another class (`X` on `Y`, `A` on `B`), but:
IncomingCoupling(`B`) == IncomingCoupling(`Y`) == 1
If I change `B`, how many classes could I break? Answer: 1
OutgoingCoupling(`A`) = 5 >> OutgoingCoupling(`X`) = 1
Is `A` fulfilling the single responsibility principle (SRP)? `A` calls out to `B` in 5 different ways.
With incoming coupling, the concern is: "if I'm making a change to this class, how many other classes may I break?" (for both `B` and `Y` the answer is 1).
With outgoing coupling, the concern is: "is this class fulfilling SRP?"
The more outgoing calls a class makes (to different classes or to the same one, it makes no difference), the less self-contained it is.
(`A` seems to be less self-contained than `X`)
How do we keep our codebase in good shape and well-designed in the long term?
Have some metrics in place to be able to spot places where the code is going bad… (e.g. if it's a DTO and it's calling out to other classes, maybe raise an alarm)
For example you could have a CI build rule to fail the build when some coupling metric exceeds a threshold value.
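A hedged sketch of such a check (a deliberately naive efferent-coupling count via reflection; real tools such as NDepend compute this properly from the IL):

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class CouplingCheck
{
    // Naive efferent coupling: distinct non-System types referenced by a
    // type's fields and declared method signatures. Calls made inside method
    // bodies are not counted here; a real metric would scan the IL too.
    public static int EfferentCoupling(Type type)
    {
        var fromFields = type
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Select(f => f.FieldType);

        var fromMethods = type
            .GetMethods(BindingFlags.Instance | BindingFlags.Public | BindingFlags.DeclaredOnly)
            .SelectMany(m => m.GetParameters().Select(p => p.ParameterType).Append(m.ReturnType));

        return fromFields.Concat(fromMethods)
            .Where(t => t.Namespace != null && !t.Namespace.StartsWith("System"))
            .Distinct()
            .Count();
    }

    // A CI step could call this and fail the build over the threshold.
    public static void AssertBelow(Type type, int threshold)
    {
        var ce = EfferentCoupling(type);
        if (ce > threshold)
            throw new InvalidOperationException(
                $"{type.Name}: efferent coupling {ce} exceeds threshold {threshold}");
    }
}
```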
A nice rule: FIX IT UP or TALK to another developer to SIGN-OFF on that
When a code quality rule / code design convention is violated, let the dev choose whether to fix it up or have another team member "review" the code that "breaks" the coupling rule that we measure, and decide together if this is a case where the rule can be ignored => continuously communicating our expectations as a team on the design rules that we want to enforce in our code.
This allows us to not INFLICT these rules on the team, but HAVE A CONVERSATION within the team on why some types of coupling make sense in e.g. the presentation layer (e.g. DTOs), and why others do not.
Minimize afferent and efferent coupling
(e.g. `B` could expose a single `doIt` method instead of many smaller methods, to reduce the coupling between `A` and `B`)
When we move from coupling at the application level to coupling at the system level, we discover new types of coupling…
Happens via some kind of shared resource.
E.g. `A` writes something to the DB and `B` consumes that data => `A` and `B` are coupled!
It's an implicit coupling: you don't see it but it's there.
A shared resource could be anything shared, not just a database (which is THE classic shared resource), but also a REST service.
REST or any other form of abstraction cannot save you in this context!
Even if a REST service exposes a different representation of a domain concept hidden behind the REST API, the underlying business concept may be the same (e.g. I call it `shipping address` and you call it `delivery address`, but it's still the same address, and we need to agree on how an address is structured). This means that, when the business rules change, those changes tend to ripple out and affect all the different representations! => it's still a form of coupling!
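A sketch of that hidden coupling (the type names are invented): two services each expose their own representation, yet a single business-rule change hits both.

```csharp
// Hypothetical contracts: two representations of the SAME business concept.
public record ShippingAddress(string Street, string City, string Zip); // service 1
public record DeliveryAddress(string Street, string City, string Zip); // service 2

// If the business now requires a country code on every address, BOTH records
// (and every consumer of each REST API) must change: the shared business
// concept couples them, no matter how well each API hides its internals.
```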
But some amount of coupling is inevitable.
If two pieces of code depend on each other through some columns on a table, try to keep that shared data as narrow as possible.
Given a logical link between two different pieces of code: how do they talk together? Are they bound to a specific platform?
Service-orientation came out of this need to loosen platform coupling: share schema, not classes or types!
Deals with the coupling on the dimension of time.
Two pieces of code, one calling the other over the network.
If `A` is not able to do anything meaningful until it gets a response from `B` => high degree of temporal coupling between `A` and `B`.
If `A` is able to continue doing meaningful things while waiting for `B`'s response => low degree of temporal coupling between `A` and `B`.
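A sketch of the two ends of the spectrum (the service and queue interfaces are hypothetical):

```csharp
// Hypothetical abstractions for the sketch:
public interface IPricingService { decimal GetPrice(int orderId); }
public interface IMessageQueue { void Send(string queueName, object message); }
public record PriceOrderCommand(int OrderId);

public class OrderHandler
{
    // HIGH temporal coupling: blocked until B answers; if B is slow or down,
    // A can do nothing meaningful in the meantime.
    public decimal PriceBlocking(IPricingService b, int orderId)
        => b.GetPrice(orderId);

    // LOW temporal coupling: send a message and keep working; B's answer
    // arrives later as another message, handled elsewhere.
    public void PriceViaQueue(IMessageQueue queue, int orderId)
    {
        queue.Send("pricing", new PriceOrderCommand(orderId));
        // ...continue doing meaningful work here, no waiting on B.
    }
}
```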
Temporal coupling and Platform coupling are independent from each other.
E.g. binary serialization of events into a queue (temporal coupling low, platform coupling high).
How much is my code bound to a specific machine and its IP?
Load balancing or DNS could help in loosening this kind of coupling.
Many options for interoperability (besides `HTTP` and `JSON`).
Text-based representations on the wire: XML, JSON, …
Each textual representation has its own schema (XSD as the XML schema, JSON schemas, …) and its IDL (interface definition language).
Schemas are not there to benefit interoperability and decoupling; they're there to give more productivity to developers: an IDE can generate proxy classes from a schema, giving devs ready-to-go classes to talk to a remote service (saving developer time).
Schema validation also saves developer time, letting devs avoid writing their own custom validation mechanism. Schema validation too is not related to interoperability; it's just a tool for productivity.
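For instance, .NET's built-in XML schema validation spares you a hand-written validator (the file paths here are illustrative):

```csharp
using System.Xml;
using System.Xml.Schema;

public static class SchemaValidation
{
    public static void Validate(string xmlPath, string xsdPath)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, xsdPath); // null = take the schema's target namespace
        settings.ValidationEventHandler += (sender, e) => throw e.Exception;

        using var reader = XmlReader.Create(xmlPath, settings);
        while (reader.Read()) { } // reading drives validation; no custom checks to write
    }
}
```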
Standard-based transfer protocols: HTTP, SMTP, UDP, …
Always try to use the right tool for the job! Don't write your own one-to-many communication on top of HTTP! Just don't do it.
SOAP, REST and WSDL operate above the transport level (e.g. HTTP) and above the text representation of the payload (XML, JSON):
JNBridge: inter-op between Java and .NET:
=> the advantage is that everything happens in-process, not via remote calls (first rule of distributed objects… avoid distributed calls).
When two things talk to each other, there's a logical coupling between the two sides, and that is inevitable (a schema does not remove that coupling, it just improves dev productivity): if I make a change on the invoked side, that could break things on the invoking side, even if I have a schema and schema validation in place.
When humans consume Internet data, they have a tool that compensates for ambiguity: our brain. When machines talk to other machines over the Internet, that ambiguity in the messages does not work (until an AI is invented).
Statically-typed languages vs dynamically-typed languages
17-Coupling_solutions_platform min: 9.00 min