# Chapter 5: Building on solid foundations
## The house of the digital business
Let's quickly review what we've seen so far.
In Part One we took a broad overview of the industrial revolutions and of the dynamics that shape and drive innovation. I made some bold claims about how Industry 5.0 can increase a company's speed to value and thus make it more competitive on the market and boost its growth.
In the previous chapter we then looked at some typical issues that arise as manufacturers grow. We dove into some of the nitty-gritty details that tend to create friction in the company, dispersing energy and resources; being difficult to address, they are often accepted as a necessary evil.
What we'll do in this and the next chapter is pull apart the elements that make up every company, from a perspective that will help you understand how to increase your speed to value.
We'll picture the company as a building, the house where you move and do your work. This house must have a solid structure made of foundations, load-bearing pillars and a roof, to stay up and withstand earthquakes, so that you can be safe and comfortable in it. These are the elements that we'll now analyze in more detail, as indicated in the picture below.
:::warning
PICTURE OF THE HOUSE OF THE DIGITAL BUSINESS
:::
- The foundation: Data
- The pillars: People, Processes, Technology
- The roof: Change
We'll start from the foundation: data.
## Making sense of the world
Most managers place themselves on a continuum: at one extreme, they think it's all a matter of managing things (manufacturing equipment, facilities, inventory, money, IT infrastructure); at the other, they think it's all a matter of people (relationships, roles and hierarchy, training, mentoring, etc.).
One thing many of them miss, however, is that for much of its work the company needs to manage DATA.
Data is what the company as a whole uses to make sense of the world. Customer data, sales orders, drawings, technical specifications of products, bills of material, work orders, invoices, picking lists, KPIs...you name it. Can you imagine a business running without these crucial pieces of information?
Businesses need data, they can't function without it, and it's always been that way. It's not a matter of being in the digital era. What we're really experiencing today is a greater awareness of its importance, arising from the fact that data has started being stored, moved and shown in many more ways and at incredibly higher speeds than we were used to before the advent of the internet. Data is also playing a comparatively more important role, especially in manufacturing, than in the past, but still, it's not a new thing.
To understand this better, think about how data has an almost foundational role in our very existence: it has always been at the basis of any communication, knowledge and even our senses and perceptions[^1].
[^1]: Physicists are even starting to consider how data and information use energy and may even have mass. See for example [this](https://physicsworld.com/a/information-converted-to-energy/) and [this](https://aip.scitation.org/doi/10.1063/1.5123794)
So when talking about data, know it has nothing to do with some new technological fad. We're only learning to deal with it more effectively, but it's always been there, whether we realized it or not. As we live on data through our senses, companies also live on it, and the way they manage data shapes how they understand and interact with their internal and external world. From this point of view, you can see that data management is not a trivial matter and can take a company from surviving to thriving, or vice versa.
## From data to decisions
But what is data? The term originates as the plural form of the Latin word _datum_, which means "given". In this sense we can define data as **discrete, raw facts that we are given about the state of the world[^2].**
[^2]:There's no precise consensus about the definition of data in the academic world. For an overview, see ...
In and of itself, any single piece of data has little to no value. We can know the height of Mount Everest (8,849 m), and that it is the highest mountain in the world, but what good does it do us?
It starts becoming interesting when we can relate it to other pieces of data and thus give it meaning. It's more interesting to know that if I'm flying at 10,000 meters then I won't risk running into Everest. When data gains meaning and value we call it _information._
These are two of the four elements of the Data-Information-Knowledge-Wisdom (DIKW) pyramid, a framework which in one form or another has been around at least since 1987, when Milan Zeleny mapped each of them to different purposes[^3]:
[^3]: Some trace the concept back to a 1934 poem by T. S. Eliot, "Choruses": _«Where is the wisdom we have lost in knowledge?_ _Where is the knowledge we have lost in information?»_
| Level | Purpose |
| -------- | -------- |
| Data | Know-nothing |
| Information | Know-what |
| Knowledge | Know-how |
| Wisdom | Know-why |
In this sense, _knowledge_ refers to the awareness of the rules by which things happen and people behave. Think of the laws and formulas of physics, which help us design effective products and production processes, or the principles of psychology and micro-economics, which support marketing campaigns and sales strategies.
There's one more level: _wisdom_. In this context, wisdom is what comes after knowledge has been applied repeatedly, and it includes the nuances and fine details linked to the complexities of real life. One thing is to have the recipe for pizza (information, know-what), another thing is being able to make one (knowledge, know-how), yet another is being able to prepare and bake a good pizza regardless of the humidity level and the type of oven at your disposal, using different types of flour to change the texture, and choosing toppings so that they enhance each other (wisdom, know-why).
But how does all this apply to your job? Again, data and information are the _lifeblood of your work_, the input material used in your mental processes, while knowledge and wisdom are the infrastructure that processes them. Based on the data and information at hand at any given time, you and your team are able to apply your knowledge and wisdom to make _decisions_ and act on them.
This is the final purpose, the reason for existence of everything about data and all the above: **supporting decision making.** The faster you can make accurate decisions and put them into action, the higher your speed to value and thus your competitiveness.
Let me give a very simplified example to visualize what I mean.
> A customer calls and asks: «If I order 10 pieces of product X, can you deliver them by date Y? If you can, I'll buy right away, otherwise I'll have to pass. I need to know by tomorrow.»
What's happening here?
It's useful to go back to basics. Let's use the methods they teach in elementary school and write down the _data:_
- 10 pieces of X
- Deadline by Y
In most companies, you would pass the question and the data to the planning office to figure out whether you can accept it or not, that is, to turn the data into _information_. They first have to gather additional data and information:
- Do we have enough spare capacity? This is a piece of *information* which requires *data* about workload, total capacity and product lead times
- Do we have enough material? Another piece of *information*, which needs *data* about the BoM, current inventory levels, and bookings from future orders (see the simple sketch after this list)
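A back-of-the-envelope sketch of those two checks might look like this; the numbers and names are purely illustrative, not taken from any real system:
```python
quantity_requested = 10

# Capacity check: spare capacity vs. the hours the new order would require
total_capacity_hours = 400      # data: available machine hours until date Y
booked_workload_hours = 370     # data: hours already committed to other orders
hours_per_piece = 2.5           # data: product lead time per piece
capacity_ok = (total_capacity_hours - booked_workload_hours) >= quantity_requested * hours_per_piece

# Material check: inventory vs. BoM requirement, net of existing bookings
components_per_piece = 4        # data: from the BoM of product X
on_hand = 55                    # data: current inventory of the component
already_booked = 20             # data: components reserved for future orders
material_ok = (on_hand - already_booked) >= quantity_requested * components_per_piece

print(capacity_ok, material_ok)  # the information we were after: can we fit the order?
```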
Planning finds out that the only way to fit the order is to delay another one from customer B (*information*).
At that point, you use your _knowledge_ of the other customer to anticipate how they will react to a potential delay; you put together all the above plus additional pieces, e.g. the potential profit from the new order (data) and whether you have already reached your monthly budget (information); then you use your judgement (another word for wisdom) to balance the pros and cons of turning down the new order versus disappointing customer B, and finally make a decision.
Makes sense?
What we've done here is take a very powerful microscope to explore the micro-structure of our daily work, and we've found that it's made of atoms of data, which combine into molecules of information and are processed by enzymes of knowledge and cells of wisdom, the output being the decisions you make and, eventually, your speed to value.
## Data models
We've seen what data is and how it fits in your work; now we'll see what it looks like.
When you have to physically build complex structures, architects typically create a miniature representation in advance to visualize in real life what the result will be like. Similarly, in the fashion industry, collections are presented to the public on the catwalk, worn by people who show how the clothes might look on the final customer.
In both cases, such _representations_ are called _models_.
In this sense, a model "simulates" reality by simplifying it, in a way that helps us understand it well enough so that we can make decisions about it, without having to deal with the actual thing.
This is useful because the real thing may cost too much or take too long to build, or may be otherwise impractical to realize. We want models so that we can simplify our thinking and decision making before we embark on the real journey of realizing our vision.
In the context of data, **models are formal, predefined sets of data** that we use to make sense of a specific object or situation.
It may sound very abstract, but models are easier to grasp than it may seem, and an example will make this clearer.
### Restaurant data models
When you go to a restaurant, the waiter takes your order and passes it on to the kitchen. For each dish, the order must contain:
- the table you're seated at;
- the type of dish that must be prepared;
- the quantity.
Miss any of these and the order cannot be delivered properly.
This is a **data model**: a set of _attributes_ or _properties_ that _represent_ a specific object, in this case a kitchen order. Models thus have a minimum set of attributes that are necessary to handle them properly, plus additional ones that may be present in some cases and not in others. For example, the kitchen order may include a note to have the steak extra rare or well done.
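If you're curious what such a model looks like once written down for a computer, here is a minimal sketch in Python; the names and fields are illustrative, not taken from any real system:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KitchenOrder:
    table: int                   # the table the dish goes to (mandatory)
    dish: str                    # the type of dish to prepare (mandatory)
    quantity: int                # how many (mandatory)
    note: Optional[str] = None   # optional detail, e.g. "extra rare"

# Two valid orders: one with the optional note, one without
order_1 = KitchenOrder(table=12, dish="Rib-eye steak", quantity=1, note="extra rare")
order_2 = KitchenOrder(table=5, dish="Margherita pizza", quantity=2)
```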
### Data models everywhere
You see, we always implicitly use data models in our lives, every day. Every time you fill in a form, that's a data model that has been designed for a specific purpose. Our usernames and passwords are the most basic data model for our identity in the apps we use. The model itself doesn't include much information about us in this case, but it's a representation that is good enough for its purposes, that is, to load your app data instead of somebody else's and to keep others from accessing yours.
In a manufacturing business, the most immediately recognizable data model is the sales order. In the past, these were signed paper documents containing information about the buyer, the delivery address, the items or services purchased, prices, quantities, payment terms and so on. They're like a kitchen order, but more sophisticated.
Besides the sales order, really any kind of form that people have to fill in, whether on paper or on a digital device, can be considered a data model. Customer, BoM, Product, Employee, Work Order, Shift, Invoice, Equipment, Inventory, Payment, Picking List, Delivery Note, Credit Note, Job Role, Customer Feedback forms: all these and more can be thought of as such.
### Data models and software
If the data model is fundamental for us humans to make sense of something, it is even more so for the software we use to manage data. Whatever a piece of software does, it does it through some sort of data model representing the objects it needs to deal with. It's the lens through which we "translate" the real world into software terms and through which the software gives us information that makes sense to us.
The same object can even have multiple models associated with it. Sticking with the restaurant analogy, a table may be just a number for those in the kitchen, but for waiters and for the reservation software, that number is also associated with a specific position in the restaurant and a number of available seats. In manufacturing businesses, a customer order for the sales department may include unit prices and discounts, but this data may not be present in the model used by production, which on the other hand may include additional items necessary for manufacturing that are not present in the sales model; different again could be the model used by the logistics department, which is concerned with packaging, carrier specifications and other data that may not be used by either sales or production.
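To make this concrete, here is a hypothetical sketch of three models of the same customer order, one per department; the field names are invented for illustration:
```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SalesOrderView:           # what sales cares about
    order_id: str
    customer: str
    unit_price: float
    discount_pct: float
    due_date: date

@dataclass
class ProductionOrderView:      # what production cares about
    order_id: str
    product_code: str
    quantity: int
    components: list[str]       # extra items needed for manufacturing
    due_date: date

@dataclass
class LogisticsOrderView:       # what logistics cares about
    order_id: str
    packaging_type: str
    carrier: str
    delivery_address: str
```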
### Data models and Industry 5.0
You can then imagine how important data models are for any digital system and, by extension, any Industry 5.0 initiative. Being explicit about them has a huge advantage not just for those responsible for the management of the information system, but for the whole company: it creates a **shared language and a common foundation across all functions**.
This is why data is placed as the foundation of the House. If everybody is aligned on what means what, how to describe things, and where to find the specific data we want, then it is much easier to both communicate and work together, dramatically increasing the speed to value.
## Data flows
We've seen what data is, how it fits in your work and what it looks like, but in order to be used, data must be _available_, when and where needed, so it needs to _move around._
To understand this better we will make a parallel with drinking water. This will serve two purposes: the first is to help you intuitively grasp some important concepts that may otherwise appear very technical and complicated, but that are foundational to being an active part of an Industry 5.0 initiative; the second is to remind ourselves that data is as vital for our businesses as water is for us.
### From source to flow
The water we drink comes essentially from three different kinds of sources:
- It comes out of the earth from specific spots called springs, which can be natural or man-made.
- It rains down diffusely from the clouds (the real ones, not the IT kind).
- It melts from glaciers and mountain snow.
After melting, falling from above or emerging from below, the water starts _flowing._
Where will it flow? Usually, water will be pulled by gravity and will flow _down_ toward a lower altitude place (remember the path of least resistance we talked about in chapter 3).
When a certain amount of water flows in the same direction, it is called a _stream._ Water flows from up (a spring, the mountains or the clouds) to down (a lake or the sea). You can't reverse the flow of a river, right?
The fact that water goes just one way is so well established that the direction of flow in streams is safely used as a reference to orient oneself. In common language this concept is also used to define what comes before and what comes after in a unidirectional movement, with the terms _upstream_ and _downstream._ We will use the same terms to define the *flow*, or movement, of data: it is generated upstream and flows downstream to where it will eventually be used.
Now that we've seen how water moves, let's think how it is consumed.
### From flow to use
You get drinking water in one of two ways: either it flows from a faucet or a pipe attached to the water grid, or it has been packaged and transported where you are.
Packaged water will _end_ at some point, meaning that it will be consumed fully and you will need to get another package to have more. There's also a chance that packaged water will eventually expire and become unusable.
On the other hand, tap or pipe water is available _on demand_, that is, whenever you want and for as long as you want. For practical purposes, we can say that it will keep flowing forever, it will never end and it will always be fresh.
The way that water goes from its source to its final package or to the faucet is also interesting. Once again, there are two ways this usually happens.
The freshest, purest water is caught and packaged or drunk right away as it flows out of the spring or very close to it.
If not caught there, water will usually be gathered in some kind of larger body (a lake or reservoir), which can be natural or man-made. To be safe for drinking, it goes through several treatment phases to be cleaned and purified from pollutants, and then it is usually _pumped_ into the water grid. From there, it is either packaged as "less fresh" bottled water, or consumed out of the faucet, flowing out from pipes thanks to the pressure generated by the grid pumps.
All this requires an _infrastructure_: the reservoirs, the pipes, the pumps, the faucets, the packaging facilities and the logistics for packaged water.
### From water to data logistics
If you can make sense of this water system, it will not be too difficult to understand the high-level architecture of data systems, since data behaves very much like water:
1) You gather it from a _source_.
2) It _flows_ only in one direction toward its destination (more on this later), whether that be direct consumption or intermediate storage, and does so through an infrastructure.
3) You _consume_ it.
The sources of data can differ. Data can come diffusely from the environment like rain, when you have many people generating it spontaneously: clicks on a website, sales on an e-commerce site, sales orders from agents in the field, or production data from your shopfloor operators. It can come from a specific place, like springs, in the case of sensors and machine data. Or it can come from longer-term deposits, such as third-party or historical databases, like water melting from snow and glaciers.
From the source, data flows through what is very appropriately called a _data infrastructure_, the plumbing that moves data from the source to where it will be consumed. It's made, obviously, of the cables, switches, servers and so on of the company network or of the cloud provider. Less obviously, this infrastructure also includes a series of pieces of software that store data in intermediate forms (paper slips, computer documents, databases, data warehouses, data lakes), which are all kinds of data _stores_, like lakes and reservoirs for water; and software to _process_ the data in transit, cleaning, replicating, backing up, aggregating and transforming it as it passes from one data store to another, like water treatment plants and packaging facilities.
The final step is when you consume the data, which can be distributed in two ways. The first is static reports like presentations, PDFs and spreadsheets, which, much like packaged water, at some point run out or expire. The other, like faucets and pipes, is interactive software that presents data on demand, such as business intelligence suites, dashboards, interactive reports and so on.
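To tie source, flow and consumption together, here is a toy sketch of the idea; it is illustrative only, with made-up records, not a real data platform:
```python
# The "spring": raw data generated at the source
raw_events = [
    {"machine": "press-01", "pieces": 40, "scrap": 2},
    {"machine": "press-02", "pieces": 35, "scrap": None},  # incomplete record
]

def clean(events):
    """The "treatment plant": drop records that are unusable downstream."""
    return [e for e in events if e["scrap"] is not None]

data_store = clean(raw_events)   # the "reservoir": an intermediate store

def scrap_rate(store):
    """A "faucet": data served on demand, fresh every time it is requested."""
    total = sum(e["pieces"] for e in store)
    scrap = sum(e["scrap"] for e in store)
    return scrap / total

print(f"Scrap rate: {scrap_rate(data_store):.1%}")
```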
Thinking of data like drinking water and comparing the two infrastructures helps a lot in making sense of the technology required to manage data effectively, and it will also help you tell a more compelling story when building the business case for it. It's not a coincidence that many technical terms like _streaming,_ data _lake_, data _lakehouse,_ etc. refer to this analogy.
### One-way flows
One more thing about _unidirectional_ flows. Some of you may think that data doesn't just go one way, since you may have two systems exchanging data _both ways._ If you think about it from the point of view of the elements of your infrastructure, that would be correct, and also useful if your goal is maintaining the infrastructure itself. But our goal here is getting data where it needs to be, when needed, and in the best form possible. From this point of view, I want you to realize that each single piece of data does go _just one way_, and so does the flow itself. If something goes "back", it will be a different piece of data and thus a different flow.
Thinking of data flows as atomic, unidirectional channels helps tremendously in simplifying the design and maintenance of your data _architecture_, that is, the sum of all data flows, their data models, how they relate to the infrastructure and how they interact with one another.
Even if it's not your role to take care of data architecture personally, looking at it in this way will help you understand and support the people that do have that responsibility, thus contributing to the success of your company's initiatives.
## Data quality
Much has been written and discussed about data quality, and for a good reason. A lax approach to data quality can cost billions to companies[^4]. In some cases, it can cost lives.
[^4]: Davenport \[1997\]; Redman \[1998\]; Laudon \[1986\]; D. P. Ballou and G. K. Tayi, "Enhancing Data Quality in Data Warehouse Environments," Communications of the ACM, Vol. 42, No. 1, January 1999, pp. 73-78; W. Jung, "A Review of Research: An Investigation of the Impact of Data Quality on Decision Performance," School of Information Science, Claremont Graduate University, Claremont, CA.
I'm not exaggerating. In 1986, a rocket fuel tank exploded shortly after the launch of the Challenger Space Shuttle, causing the death of all seven crew members, due to a series of data quality issues linked to a very small component in the system. In 1988, a combination of factors linked to the data management of the Aegis missile system led the USS Vincennes to mistake an Iranian passenger aircraft for an enemy fighter jet, an error which cost the lives of 290 civilians.
Even if you don't build spacecraft or missile systems, your company is still at risk of considerable damage due to poor data quality management practices.
But what problem can there be in the data?
Let's look at a few examples.
### Dimensions of data quality
In the kitchen order example used before, if you missed one of the three necessary attributes (table, dish, quantity) you would have issues. This is a case of _missing_ data, or of _data completeness_.
Another potential issue is _wrong_ data, which may mean plain wrong, like saying something is 2 meters long, when actually it is 3. This is called _data accuracy._
Or data may be accurate, but may be recorded _in the wrong way_, like indicating a country with its full name (Italy), instead of its ISO 3166 code with two characters (IT). You'll see in a minute why this can be a problem.
Completeness also applies at a higher level, e.g. having revenue forecasts every month from _all_ regions. Other dimensions of data quality include *timeliness*, as in data being "fresh" enough, and *consistency*, as in data being the same when replicated in different systems.
Consistency is especially critical when departments use different tools to consume data. Has it ever happened to you that you and another person relied on different information about the same thing? That's a data consistency problem and comes from not being deliberate enough about your data.
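As a tiny illustration, here is what a naive consistency check between two hypothetical systems holding the "same" customer record might look like; the field names are made up:
```python
# The same customer as recorded in two different systems
crm_record = {"customer_id": "C-1042", "country": "IT", "payment_terms": 30}
erp_record = {"customer_id": "C-1042", "country": "Italy", "payment_terms": 60}

# Flag every field where the two systems disagree
mismatches = {
    field: (crm_record[field], erp_record[field])
    for field in crm_record
    if crm_record[field] != erp_record[field]
}
print(mismatches)  # {'country': ('IT', 'Italy'), 'payment_terms': (30, 60)}
```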
### Checking on data
The act of checking the correctness of data is called _validation_, an important step to ensure data is _fit for use_. Paper can't check what people write, so automatic validation can only happen when data passes through software systems, which base the validation process on various elements.
1) They allow you to define which pieces of data are mandatory and warn you if they're not provided, while letting you input optional data only when desired;
2) They define the _type_ of data for each attribute: is this piece of data a number? A date? A string of alphanumeric characters?
3) They can use specific control rules such as:
- measurement units must be among a predefined list and cannot be invented;
- emails must contain the @ symbol and an appropriate domain name;
- dates must be expressed in a specific format (e.g. dd/mm/yyyy);
- ...and so on.
For this reason, a good data model will not just list the attributes of the object it represents, but also express specific requirements for each of them.
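To make this concrete, here is a minimal validation sketch in Python; the fields, rules and shortened country list are invented for illustration and are not taken from any specific system:
```python
import re
from datetime import datetime

ISO_COUNTRIES = {"IT", "DE", "FR", "US"}   # shortened list, for illustration only

def validate_shipment(record: dict) -> list[str]:
    errors = []
    # 1) Mandatory attributes
    for field in ("customer", "country", "ship_date", "email"):
        if not record.get(field):
            errors.append(f"missing mandatory field: {field}")
    # 2) Types and formats
    try:
        datetime.strptime(record.get("ship_date", ""), "%d/%m/%Y")
    except ValueError:
        errors.append("ship_date must use the dd/mm/yyyy format")
    # 3) Control rules
    if record.get("country") not in ISO_COUNTRIES:
        errors.append("country must be a two-letter ISO 3166 code")
    if not re.match(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$", record.get("email", "")):
        errors.append("email must contain @ and a valid domain")
    return errors

print(validate_shipment({"customer": "ACME", "country": "Italy",
                         "ship_date": "2025-01-05", "email": "ops@acme"}))
# ['ship_date must use the dd/mm/yyyy format',
#  'country must be a two-letter ISO 3166 code',
#  'email must contain @ and a valid domain']
```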
To give you an idea of the importance of this, let's go back to how the country name is stored in the system. Imagine you had to include a specific certificate with every shipment going to Italy, and your software doesn't track the destination country as a separate field with a predefined list of values, but only within a long text line called "Full Address"[^5]. You may have records that use "IT", "ITA", "Italy", "Italia" in multiple combinations of lowercase and uppercase letters with a variety of creative typos, or that don't mention the country at all. It would be a painful and possibly impossible job to make sure the software fetches all the relevant records automatically. Whoever needs to figure out where to include the certificate will most likely have to sift through _all_ the records and check them one by one, risking missing some of them or making other mistakes. All because the system wasn't designed for data quality.
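Just to show how messy this gets in practice, here is a toy example; the addresses are invented, and the point is only that a naive text filter silently misses most of them:
```python
addresses = [
    "Via Dante 2, 50122 Firenze, Italy",
    "Via Roma 1, 20121 Milano, Italia",
    "Corso Buenos Aires 5, Milan, ITALY",
    "Piazza Grande 3, 41121 Modena, IT",
    "Via Garibaldi 7, Torino, Itlay",    # creative typo: never matched
    "Viale Europa 9, 00144 Roma",        # country missing entirely
]

# Naive filter: catches only one exact spelling
needs_certificate = [a for a in addresses if "Italy" in a]
print(len(needs_certificate))  # 1 out of 6 Italian shipments; five are silently missed
```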
Here is another benefit of being mindful and explicit about data models used within your company: **they prevent a great deal of mistakes and simplify processing**.
[^5]: No worries, most ERPs do have specific fields for each address part, and a ready list of ISO country codes to select from. Yours likely does too. But checking doesn't hurt.
### A delicate balancing act
Lastly, quality considerations should take into account the fact that models should represent reality in a way that is **both sufficiently detailed and cost-effective**.
The level of detail and the cost-effectiveness of a data model are inversely proportional. The more details you include, the more time it takes both to gather the data and to transfer it into a system, so it's a trade-off that must be balanced for each model depending on its use. In general, you should be clear about the expected benefit of gathering specific pieces of data and then compare it with the cost of doing so. Will the time and money it takes to add that information be worth it?
One good example of this trade off for manufacturers is quality control documents.
Recording quality checks and issues can be a real pain in the neck for operators. A quality manager may want to know as much as possible, and may have prepared very detailed forms to fill in each time, but this means operators will have to take a lot of time figuring out the details through measurements and other analyses, which eats into their available capacity and will sometimes lead to them skipping some parts or even not recording issues at all, because there are more urgent matters to deal with.
On the other hand, there must be _enough_ data for the model to be _actionable_, meaning that it creates value and supports decisions and processes in a practical way.
### Perfection pending
Finally, a word of warning. Data is indeed at the foundation of a strong digital business, but it is neither a panacea nor a crystal ball. While validation helps a lot, automated tools will not and cannot prevent all possible data quality issues: 100% data quality is not an achievable target. Most of the time it's a matter of being aware that these issues exist and acting on two fronts in parallel: putting an appropriate amount of effort into improving quality on one side, and on the other, taking data with a grain of salt. You must _always_ take into account that whatever data you use, it will have a margin of error.
Depending on the situation, that margin of error can be very narrow or quite wide, and you will learn to distinguish each case and act accordingly (it's not dark magic, we'll see how this is possible later in the book).[^6]
[^6]: This is even more important to understand with modern AI tools, especially generative AI. The thicker the layer of software you use, or in other words, the longer the distance from data source to use, the deeper data quality issues may reside, making them harder to spot. Generative AI does *a lot* of behind-the-curtains work, in fact so much that most users will never know where the data comes from and how it got to them. These are very powerful tools, and they may do your company well, but they require a great deal of attention if you want to avoid serious data quality issues arising when and where you least expect them.
Now, this was an introductory overview of data quality and it's not meant to be comprehensive in any way. What I hope you take home from this is that the quality of your data models, and by extension of the flows and infrastructure related to them, defines both how well you see your company and how smoothly data can flow where it needs to, impacting both the effectiveness and efficiency of your operations. So remember: the way you choose to represent your world through data is not simply a matter of philosophy or preference, it has a **material impact on your business**.
## Deadly Data Sins
Data is a technical topic and not all of us like technical stuff. So, if you made it all the way here, congratulations! Reading this chapter was quite a feat.
To close it in a memorable way, I'll summarize the most common mistakes people make about data, comparing them to the famous seven capital vices. I'm sure many of them will be familiar to you, even if you haven't fallen into them personally.
### Sloth
It means laziness. It's manifest in many different ways: not gathering important data, not caring about data quality, not trying to understand the meaning of data, not taking action when data tells you something important, not protecting data, etc.
### Gluttony
It means excessive eating. It's manifest in gathering data just for the sake of it, wasting resources. This usually implies an intuition that data can be important, but there's no explicit link to company strategy and goals, no explicit benchmark, nor even a well-defined metric for goals. This can easily lead to a waste of resources (time and money) and, in the worst case, alienate the people responsible for gathering data who see no clear outcome from their activity.
### Greed
It's manifest in separate information systems not sharing data, because the systems do not speak to each other or, worse, simply because people don't want to. This leads to inefficiencies linked to re-collecting or re-analyzing data already available elsewhere in the company and to trying to reconcile different perspectives, with a high risk of generating inconsistencies.
### Pride
It's manifest in showing off data without substance. This can be linked with gluttony: people may brag about gathering every single piece of data coming out of their machines, but can hardly make sense of it. A more sophisticated version of pride is showing colorful, beautifully designed graphs that in fact yield no real insight nor help in making decisions.
### Lust
It's manifest in the excessive focus on one or more metrics and in basing decisions and strategies on numbers alone, without understanding the underlying causes of the data or the impact that trying to reach a given target may have on the broader business and on the market. Besides, falling in love with data is dangerous in general because it makes us forget that it is still a _representation_ of reality, not reality itself. Data has limits in representing the world accurately. It focuses on specific attributes, and we should be aware of what data is _not_ telling us as much as of what it shows.
### Wrath
It's manifest in making decisions too rashly, based on small amounts of data without sufficient significance, without understanding trends and seasonality, or in judging the performance of a project or activity too soon, without giving it time to bring results. A typical example is process improvement projects, where productivity may drop at first due to training hours and adaptation to new procedures, but will improve over time to beyond the previous level.
### Envy
It's manifest in judging data by comparing it with benchmarks and references that are not relevant or reasonable for the actual business context.
### The eighth and most subtle of all data sins
I said seven? Sorry, there's an additional one in the data world, and it's the most insidious and potentially dangerous of all: forgetting that **any and all data is, from beginning to end, still a matter of people**. Why, you ask? Let's find out in the next chapter.