# OpenTezos overhaul / Blockchain 101
---
id: stucture-of-a-blockchain
disable_pagination: true
title: Structure of a blockchain
author: Nomadic Labs
---
import NotificationBar from '../../../src/components/docs/NotificationBar';
At its heart, a blockchain is a piece of technology that enables a group of users to collectively share data, and has the following properties:
- any user in the group has access to all of the data
- any user can make changes to the data, as long as they follow a set of rules
- no small subset of users should be able to control the system
This set of properties, associated with the right set of rules, can be leveraged for many use cases, including the secure storing and trading of cryptocurrencies and other digital assets.
In this document, we present the basics of an architecture that enables these properties. We start by presenting the data structure of the blockchain, then present how users can work together to share this data and reach agreements on what changes are applied.
## Collectively storing read-only data, forever
Imagine you are a member of a community of authors of digital content. You would like any member to be able to store their new content so that it stays available for any member forever, or at least as long as this community exists in some form.
If you were in charge of solving this problem, how would you do it? Take a couple of minutes to think about it.
### Traditional approach
A common solution is to use a cloud storage service, hosted by some large multi-national company.
What could be the issues with this approach?
This company may:
- disappear or simply decide to stop offering this service
- have technical issues that make your data unavailable
- decide that your content is against its new terms of service and delete it
- prevent some members or entire countries from accessing the service
- etc.
Picking a company for your community's needs means having to *trust* that this company will continue offering the service to all your members without restrictions, for a very long time.
This is what we call a *centralized approach*: a single entity is in charge of everything, and users need to trust that it won't unilaterally stop offering the service as you expect. Unfortunately, you can never fully trust a single entity.
### Decentralized approach
Avoiding the issues that come with centralization means using *decentralization*: splitting the responsibility of the service between many entities, so that no small group of entities may prevent any user from using the service.
How would you design a simple decentralized architecture to store the community's data and make it available?
First, let's assume that many members of the community each have a single computer that they dedicate to this service. The community will have to agree on what software to run on these computers. We will call each running instance of this software a *node*.
For a very basic architecture, we could have every node:
- store all of the data from the community
- be able to communicate with every other node (know each node's IP address).
To **upload data**, for example a file, a member would simply send that file and its name to every node, which would then store it.
To **download data**, a member could send a request for the file having a given name, to any of the nodes, and that node would send it to them. If for some reason the node is not available or rejects the request, they can simply try with another node.
To **join the community**, a new user would just need to know the address of one node, request the list of all the other nodes, the list of all files, and start downloading their content.
As long as one node is still running the software, and a member is connected to the internet, they will be able to download the content.
### Potential issues
This simple approach presented above has a few issues. Can you think of some?
Here are some of the main issues:
- two members could use the same name for a file
- as the community grows, the bandwidth required to send files to every node becomes prohibitive
- when nodes are temporarily unavailable, they will miss some files and become desynchronized
- bad members could:
  - send the wrong data, for a given download request
  - saturate the node by sending too much data or too many requests
Can you think of ways to improve our architecture to prevent these issues?
### Ensuring that nodes store/send the right data
There is a risk that two members upload **different files with the same name**.
If they start uploading these files to nodes in different orders, some nodes will store the first member's file, while others will store the other one. Fixing this situation after the fact is complicated, so we need to prevent it.
Sending files to the nodes always in the same order would not work when some nodes are temporarily disconnected or have very slow connections.
We need to ensure members **always use different names for different files**.
One way to generate a **unique name** is to produce a random sequence of characters. If the sequence is long enough, two members will never generate the same name, unless one does it on purpose.
However, we also need to make sure **the data sent by a node is the original file**, exactly as intended by its author. We want to be able to detect any node's attempt to send invalid data.
Cryptography provides a solution: **identify the file using the *hash* of its content**, instead of a name.
A hash has the property that, in practice, two different files always have different hashes. After downloading a file, you can compute the hash of its content, then verify that it matches the one you requested.
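The idea of naming files by the hash of their content can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `content_id` helper and the `verified_download` function are hypothetical names, and the choice of a 32-byte BLAKE2b hash is an assumption made for the example.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return the hex-encoded 32-byte BLAKE2b hash of the file content."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


class Node:
    """Hypothetical node that stores files under the hash of their content."""

    def __init__(self) -> None:
        self.files = {}  # hash (hex string) -> file content

    def upload(self, data: bytes) -> str:
        file_id = content_id(data)
        self.files[file_id] = data
        return file_id

    def download(self, file_id: str) -> bytes:
        return self.files[file_id]


def verified_download(node: Node, file_id: str) -> bytes:
    """Download a file and reject it if its hash differs from the requested one."""
    data = node.download(file_id)
    if content_id(data) != file_id:
        raise ValueError("node returned data that does not match the requested hash")
    return data
```

With this scheme, two members uploading the same content obtain the same identifier, and a node that returns altered content is detected by the member who downloads it.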
### Cryptographic hashes
Blockchains make heavy use of cryptography to ensure a high level of security. One cryptographic primitive often used in a blockchain, including as a key component of the chain structure itself, is *cryptographic hashing*.
<!--
Hash functions are commonly used in blockchains, to identify blocks, operations and addresses, to prevent spamming and as we will see later to link blocks together thus forming the so called blockchain.-->
A **hash** is a sequence of bytes that is the output of a **hash function**, where the input is an arbitrarily large piece of data.
There are many types of hash functions with different properties, but most share these three main ones:
- the size of the hash is limited, typically from a few bytes to a couple of hundred bytes
- changing a single bit of the input significantly changes the output, which makes it look random
- given the same input (and sometimes for a given value of an internal seed), hash functions always produce the same hash (determinism)
In blockchains, we use **cryptographic hash functions**. Such hash functions have these additional properties:
- collision resistance: it is infeasible in practice to find two different inputs that yield the same hash,
- pre-image resistance: it is also infeasible to find an input (pre-image) that outputs a given hash, as there is no better strategy than trying every possible input and computing its hash.
With these properties, the hash of some data can be seen as a unique fingerprint that identifies this data.
Tezos uses _BLAKE2b_, a cryptographic hash function that takes any sequence of bytes as input and is used here to produce 32-byte (256-bit) hashes. Another well-known example is _SHA256_.
As an example, fig. 1 shows the hash of the small string "Cake", expressed using 64 hexadecimal digits. Fig. 2 shows the hash of a longer string. Note that the two hashes are completely different, but both are 64 digits long.

<small className="figure">FIGURE 1: The hash of a small string of characters</small>

<small className="figure">FIGURE 2: The hash of a large string</small>
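You can reproduce this kind of experiment with Python's standard `hashlib` module. The sketch below is only illustrative: the longer input is an arbitrary string chosen for the example, and the helper names are hypothetical.

```python
import hashlib


def blake2b_hex(data: bytes) -> str:
    """32-byte BLAKE2b hash, written as 64 hexadecimal digits."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def sha256_hex(data: bytes) -> str:
    """SHA256 hash, also 32 bytes, written as 64 hexadecimal digits."""
    return hashlib.sha256(data).hexdigest()


short_text = b"Cake"
long_text = b"Cake " * 1000  # arbitrary longer input, chosen for illustration

for label, data in (("short", short_text), ("long", long_text)):
    print(label, "BLAKE2b:", blake2b_hex(data))
    print(label, "SHA256: ", sha256_hex(data))

# Changing a single character gives a completely different hash.
print("Cake vs cake:", blake2b_hex(b"Cake") == blake2b_hex(b"cake"))
```

Whatever the size of the input, each printed hash is 64 hexadecimal digits long, and running the script again prints the same values.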
<NotificationBar>
<p>
You have probably already used hashes without knowing it. Indeed, when you download a file on your computer, some browsers check the hash of the downloaded file and compare it to the hash announced by the source before download. If the two hashes match, it means that the file you have downloaded is a perfect match to the one intended to be sent by the source. If the hashes don't match, your download has been corrupted. You can try this out manually by downloading the latest release of _Ubuntu_, computing the hash of the downloaded file, and comparing it to the one announced on [their website](https://ubuntu.com/tutorials/how-to-verify-ubuntu#5-verify-the-sha256-checksum).
</p>
</NotificationBar>
### Reducing bandwidth needs
With a centralized approach, a member only has to upload new content once to the central entity, therefore minimizing bandwidth needs.
With our basic decentralized approach, as the community grows and the number of nodes increases, it becomes prohibitive for the author of new content to send their data to every single node in the network.
Limiting the number of nodes would limit the bandwidth for uploading, but increase the amount of bandwidth required for nodes to reply to download requests.
**How can we solve this dilemma?**
Assuming we still want every node to store all of the data, the total bandwidth used by the community can't be reduced. What we can reduce however, is the bandwidth needed for a single user or node.
This means:
- **when uploading**, a member should only send its data to a subset of the nodes.
- **when downloading**, members shouldn't all send their requests to the same node.
If a user only sends their data to a subset of the nodes, this means these nodes must in turn send their data to the remaining nodes.
For this, we organize the nodes as a **peer-to-peer network**, where each node knows a subset of other nodes, called its **neighbors**. When a node receives a file it doesn't already have, it asks each of its neighbors if they already have that file (using the hash of its content as an identifier), and sends it to the ones that don't.
As long as the network is strongly connected, i.e. not split in two parts where no node of one part knows a node of the other part, then the file will quickly propagate through the entire network. Every node will then store a copy of the file. To ensure the network is connected, we make sure each node knows a large enough subset of the other nodes that is picked at random.
The peer-to-peer network can be called *p2p* for short. In the context of blockchains, we also call it the **gossip network**.
When downloading data, different members can simply connect to different nodes, to **spread the load**. If a node is missing the requested file, for example because it was disconnected during its propagation, it can use the p2p network to obtain it from another node, by sending the same request to one of its neighbors.
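The propagation of a file through the gossip network can be simulated with a short sketch. The functions `build_network` and `propagate` are hypothetical, and the way neighbors are picked (each node picks a few random peers, and links work in both directions) is an assumption made for the example.

```python
import random
from collections import deque


def build_network(node_count: int, neighbors_per_node: int, seed: int = 0) -> dict:
    """Give every node a few random neighbors; links are made symmetric."""
    rng = random.Random(seed)
    network = {node: set() for node in range(node_count)}
    for node in range(node_count):
        others = [n for n in range(node_count) if n != node]
        for peer in rng.sample(others, neighbors_per_node):
            network[node].add(peer)
            network[peer].add(node)
    return network


def propagate(network: dict, start: int):
    """Flood a file from `start`; return the nodes storing it and the number of transfers."""
    have_file = {start}
    queue = deque([start])
    transfers = 0
    while queue:
        node = queue.popleft()
        for peer in network[node]:
            # The node first asks "do you already have the file with this hash?"
            if peer not in have_file:
                have_file.add(peer)
                transfers += 1
                queue.append(peer)
    return have_file, transfers


network = build_network(node_count=30, neighbors_per_node=3)
reached, transfers = propagate(network, start=0)
print(len(reached), "nodes store the file after", transfers, "transfers")
```

Each node receives the file at most once, and the author only had to send it to a single node. If the network were split in two parts, the nodes of the other part would never be reached.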
Check [this chapter](TODO) to learn more about the p2p network.
### Avoiding DOS attacks
Our basic p2p network is vulnerable to **Denial Of Service** attacks (DOS for short), where a malicious member sends so much data to the network, or so many requests, that it saturates the network and makes it unavailable for other members.
One common approach is to limit the amount of data or requests that a given member, or a given IP address, can perform, then reject any requests beyond these limits.
However, if we keep the community open, an attacker can easily use a large number of machines with different IP addresses, or impersonate a large number of members, to go around such limits. Furthermore, strict limits may prevent legitimate intensive uses of the network.
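A per-address limit, and the reason it is not sufficient, can be illustrated with the following sketch. The `RequestLimiter` class, its parameters and the sample addresses are hypothetical and only serve the example.

```python
from collections import defaultdict


class RequestLimiter:
    """Hypothetical limiter: at most `max_requests` per address in a time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(list)  # address -> times of accepted requests

    def allow(self, address: str, now: float) -> bool:
        # Keep only the requests that are still inside the time window.
        recent = [t for t in self.history[address] if now - t < self.window_seconds]
        accepted = len(recent) < self.max_requests
        if accepted:
            recent.append(now)
        self.history[address] = recent
        return accepted


limiter = RequestLimiter(max_requests=2, window_seconds=10.0)

# A single address is quickly limited...
print([limiter.allow("10.0.0.1", now=t) for t in (0.0, 1.0, 2.0)])

# ...but an attacker using many addresses is never limited.
print(all(limiter.allow(f"10.0.1.{i}", now=3.0) for i in range(100)))
```

The first line printed shows the third request from the same address being rejected, while the second shows one hundred requests from one hundred different addresses all being accepted.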
Can you think of another solution?
One approach that works well consists in **making attacks costly** for anyone trying to attack the network.
There are multiple ways to make it costly:
- require users to **perform a significant amount of computing for any request**. This implies spending computing resources and electricity. This approach, called *Proof of Work* (PoW for short), has the unfortunate side effect that it wastes energy and is therefore bad for the environment. Its use should therefore be limited. We talk about PoW in more detail later (TODO Link).
- require users to **pay for every operation**. The payment could be transferred to the nodes that do the work, which means this approach would double as an incentive for members to contribute their own nodes. This implies handling transfers of amounts of currency, which is one of the reasons why cryptocurrency is usually involved with decentralized architecture.
## Decentralized currency
### Accounts and transfers
For our community to be able to manage its own currency, we need to support at least a few basic features.
We need each member to have:
- A unique account identifier, or **address**.
- Some amount of currency it possesses, the **balance** of the account
To use it, a member needs to be able to, at a minimum:
- Transfer some currency from their account to another member's account.
- Receive currency from another member.
On Tezos, the native currency is called **Tez**.
How would you implement this for our community?
### Centralized approach
We could use a centralized approach by appointing one of the members of the community as a banker.
The banker would keep a **ledger** of all transactions it processes. They would also keep track of the current balance of each account.
Whenever a member requests a transfer:
- the banker checks their identity
- they check that the balance is sufficient
- they reduce the balance of the source, and increase that of the destination
- they charge for this service
However, such a banker may, similarly to a hosting company:
- be unavailable, or go bankrupt
- block some transfers
- create fake transfers, or other illegal use of the funds
Again, using such a centralized approach means users have to trust a single entity, that has full control over the funds. This can be very dangerous.
### Decentralized approach
Think about how our community could maintain the balances of accounts and handle transfers, without having to trust a single entity.
We could build on top of our p2p data sharing network, and have each node contribute equally to supporting this currency.
A node would have to be able to:
- **Receive** and **emit** transactions
- **Keep track** of the **balance** of each account
- **Check the validity** of these transactions
  - Authenticate their author
  - Check that the balance is sufficient
How can we do this?
**Receiving and emitting transactions** could be handled the same way as receiving and emitting files in our p2p network.
When a node receives a transaction:
- It performs some verifications (to avoid DOS attacks)
- It propagates it to its neighbors
**Maintaining the balance of each address** could be done by each node based on the transactions it receives. For each member's **address** (unique identifier), each node can store a corresponding **balance**.
Once a transaction is validated and confirmed, the node will need to:
- **Subtract** the transferred amount from the balance of the source address
- **Add** this amount to the balance of the destination address
### Authentication
To validate a transaction, the node first needs to **authenticate its source**.
For example, let's take the following transaction:
Alice transfers 50 tez to Bob
Before the node subtracts 50 tez from Alice's account and adds 50 tez to Bob's account, it has to check that Alice is the author of this transaction.
This can be done using **asymmetric cryptography**. Each member generates and stores:
- A **private key**, that they are the only one to possess, and allows them to produce a digital signature for a certain piece of data:
  “I, Alice, certify that for my transaction #284, I transfer 50 tez to Bob”
- A **public key**, that they share with the whole network, and allows:
  - For the user to uniquely identify themselves (their account)
  - For any member, to verify that they are the author of a signed message
Tezos users use a Wallet (Kukai, Umami, Temple, Ledger, …) to generate and store their private keys and sign transactions.
In practice, the address of an account is based on the hash of the public key of the holder of that account.
### Checking that the balance is sufficient
At first, it may seem like all a node needs to do when receiving a transaction, after authenticating its author, is to check that the balance of the source account is at least the amount it transfers to the destination.
In a decentralized network however, it is not that simple.
Let's assume that we already found a way to make sure every node in the p2p network receives every transaction.
A key remaining issue is that the validity of transactions may depend on the order in which a given node receives them.
Assume that at the start, Alice has 100 Tez and Bob has 10 Tez. We have 2 transactions:
- Transaction A: **Alice transfers 50 Tez to Bob**
- Transaction B: **Bob transfers 30 Tez to Carl**
**If a node applies A first**, Bob’s balance becomes 60 tez. **B is valid** and can be executed.
**If a node applies B first, B is invalid:** Bob’s balance is 10 Tez, which is too low for a transfer of 30 Tez.
These two nodes end up in a different state!
As all nodes should agree on the balance of each account, they not only need to agree on which transactions need to be performed, but also on their order.
A key aspect of managing a currency on a decentralized network is therefore to agree on **which transactions to add, and in which order**.
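The example above can be replayed with a small sketch. The `apply_in_order` function is hypothetical, and Carl's starting balance of 0 is an assumption, as the example does not state it.

```python
def apply_in_order(initial_balances: dict, transactions: list):
    """Apply transfers one by one; reject any transfer the source cannot afford."""
    balances = dict(initial_balances)
    rejected = []
    for name, source, destination, amount in transactions:
        if balances.get(source, 0) < amount:
            rejected.append(name)
            continue
        balances[source] -= amount
        balances[destination] = balances.get(destination, 0) + amount
    return balances, rejected


initial = {"Alice": 100, "Bob": 10, "Carl": 0}  # Carl's balance is assumed
tx_a = ("A", "Alice", "Bob", 50)
tx_b = ("B", "Bob", "Carl", 30)

state_1, rejected_1 = apply_in_order(initial, [tx_a, tx_b])
state_2, rejected_2 = apply_in_order(initial, [tx_b, tx_a])

print(state_1, rejected_1)  # {'Alice': 50, 'Bob': 30, 'Carl': 30} []
print(state_2, rejected_2)  # {'Alice': 50, 'Bob': 60, 'Carl': 0} ['B']
```

Both nodes received the same two transactions, yet they disagree on the balances of Bob and Carl, only because they applied them in a different order.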
## Deciding on transactions to include and their order
### Ordering transactions and associated issues
Assuming again that each node receives every transaction sent to a node by their author, how would you make sure every node executes them in the same order?
One natural idea would be to attach a precise timestamp to every transaction, then have nodes execute transactions in the corresponding order.
Can you think of any issues with that approach?
One issue is the case where two transactions have the same timestamp. This could however easily be resolved by sorting these transactions using their hash, which is unique.
Another issue is that to execute a given transaction, you would need to make sure you already received every transaction with a smaller or equal timestamp.
One could consider allowing for some set delay to account for network issues that could slow the propagation of some transactions. However, if nodes reject transactions that arrive too late compared to their attached timestamp, two nodes may reject different transactions and therefore end up in different incompatible states.
Another approach could be for nodes to revert to a previous state, whenever a transaction arrives after the application of transactions with a higher timestamp, then reapply the transactions in the right order to get the new updated state. Assuming that all nodes eventually receive all transactions, they would always converge to the same state.
This however means that anyone who had inquired about the state from such a node before a revert, would have obtained inaccurate information, and potentially made important decisions based on that information.
### Double spending example
Let's say Carl wants to purchase items from both Daphne and Eve, for 20 Tez each, but only has 30 Tez on his account.
Carl could send the following transactions to the network, with timestamps in this order:
- Transaction A: **Carl transfers 20 Tez to Daphne**
- Transaction B: **Carl transfers 20 Tez to Eve**
Now let's say that the following steps happen in this order on a given node:
- Transaction B is received.
- Transaction B is executed. Carl and Eve's balances are updated.
- Eve checks she received payment, and sends her item to Carl
- Transaction A is received.
- The node reverts to the initial state, resetting Carl and Eve's balances.
- Transaction A is executed. Carl and Daphne's balances are updated.
- Transaction B is executed but fails, as Carl doesn't have enough funds on his balance.
- Daphne checks she received payment, and sends her item to Carl
We can see that although Carl only had 30 Tez, he ended up receiving two 20 Tez items, with 10 Tez left on his account.
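These steps can be replayed with a small simulation. The `RevertingNode` class is hypothetical. To keep the sketch short, it reverts by replaying every received transaction from the initial state, and breaks timestamp ties using the transaction name rather than its hash. Both are simplifications made for the example.

```python
class RevertingNode:
    """Hypothetical node that re-applies all transactions in timestamp order
    every time a new one arrives."""

    def __init__(self, initial_balances: dict) -> None:
        self.initial = dict(initial_balances)
        self.balances = dict(initial_balances)
        self.received = []  # tuples: (timestamp, name, source, destination, amount)

    def receive(self, transaction: tuple) -> None:
        self.received.append(transaction)
        self.received.sort()  # timestamp first, then name as a tie-breaker
        self.balances = dict(self.initial)  # revert, then replay in order
        for _, _, source, destination, amount in self.received:
            if self.balances.get(source, 0) >= amount:
                self.balances[source] -= amount
                self.balances[destination] = self.balances.get(destination, 0) + amount


node = RevertingNode({"Carl": 30, "Daphne": 0, "Eve": 0})
tx_a = (1, "A", "Carl", "Daphne", 20)
tx_b = (2, "B", "Carl", "Eve", 20)
items_sent_to_carl = []

node.receive(tx_b)  # B arrives first
if node.balances["Eve"] >= 20:
    items_sent_to_carl.append("Eve's item")

node.receive(tx_a)  # A arrives late, with an earlier timestamp
if node.balances["Daphne"] >= 20:
    items_sent_to_carl.append("Daphne's item")

print(node.balances)       # {'Carl': 10, 'Daphne': 20, 'Eve': 0}
print(items_sent_to_carl)  # ["Eve's item", "Daphne's item"]
```

The final state is consistent, but Eve made her decision based on a state that was later reverted: she sent her item and ends up with no payment.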
This situation is a case of what we call **double spending**: using the same tez for two different transactions. Here, this is done by taking advantage of synchronization issues.
A system where nodes eventually agree with each other on which transactions are executed in which order and on the resulting state is not sufficient: we need a way to eventually reach **finality**: a situation where a node can guarantee that all the transactions executed up to a point are **final**, and that the community collectively agrees.
Note that here, Carl could have purposely sent transaction A with a timestamp in the past, after sending transaction B and checking that Eve already sent the item. It could also have simply been due to some congestion issue. As Carl may be in control of some nodes, there is no way to differentiate between the two.
### Using blocks
To summarize, we need a way for nodes to collectively agree on which transactions to include, and in which order, with points in time where given transactions become final.
Another way to see it is that we need a mechanism for nodes to collectively agree once and for all on what the next transaction is, and keep doing this indefinitely.
Whatever mechanism we put in place, however, will require nodes to communicate with each other and exchange many messages over the network. Doing all this for every transaction, no matter what the exact mechanism is, would be prohibitively slow and consume a lot of bandwidth.
Assuming we do have a very good mechanism for nodes to agree on the next transaction, how could we significantly reduce the number of messages?
The solution is to avoid applying this mechanism for every transaction, and instead, group transactions and apply it for every group of transactions. Instead of agreeing on the next transaction, nodes agree in one application of the mechanism, on the next X transactions and their order.
On a blockchain, we call such a group of transactions a **block**.
A block is mostly a sequence of transactions (and other kinds of operations), to be applied by every node, in order.
### Blockchain structure
To summarize what we have seen so far:
- A blockchain is a set of nodes that receive transactions from users, propagate them through a peer-to-peer network, and collectively select them and include them in a sequence of transactions (and other operations).
- For performance reasons, this sequence of transactions is split in groups called blocks, and some mechanism is applied for the nodes to collectively agree on what the next block is, therefore forming a chain of blocks, hence the name **blockchain**.
- Each node applies every transaction of the chain in the order of the sequence, using the same software, to maintain an internal state. Typically, this state includes the balances of all users' accounts. As all nodes apply the same transactions in the same order, they all end up with the same state.
- Transactions are discarded if they are invalid, either because they are not correctly signed by their author, or when their application is invalid, for example due to lack of funds in the balance of the source account of a transfer.
On top of the sequence of transactions (and other operations), each block contains a header with a small amount of extra information such as a timestamp, the position of the block in the chain (the level), and more, but most importantly, the hash of the content of the previous block. This hash uniquely identifies the previous block, making it a link to this block, forming a chained structure.
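The chained structure can be sketched as follows. The `Block` class, its field names and the use of a JSON encoding before hashing are assumptions made for illustration, not the encoding of an actual blockchain. Operations are represented as plain strings.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    level: int               # position of the block in the chain
    timestamp: int
    predecessor_hash: str    # hash of the previous block: the "link"
    operations: list = field(default_factory=list)

    def hash(self) -> str:
        """Hash of the whole block content (header fields and operations)."""
        encoded = json.dumps(
            [self.level, self.timestamp, self.predecessor_hash, self.operations]
        ).encode()
        return hashlib.blake2b(encoded, digest_size=32).hexdigest()


def append_block(chain: list, timestamp: int, operations: list) -> Block:
    """Create a new block that links to the current last block of the chain."""
    previous = chain[-1]
    block = Block(previous.level + 1, timestamp, previous.hash(), list(operations))
    chain.append(block)
    return block


def links_are_valid(chain: list) -> bool:
    """Check that every block points to the hash of the block before it."""
    return all(
        current.predecessor_hash == previous.hash()
        and current.level == previous.level + 1
        for previous, current in zip(chain, chain[1:])
    )


chain = [Block(level=0, timestamp=0, predecessor_hash="")]  # first block
append_block(chain, 10, ["Alice transfers 50 tez to Bob"])
append_block(chain, 20, ["Bob transfers 30 tez to Carl"])
print(links_are_valid(chain))

chain[1].operations[0] = "Alice transfers 5000 tez to Bob"  # tamper with history
print(links_are_valid(chain))
```

Modifying the content of an earlier block changes its hash, so the link stored in the following block no longer matches, and the modification is detected.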
As part of the mechanism for nodes to agree on what the next block in the chain should be, multiple blocks may be created and propagated, that link to the same previous block. As there can be only one next block in the chain, the goal of the mechanism is to make sure there is consensus on which block should be the official (final) next block.
<!-- Note: this could make readers think that the head block is always final, or becomes final before more blocks are added. We couldn't find a way to avoid this without making it too complicated. -->
## Consensus mechanism
We need some way for a next block to be proposed, based on some set of transactions, and some way for the community to agree that one such block is indeed the official, final (definitive) next block in the chain.
The mechanism used for this is called the **consensus mechanism**.
### Centralized production of blocks
As the goal of grouping transactions in blocks is to avoid exchanging messages for every transaction, proposing a next block is not something that can reasonably be done collectively.
One of the nodes has to take this responsibility, select a set of transactions among the ones it has received since the last block, define an order for these transactions, and group them in a block. This block can then be propagated through the p2p network, and the process of having the community agree that this should be the next block, can then take place.
On Tezos, we call the entity that creates a block a baker. On other blockchains, it can be called a miner, or validator.
As one entity, a baker, gets to create a block, this implies some centralization, with the issues that come with it. In particular, the baker may:
- censor some transactions
- give priority to other transactions
- entirely fail to do its job and not create any block
For the first two points, it may be for a number of reasons, such as favoring entities it likes, or simply censoring or prioritizing transactions depending on whether it benefits from doing so.
To avoid the issues that come with this centralization and the power a baker has, the solution is to distribute this power and responsibility over time: **for each new block, we select a new baker**.
### The difficulty of selecting a baker
For each block, a baker has to be somehow selected, so that the responsibility of creating the next block is not concentrated on one entity, or on a small number of entities.
Think about what algorithm you would use to select a baker for each block, so that this power/responsibility is spread fairly among the community, thus ensuring decentralization.
A natural answer would be to randomly select among all the different available bakers, and spread the responsibility evenly. Unfortunately, this doesn't solve the problem. Can you find out why?
The issue is that as a blockchain is public and permissionless, we need anyone to be able to become a baker, and participate in ensuring the security of the blockchain. But if anyone can be a baker, this also means anyone can be multiple bakers. If there are N bakers and one entity then registers N new bakers, selecting the next baker uniformly at random among the resulting 2N bakers means this single entity would have one of its bakers selected for half of the blocks on average, and gain too much power.
Making sure every baker is a unique entity, and that several such entities are not in practice controlled by the same entity, can't be done without a centralized authority in charge of investigating and controlling who can be a baker. This means we can't use a solution that depends on finding out who is behind each baker.
The solution needs to be based on something that entities can't freely create an infinite amount of: some kind of resource.
### Proof of Stake
One common resource an entity can't create an infinite amount of is money.
On a blockchain, we can conveniently use the native cryptocurrency, as a way to represent this limited resource. It can be converted from and into other currencies through exchanges.
This gives us a way to select the baker for the next block: instead of selecting them randomly and evenly among all registered bakers, we can select them randomly, but in proportion to the amount of native cryptocurrency they possess.
As owning a high proportion of the available Tez would be prohibitively expensive, it is unlikely that a single entity would take too much control this way.
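Selection in proportion to stake can be sketched with a weighted random choice. The functions, baker names and stake amounts below are hypothetical, and a real blockchain needs a source of randomness that all nodes agree on, which this sketch ignores.

```python
import random


def select_baker(stakes: dict, rng: random.Random) -> str:
    """Pick one baker at random, with a probability proportional to its stake."""
    bakers = list(stakes)
    weights = [stakes[baker] for baker in bakers]
    return rng.choices(bakers, weights=weights, k=1)[0]


def share_of_blocks(stakes: dict, owner_of: dict, entity: str,
                    rounds: int = 20_000, seed: int = 0) -> float:
    """Estimate the fraction of blocks created by bakers owned by `entity`."""
    rng = random.Random(seed)
    wins = sum(owner_of[select_baker(stakes, rng)] == entity for _ in range(rounds))
    return wins / rounds


# Mallory owns 10 of the 100 units of stake, as a single baker...
single = {"alice": 40, "bob": 50, "mallory": 10}
single_owner = {"alice": "alice", "bob": "bob", "mallory": "mallory"}

# ...or as 10 separate bakers holding 1 unit each.
split = {"alice": 40, "bob": 50, **{f"mallory-{i}": 1 for i in range(10)}}
split_owner = {"alice": "alice", "bob": "bob",
               **{f"mallory-{i}": "mallory" for i in range(10)}}

print(share_of_blocks(single, single_owner, "mallory"))
print(share_of_blocks(split, split_owner, "mallory"))
```

Both estimates are close to 0.10: registering as many bakers brings no advantage, because only the total amount of stake matters.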
Approaches that use the amount of cryptocurrency an entity holds as a way to distribute the responsibility of creating the next block are called *Proof of Stake* (PoS for short).
This is the approach used by most modern blockchains.
Tezos was one of the first blockchains to use Proof of Stake as the way to assign block creation. Since then, many blockchains adopted this approach.
<!-- add link to a more detailed page -->
### Proof of Work
Another approach that was selected initially by some blockchains, including the first blockchain, Bitcoin, is to use computing power as the limited resource used to select who gets to create the next block.
The idea is that to have a chance at being the one to produce the next block, you have to perform a large amount of computation. The more computation you can perform in a given amount of time, the more likely you are to produce the next block.
For similar reasons to money, having a high proportion of the computing power dedicated to a blockchain can be very hard and costly to achieve, so this can be effective as a way to distribute the ability to create blocks among many entities.
An example of computation that is being used by some blockchains for PoW, consists in, given some data such as the content of the block being produced, to find a value for some extra piece of data to be included, such that the hash of the resulting data has a certain property. For example, the goal can be to obtain a hash that has its last N bits set to 0. As there is no way to control the output of a cryptographic hash function, other than trying different inputs and computing the output for each of them, the value of N directly impacts the average number of hashes that need to be computed, in order to get the required property.
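This kind of computation can be sketched as follows. The function names, the use of SHA256 and the 8-byte encoding of the extra value (often called a nonce) are assumptions made for the example.

```python
import hashlib


def block_hash(block_content: bytes, nonce: int) -> bytes:
    """Hash of the block content followed by the extra value (the nonce)."""
    return hashlib.sha256(block_content + nonce.to_bytes(8, "big")).digest()


def has_trailing_zero_bits(digest: bytes, bits: int) -> bool:
    """True if the last `bits` bits of the hash are all 0."""
    return int.from_bytes(digest, "big") % (1 << bits) == 0


def find_nonce(block_content: bytes, bits: int) -> int:
    """Try 0, 1, 2, ... until the hash has the required property."""
    nonce = 0
    while not has_trailing_zero_bits(block_hash(block_content, nonce), bits):
        nonce += 1
    return nonce


content = b"Alice transfers 50 tez to Bob"
for bits in (4, 8, 12):
    nonce = find_nonce(content, bits)
    # Since candidates are tried in order from 0, nonce + 1 hashes were computed.
    print(bits, "bits:", nonce + 1, "hashes computed")
```

Finding a valid value takes about 2^N attempts on average, while checking a proposed value only takes a single hash computation. This asymmetry is what lets other nodes cheaply verify that the work was done.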
Proof of Work has a very significant drawback: to prove that an entity has significant computing power, this computing power has to be spent, and it is typically spent in a way that has no other benefits. This wastes huge amounts of computing power and electricity, with corresponding effects on the environment. A [study](https://www.jbs.cam.ac.uk/2022/a-deep-dive-into-bitcoins-environmental-impact/) from the University of Cambridge estimated that in 2022, the largest PoW cryptocurrency produced about 0.3% of global annual greenhouse gas emissions.
This is the main reason why modern blockchains tend to abandon PoW in favour of PoS as a way to distribute the block creation power.
Note that a very limited form of PoW can be used in PoS blockchains, as a way to prevent DOS attacks.
### Incentives
As being a baker (or miner) means taking the time to set up and administer one or more servers, and in the case of PoW blockchains, spending a significant amount of electricity, there needs to be some incentive to encourage as many people as possible to take (and share) this responsibility.
Being a baker for a given block can give some amount of power. However, as everything is done to limit this amount of power, and taking advantage of this power is not easy, some other, more direct incentive is needed.
A system of rewards is therefore put in place: whoever creates the next block, will be rewarded in the native cryptocurrency of the blockchain.
This reward can be composed of:
- some amount of newly minted cryptocurrency
- fees paid by the emitter of transactions
TODO: rewards incentivize bakers to behave well:
- failing to produce blocks when it's your turn
<!-- TODO: add more details? -->
### Slashing
As being in charge of creating the next block is a responsibility, bad behavior has to be deterred:
- producing multiple blocks in a single round
- attesting or pre-attesting multiple blocks in a single round
When using Proof of Stake, instead of simply considering how much bakers own, we let them put some amount of cryptocurrency at stake, temporarily locked, so that if they are proven to have misbehaved, some of these funds will be taken from them as a punishment. This is what we call **slashing**.
<!-- TODO: improve -->
### Delegation
<!-- todo: Move to the Tezos part -->
To increase decentralization, and as not everyone has enough funds that they can lock in their account, or the ability to set up a server and bake, we need a way for more people to participate in securing the blockchain without these risks and responsibilities.
On Tezos, a system of delegation is available to fulfil this purpose: anyone who owns some tez can **delegate** their tez to the baker of their choice. The delegated tez are added to the stake of the baker, and contribute to increasing their chance of being selected for the next block.
As delegated tez are not locked and not at risk of being slashed, only a limited portion <!-- TODO: add current value --> of the tez for a given baker can come from delegation.
In exchange for the tez delegated to them, bakers, in turn, share a portion of their rewards with their delegators. Each baker chooses the rules on how much and how they do so, creating a competitive market between bakers.
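The following Python sketch illustrates the two ideas above: delegated tez only count up to a limit, and a baker redistributes part of a reward pro rata. Both `MAX_DELEGATION_RATIO` and the sharing rule are assumptions made for this example; the actual limit is set by the protocol and each baker chooses their own sharing policy.

```python
# Hypothetical limit: delegated tez count only up to this multiple of the baker's own stake.
MAX_DELEGATION_RATIO = 9


def baking_power(own_stake: int, delegated: int) -> int:
    """Stake used for baker selection: own funds plus the capped delegated funds."""
    counted = min(delegated, own_stake * MAX_DELEGATION_RATIO)
    return own_stake + counted


def share_rewards(reward: int, delegations: dict[str, int], shared_percent: int) -> dict[str, int]:
    """Split `shared_percent` of a reward among delegators, proportionally to their tez."""
    pool = reward * shared_percent // 100
    total = sum(delegations.values())
    if total == 0:
        return {}
    return {who: pool * amount // total for who, amount in delegations.items()}
```

For instance, with these assumed rules, a baker with 1000 tez of their own and 5000 delegated tez gets a baking power of 6000, while any delegation beyond 9000 tez would no longer increase it.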
## Smart Contracts
Once we have a system for a P2P network of nodes to agree on a sequence of transactions to execute in a specific order, we can apply this system to all kinds of transactions.
### Virtual machine
If we push this to the extreme, we could imagine expanding transactions to anything a computer could possibly do, in other words, any piece of code. This would require nodes to agree on a format to represent code (some programming language), and on an API, a defined set of functions that code can use to interact with its environment.
Of course, the type of code that any user may send in a transaction to be executed on every node has to be limited.
Try to think of what limitations would be required, before you continue.
Here are the main limitations required for this to work:
- a given transaction needs to have the **exact same effect** on every node, no matter where or when the execution happens, as the states of all nodes need to be identical after each block
- the **execution time** of a transaction needs to be limited, so that the whole network doesn't become slow
- the effect of a transaction sent by a given user has to be **restricted to what other users agree** they are allowed to do
This means all the code that will be run needs to be executed in some type of sandboxed **virtual machine**: a well defined, limited environment, with no access to the outside world. Indeed, the "outside world", whether it's some external service or content on the machine or online, may be different from node to node, so accessing it could produce different results depending on the node or on when the execution takes place.
What's left is code that interacts with the data stored in the node itself. This can include not only the balances of accounts, but other types of data such as information about NFTs and their owners, scores of games, references to pieces of art, etc.
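As an illustration of these limitations, here is a toy stack-based virtual machine in Python. It is only a sketch under our own assumptions: the instruction set (`PUSH`, `LOAD`, `ADD`, `STORE`) and the one-unit-per-instruction cost are invented for the example and do not correspond to any real blockchain's virtual machine.

```python
class OutOfGas(Exception):
    """Raised when a program exceeds its execution budget."""


def run(program: list[tuple], storage: dict[str, int], gas_limit: int) -> dict[str, int]:
    """Run `program` on a copy of `storage` and return the new storage.

    Only the instructions below exist: no clock, no network, no files,
    so the result depends on nothing but the inputs.
    """
    state = dict(storage)  # work on a copy: a failed run changes nothing
    stack: list[int] = []
    gas = gas_limit
    for instruction in program:
        gas -= 1  # every instruction costs one unit in this sketch
        if gas < 0:
            raise OutOfGas(f"budget of {gas_limit} steps exceeded")
        op, *args = instruction
        if op == "PUSH":
            stack.append(args[0])
        elif op == "LOAD":
            stack.append(state.get(args[0], 0))
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            state[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    return state
```

Running `[("LOAD", "counter"), ("PUSH", 5), ("ADD",), ("STORE", "counter")]` on the same storage gives the same result on every machine, and a program that needs more steps than its budget is stopped before it can slow everyone down.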
### Smart contracts and calls
For users to agree on what can be done without limiting use cases, the execution of code is divided into two steps:
- Storing pieces of code that are made publicly available.
  These are what we call **Smart Contracts**. They store their own dedicated data that can only be modified by them. The code of a contract may only modify this data, or emit its own new transactions (transferring tez or calling other smart contracts).
- Calling these smart contracts with given parameters, which executes the code.
These are two new types of transactions on Tezos: **contract originations** (we also talk about deployment), and **contract calls**.
With this approach, someone can create a set of rules in the form of a smart contract, defining how its data is modified depending on who calls it and with what parameters. Potential users can then check the code of the contract and make sure they agree with its behaviour before they call it.
A smart contract may have multiple entrypoints, which are like the functions or methods of the contract, each with its own parameters and code.
When we call an entrypoint, we can pass our own parameters. Other information, such as who is making the transaction, how many tez they sent to the contract along with the call, or what the current date is, can be accessed by the code, along with the contract's own storage.
### Smart contract example
Here is a very basic example of a smart contract:
Storage:
- owner: an address (of the current owner)
- price: a number of tez

Entrypoints:
- purchase()
  - check that the amount sent by the caller equals the price
  - send this amount to the current owner
  - change the owner to the address of the caller
- setPrice(newPrice)
  - check that the caller is the owner
  - set the price to newPrice
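The contract described above can be sketched in plain Python as follows. This is only a model for illustration, not a deployable contract: the class name, the way the caller and the amount are passed as arguments, and the returned `(recipient, amount)` pair standing for the emitted transfer are all assumptions of this sketch.

```python
class OwnershipContract:
    """Toy model of the contract: storage plus two entrypoints."""

    def __init__(self, owner: str, price: int):
        # Storage of the contract.
        self.owner = owner
        self.price = price

    def purchase(self, caller: str, amount: int) -> tuple[str, int]:
        """Entrypoint purchase(): returns the transfer paying the previous owner."""
        if amount != self.price:
            raise ValueError("the amount sent must equal the price")
        payment = (self.owner, amount)  # send the amount to the current owner
        self.owner = caller  # the caller becomes the new owner
        return payment

    def set_price(self, caller: str, new_price: int) -> None:
        """Entrypoint setPrice(newPrice): only the owner may change the price."""
        if caller != self.owner:
            raise PermissionError("only the owner can set the price")
        self.price = new_price
```

For example, if Alice owns the contract with a price of 10, a call to `purchase("bob", 10)` emits a payment of 10 to Alice and makes Bob the owner, after which only Bob can call `set_price`.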
This contract in itself is a digital property that people can purchase or sell. A contract like this could also be used to authenticate the ownership of an associated physical item.
Once deployed, this contract can never be changed. This means that when a user calls the purchase entrypoint and sends money, they can be certain that if the execution takes place, they will become the new owner of the contract, and be able to then set the price at which they are willing to sell it later.
Only the owner can change the price, and no one can ever prevent them from selling the ownership to whoever wants to purchase it.
Smart contracts open blockchains to all kinds of use cases: decentralized finance, art, gaming, decentralized identity and more.
### Smart contract flaws
Talk about the risk of bugs
## Governance
- upgrades
Already in other section. Do we still mention it?
<!--
## Use cases
TODO: summarize that we can use it to do simple transactions or execute smart contracts, and store data...
Some examples of the main current use cases:
- Cryptocurrency
- Art
- DeFi
- Tokenization of assets
- Gaming
TODO: add one sentence for each topic
## P2p (old)
Connecting users through a p2p network of nodes
Before users can agree on anything, they need to be able to communicate with each other.
Users who want to take part in maintaining the infrastructure of the blockchain itself will need to run software that implements the rules of the blockchain (the protocol) on one or more computers, that will communicate with computers of other users doing the same.
We call each running instance of this software a *node*. Each node maintains a copy of the current state of the data.
Nodes communicate with each other through a peer-to-peer network (p2p), also called the *gossip network*. In this network, each node knows a small but diverse subset of other nodes and can exchange messages with them. Messages typically include propagating new operations sent by users, requesting or propagating the content of blocks, and more.
Users interact with the blockchain by communicating with a node to read data from the blockchain, and write data by sending new operations. If valid, these new operations will then be propagated by the node to its neighbors, then their neighbors, etc. until all the nodes of the p2p network received them.
Check [this chapter](TODO) to learn more about the P2p network.
## Structure of a blockchain
The data of a blockchain needs to be available to the whole community. Since we can't have a small subset of users control it, we need many members to maintain copies of this data.
The main problem that needs to be addressed is how to collectively and efficiently maintain this data so that the community agrees on the exact same version. Any member should have access to it and be able to trust that what they have is the same official data that the rest of the community agreed on.
### Rules on how users can make changes to the data
The data in a blockchain could be seen as a simple very long sequence of bytes, where users can make changes by inserting, removing or changing some of these bytes, according to some rules agreed upon by the community.
In practice, the most common type of data stored on a blockchain is the heart of the cryptocurrency associated with it: information on who owns how much. Each user has an account, and the changes they are allowed to make to the data consist in performing a financial *transaction* from their account to another user's account: they subtract some value from the balance of their own account, and add the same value to the balance of the other user's account, with the added rule that the balance of their account can't be negative after this operation.
As an example, if the data currently stores that Alice owns 100 tez and Bob owns 20 tez, then the rules allow Alice to perform a change that consists in subtracting 30 tez from her account and adding exactly 30 tez to Bob's account. On the other hand, these rules would prevent Bob from performing the same change himself, as only Alice is allowed to transfer tez from her account.
TODO: insert illustrations representing the data and the change with before/after
The preponderance of the use of blockchain for such financial transactions is why the content of a blockchain is often referred to as a *ledger*: a log of all financial transactions between a set of accounts. A blockchain like Tezos supports many other types of data, with operations on this data that follow more complex sets of rules. This includes calls to smart contracts, account activations, governance operations, and more.
### Append-only data structure
Sending the full content of the shared data every time one of the users makes a change would be prohibitively slow and costly. Blockchains can manipulate millions (eventually billions) of accounts and store terabytes of data, so we can't reasonably transfer the entirety of every new version to the whole community.
The key idea to solve this is to **only transmit changes** made to the data. Such changes are expressed as *operations* that are applied to the data. The financial *transaction* performed by Alice when she transfers some tez from her account to Bob's account is one simple example of an *operation*.
With each change expressed as an *operation*, agreeing on what changes should be made to the data means **agreeing on which operations to apply, and in which order**. The order is extremely important, as transactions could become invalid if they are performed in the wrong order. In our example Bob may transfer 50 tez from his account to Carl's account only after Alice's transfer has been applied.
Given the full list of operations applied since the beginning, the full content of the current version of the data can be computed by anyone: start with the initial state and apply every single operation in the correct order.
As all users can do is add new operations to the chain, we say that the blockchain is an *append-only* data structure.
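The following Python sketch shows how the current state can be recomputed by replaying operations, using the Alice, Bob and Carl example from this section. The representation of an operation as a `(sender, recipient, amount)` tuple and the function names are assumptions made for the example.

```python
def apply_transfer(balances: dict[str, int], sender: str, recipient: str, amount: int) -> dict[str, int]:
    """Return the balances after one transfer, or raise if the rules forbid it."""
    if balances.get(sender, 0) < amount:
        raise ValueError(f"{sender} cannot send {amount}: balance would be negative")
    updated = dict(balances)
    updated[sender] = updated.get(sender, 0) - amount
    updated[recipient] = updated.get(recipient, 0) + amount
    return updated


def replay(initial: dict[str, int], operations: list[tuple[str, str, int]]) -> dict[str, int]:
    """Compute the current state: start from the initial state, apply every operation in order."""
    state = dict(initial)
    for sender, recipient, amount in operations:
        state = apply_transfer(state, sender, recipient, amount)
    return state
```

Starting from Alice owning 100 tez and Bob owning 20 tez, replaying Alice's transfer of 30 tez to Bob followed by Bob's transfer of 50 tez to Carl succeeds, whereas replaying the same two operations in the opposite order fails on the first one, since Bob only owns 20 tez at that point.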
### Grouping operations into blocks
As we will see later, the process of having the community agree on a specific change to the data is complex and necessitates exchanging many messages across the network. It would be too slow and costly to apply it for every single operation that is applied.
To solve this, operations are grouped into batches, so that the community only needs to agree on an entire batch of operations at once. Such batches of operations are called *blocks*. A blockchain is therefore an ordered chain of blocks, where each new block added to the end of the chain contains a new sequence of operations to apply to the data.
Blocks are added to the chain at a regular pace, expressed in seconds. To make sure members have enough time to process each block before receiving the next one, constraints limit the amount of data to transfer and the amount of computing time needed to process all the operations within a given block.
Old version :
The ledger's structure has to be very special to meet the following constraints:
- The ledger is distributed over the planet, and everyone should be able to agree on its state at the same time (minus latency): This is an asynchronous network. Transactions (transfers of bitcoins) are grouped inside packages named "**blocks**" to ease management. The size of the blocks impacts the transmission speed. If blocks are relatively small, more blocks circulate on the network. If blocks are relatively big, fewer of them circulate in the same amount of time. Almost every node (computers) of the network has to check each block for roughly the same time. So, in an asynchronous network, to assure maximum participation in the reconciliation ([the finding of the consensus on the replicated, shared, and synchronized digital information](/blockchain-basics#terminology)), the size of a block is a key factor. Finding a good size allows for smooth and regular transmissions.
- The ledger's history of transactions must not be modifiable (immutability)
- Verifying the history or picking and verifying the presence of specific information inside the ledger has to be fast (e.g., check a balance)
- The part of the community validating the blocks (i.e., the "*miners*", as we will see) must be rewarded in a fair way
The data structure which permits all of the above is a chain of blocks, a.k.a. "blockchain".
Valid transactions are grouped and enclosed inside a block. On average, every 10 minutes with Bitcoin, a new block should be mined. The number of transactions inside a block is only limited by the available space, which is currently (2021) around 2 MB per block (comparatively, the "_Bitcoin Cash_" blockchain has a block size of around 32 MB).
Each new block is linked to the previous one: they are chained. The more blocks there are, the more difficult it is to modify anything in the ledger. They are cryptographically chained. This means that if you want to cheat (e.g., make a double-spend or spend money you don't have), you would need to modify everything from the first block ever created (called the "_Genesis Block_").
The reference to the previous block is inside the new block's *header*. This reference is made with the *SHA256* [hash function](https://en.wikipedia.org/wiki/Hash_function) applied on the previous block's header (more on that in the [next chapter on Proof-of-Work](/blockchain-basics/proof-of-work)).
This process is achieved through the [Cipher Block Chaining (CBC)](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_block_chaining_(CBC)) principle from 1976, where instead of a _XOR_ function (plain text additions will match [*nonce*](https://en.wikipedia.org/wiki/Cryptographic_nonce) additions in the next chapter), the _SHA256_ function is used. Twice:

### Chaining blocks using hashes
We use the hash of the data within a block as a virtually unique identifier for this block. This makes it impossible for anyone to produce a fraudulent block with operations that were not agreed upon by the community. Indeed, anyone receiving the content of a block can compute the hash of its data and compare it to the expected hash. As any change of the data would produce a different hash, getting the correct hash is a proof that you obtained the correct corresponding data.
To make it part of a chain, each block containing a sequence of operations references the previous block in the chain by including its hash.

<small className="figure">FIGURE 3: Chain of blocks linked through their hashes</small>
TODO: replace with BLAKE2B hashes
The header of a block contains other information, which we ignore for now.
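As a rough sketch of this chaining, the Python code below builds blocks that each include the hash of their predecessor, and checks that the links are consistent. BLAKE2b is used to follow the note above; the JSON serialization, the field names and the function names are assumptions of this example, not the actual block format.

```python
import hashlib
import json
from typing import Optional


def block_hash(block: dict) -> str:
    """Hash the whole content of a block, including its predecessor's hash."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.blake2b(encoded, digest_size=32).hexdigest()


def make_block(operations: list[str], previous: Optional[dict]) -> dict:
    """Create a block that references `previous` through its hash."""
    predecessor = block_hash(previous) if previous is not None else None
    return {"predecessor": predecessor, "operations": operations}


def verify_chain(blocks: list[dict]) -> bool:
    """Check that every block references the hash of the block before it."""
    return all(
        current["predecessor"] == block_hash(previous)
        for previous, current in zip(blocks, blocks[1:])
    )
```

If anyone changes the operations of a block in the middle of the chain, its hash no longer matches the one stored in the next block, and `verify_chain` returns `False`.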
## Agreeing on what operations to include
In the previous section, we have presented how the data of a blockchain can be structured, so that only batches of changes need to be transferred between different users, rather than the new data itself.
In this section, we introduce the mechanisms that make it possible for users to collectively agree on **which changes are made, and in which order**.
Note that this is not about deciding if the community likes a proposed operation and would like to include it or not: any operation that follows the rules should eventually be included. The agreement is about making sure that all users are applying the same operations to the data, in the same order.
### Connecting users through a p2p network of nodes
Before users can agree on anything, they need to be able to communicate with each other.
Users who want to take part in maintaining the infrastructure of the blockchain itself will need to run software that implements the rules of the blockchain (the protocol) on one or more computers, that will communicate with computers of other users doing the same.
We call each running instance of this software a *node*. Each node maintains a copy of the current state of the data.
Nodes communicate with each other through a peer-to-peer network (p2p), also called the *gossip network*. In this network, each node knows a small but diverse subset of other nodes and can exchange messages with them. Messages typically include propagating new operations sent by users, requesting or propagating the content of blocks, and more.
Users interact with the blockchain by communicating with a node to read data from the blockchain, and write data by sending new operations. If valid, these new operations will then be propagated by the node to its neighbors, then their neighbors, etc. until all the nodes of the p2p network received them.
Check [this chapter](TODO) to learn more about the P2p network.
### Proposing and signing new operations
Nodes can't simply let anyone send them all kinds of new operations that will then be propagated through the whole network, otherwise it would be easy to saturate the network with a flood of operations.
Two main features are used to protect the network against this type of DoS (Denial of Service) attack:
- authenticating users through the use of signed operations
- financial incentives, through the application of fees
When transmitting a new operation to a node, users need to digitally sign it, as a way to prove that they are indeed its author. Digital signatures, based on asymmetric cryptography, are another cryptographic primitive that is used heavily in blockchains. Each user publishes a public key to the network when they activate their account, then digitally signs every operation they send to the network, using the corresponding private key. These operations are done through the use of a wallet, a piece of software (and potentially associated hardware) that takes care of holding the user's cryptographic keys, and of using them to sign operations.
Signing operations only helps identify the author of each transaction and apply the corresponding rules: for example, a transaction transferring tez from Alice to Bob is only valid if it has been signed by Alice.
To prevent users from sending millions of transactions, a system of financial incentive is added: when sending a new operation to the network, users need to pay a fee, in the native cryptocurrency of the blockchain (tez on Tezos). We will see later how the value of this fee is computed on Tezos, as it depends on several factors such as the amount of computation needed to execute the operation. For now, simply remember that users need to pay to send new operations, which makes it too costly to saturate the network for significant periods of time. Note that the fee will only be paid for operations that are eventually added to the chain.
### Collecting operations to create the next block
We have seen how signed operations are sent by users to nodes that then propagate them through the p2p network.
Before these operations can make it into the chain, they need to be included in a new block. A decision needs to be taken on which of these transactions should be included in the next block, and in which order.
Taking such a decision collectively would be complicated. Instead, a system is set up that identifies a single entity that will be in charge of collecting operations received through the p2p network, selecting some of them, and crafting a new block that includes these selected operations in a specific order.
Once the block is ready, it can then be propagated to all the other nodes through the p2p network, and each node that receives the block can then apply all the operations to compute the new state of the data.
On Tezos, entities that create blocks are called Bakers. On other blockchains, they are often called Miners.
### Basics of the consensus mechanisms
Two key issues remain to be addressed:
- how to decide which entity gets to create the next block
- how do we make sure all nodes receive the exact same block?
Being selected to create the next block is a big responsibility but also comes with some power and financial rewards, which means many entities volunteer to take that role for every block.
When deciding who gets to create the next block, the most important aspect is making sure that this responsibility is **spread fairly across the community**, to make sure a small number of entities don't accumulate too much power. In particular, if the same entity gets selected for many consecutive blocks, it gives it the possibility to block some users from adding their own operations to the chain. This is something we absolutely want to avoid in a decentralized blockchain.
Different blockchains use different methods to select the next baker or miner, while making sure this opportunity is spread fairly across the community. Simply selecting a baker/miner randomly across all volunteers wouldn't work, as nothing prevents anyone from creating a large number of bakers/miners, to increase their chance of being selected. Instead, blockchains usually distribute the responsibility proportionally to some limited resource that is distributed between a large number of entities.
For most modern blockchains this resource is based on the amount of native cryptocurrency that entities are willing to put at stake. This system is called Proof of Stake, and Tezos was one of the very first blockchains to use it. Another well-known approach, still used by some older blockchains such as Bitcoin, is based on the amount of electricity and computing power that entities are willing to spend (and usually waste) on arbitrary computations, in order to get a chance to be selected as the next miner. This approach is called Proof of Work. To learn more, check [this chapter](TODO).
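To see why distributing the responsibility proportionally to a limited resource defeats the "many identities" trick, consider this Python sketch of stake-weighted selection. It is a simplification under our own assumptions: a local pseudo-random generator stands in for the source of randomness that nodes would actually have to agree on, and the function names are hypothetical.

```python
import random


def select_baker(stakes: dict[str, int], rng: random.Random) -> str:
    """Pick one identity, with a probability proportional to its stake."""
    names = list(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def chance(stakes: dict[str, int], identities: set[str]) -> float:
    """Probability that one of `identities` is selected for the next block."""
    total = sum(stakes.values())
    return sum(stakes[name] for name in identities) / total
```

An entity holding 40 out of 100 staked tez has a 40% chance of being selected, whether it appears as one baker with 40 tez or as four bakers with 10 tez each: creating more identities does not create more stake.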
In some situations, two or more blocks can be created and compete to be the next block to add to the chain. On Tezos, the Tenderbake algorithm applies a system of endorsements of blocks by the majority, where a block is considered approved if it received a majority of endorsements, and multiple rounds can take place if no block initially receives such a majority. On some other blockchains, several blocks may coexist for a while and form multiple branches on the blockchain, but a mechanism is applied that converges to one of the branches as being the official branch, after a few more blocks.
## Block
A block is simply a structure containing:
- its block number, timestamp and other metadata
- the list of the operations
- the signature of the block producer
- and the hash of all the above
Fig. 3 shows an example of a block. Notice that the hash of the block starts with "_d8ca_". So far, this block is considered _invalid_. There are different rules for validating a block depending on the chosen blockchain. Most blockchains use *Proof-of-Work*, which consists in proving that validation's work has been done on the block.

<small className="figure">FIGURE 3: The hash of DATA, NONCE and block number does not start with 000: The block is invalid.</small>
For example, a widely used validation rule is that the hash must start with _000_. Therefore, the block will have to be "mined"; that is to say, it will be necessary to find a suitable _NONCE_ such that the hash of the block (including the _NONCE_) starts with _000_. Note that this problem is arbitrary. Other consensus mechanisms choose to use a hash starting with _123_ or that the hash in base 10 is lower than 1000 (in the case of Ethereum). You just need a rule that proves that a validation work has been done on the block and that allows for the finding of one validator in the world (i.e., the first one to find a valid _NONCE_). When the rule is validated, the block is then valid.
In fig. 4, we have repeatedly incremented _NONCE_ and calculated the hash of the block until we got a hash starting with "000". The duration of this process can vary significantly because each NONCE is *equally probable*. In this case, we have incremented the _NONCE_ up to 29258. The block is now valid.
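The mining loop described above can be sketched in a few lines of Python. The way the block number, nonce and data are combined into a string before hashing is an assumption of this example, so it will not reproduce the nonce shown in the figure.

```python
import hashlib


def block_digest(number: int, nonce: int, data: str) -> str:
    """SHA-256 hash of the block number, the nonce and the data."""
    return hashlib.sha256(f"{number}|{nonce}|{data}".encode()).hexdigest()


def mine(number: int, data: str, prefix: str = "000") -> int:
    """Increment the nonce until the hash of the block starts with `prefix`."""
    nonce = 0
    while not block_digest(number, nonce, data).startswith(prefix):
        nonce += 1
    return nonce
```

With a three-character hexadecimal prefix, about one hash in 4096 qualifies, so the loop finishes quickly; every additional required character multiplies the expected amount of work by 16.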

<small className="figure">FIGURE 4: The hash of DATA, NONCE and block number starts with 000, the block is signed</small>
<NotificationBar>
<p>
As a miner, you might want to try out random NONCE instead of incrementing it. As everyone in the world is in competition, anyone with a faster machine would always beat you to it. But when using random guesses, your chances are proportional to the amount of _hashing power_ you have compared to the rest of the world.
</p>
</NotificationBar>
Therefore, mining consists in calculating the hash of a block repeatedly until the validation rule is validated. This is why it is possible to create _ASICs_ (Application Specific Integrated Circuit) optimized for mining. For Bitcoin, ASICs integrate chips specifically designed to do only SHA256 and to find _NONCE_ very quickly [[5]](/blockchain-basics/proof-of-work#references).
Finally, note that if you try to modify anything in a validated block, it loses its validity because you change the hash of the block, and you would have to mine the block again to find a new hash starting with "000".
## Blockchain
You now have all the elements to understand what a blockchain is. It is simply a list of blocks where each block contains a reference to the previous block: a hash. They are, therefore, "chained". In fig. 5, you can see that block #2 contains all previously mentioned information in addition to _PREVH_, the hash of the previous block, i.e., the hash of block #1. To validate block #2, you have to mine it to find a hash starting with "000"; then this hash is used in "_PREVH_" in block #3, etc.

<small className="figure">FIGURE 5: A valid blockchain (all blocks are signed)</small>
If we change anything in a block, this block and all the following blocks lose their validity. Why? Because if I change "good" to "bad" in block #2, the hash of block #2 changes, which changes "_PREVH_" inside block #3, changing its own hash; this in turn changes "_PREVH_" in block #4, changing its hash, and so on... This invalidates the whole blockchain after block #2. Therefore, it is straightforward to know if any information on a blockchain has been changed and in which block.

<small className="figure">FIGURE 6: An invalid blockchain (some blocks are not signed because of the data modification)</small>
Now you may be wondering: why not *re-hash all the invalidated blocks*? It is indeed possible to re-mine these blocks from the last valid block and revalidate the entire chain (see fig. 7).

<small className="figure">FIGURE 7: A valid blockchain again (all the blocks following the modification have been signed again)</small>
So can the blockchain be altered? No. Why?
- Firstly, because mining a block requires a lot of computing power. For Bitcoin, it would take several **years** for your desktop computer to mine **a few blocks**, and the more you go back in time and modify an old block, the more blocks you would have to mine. Remember the previous chapter on data structure [here](/blockchain-basics/main-properties#chained-data-structure).
- Secondly, *and most importantly*, the distributed nature of the blockchain makes it statistically impossible to rewrite its history: You would need to control more than half of the BFT network and go against the MAD property (see [previous chapter](/blockchain-basics/main-properties#agreements-and-deflation)).
## Transactions' immutable history
TODO: rewrite to explain transactions vs context, accounts, verifications...
When we talk about "blockchains", we do not necessarily talk about "cryptocurrencies". "Blockchain" has been used for many non-financial applications [[10]](/blockchain-basics/proof-of-work#references). Note that until now, the data stored in the blocks of our examples were simple strings. You can store any type of data: identities, electronic documents, insurance contracts, etc. Whenever it is necessary to have an **immutable record**, or a system of secure exchanges between parties without trust, or on an unsecured network, you should ask yourself whether or not a blockchain can and should be used.
In the case of a crypto-currency, we use the "DATA" field to store financial transactions, among other things. Figures 10 and 11 respectively show a block and a blockchain with transactions in "_TX_" instead of "_DATA_" (e.g., _$13 from John to Chris_). By _"\$"_, we denote a monetary value that is not necessarily dollars, but which could be any "coin" or "token", such as Bitcoin (BTC), Ethereum (ETH), etc.

<small className="figure">FIGURE 10: A block containing financial transactions (this is a simplified representation)</small>

<small className="figure">FIGURE 11: A blockchain containing financial transactions</small>
However, in our example, how do we know that John has enough money to send Chris $13?
The Bitcoin blockchain does not contain a ledger showing the balance of each account at all times. Instead, when John attempts to complete a transaction, the process will check history on the blockchain and calculate the difference between his *inbound* transactions against all of his *outbound* transactions and deduce how much money John can spend.
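Following this description literally, the amount John can spend can be computed from the history alone, as in the Python sketch below. The `(sender, recipient, amount)` representation and the `"mint"` pseudo-sender used to create the initial coins are assumptions made for the example.

```python
def spendable(history: list[tuple[str, str, int]], account: str) -> int:
    """Sum of inbound transactions minus sum of outbound transactions."""
    inbound = sum(amount for _, recipient, amount in history if recipient == account)
    outbound = sum(amount for sender, _, amount in history if sender == account)
    return inbound - outbound


def can_send(history: list[tuple[str, str, int]], sender: str, amount: int) -> bool:
    """A new transaction is acceptable only if the sender can afford it."""
    return spendable(history, sender) >= amount
```

With a history where John received 20 and has not spent anything yet, `can_send(history, "John", 13)` is true, so the transaction "$13 from John to Chris" can be accepted.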
## Keys inside Transactions inside Blocks
It is essential for the proper functioning of the crypto-money that only Chris can send the transaction "$40 from Chris to Jane". For that, we need to use one of the bases of modern cryptography, the [*Public Key Cryptography*](https://en.wikipedia.org/wiki/Public-key_cryptography) or [*Asymmetric Key Algorithm*](https://en.wikipedia.org/wiki/Public-key_cryptography), which consists in a private key and a public key.
Fig. 13 shows a pair of keys. The private key is a very long, randomly generated number (you could create that number yourself by flipping a coin randomly a large number of times). A public key is a hexadecimal number that is calculated from the private key. It is possible to calculate the public key from the private key, but it is practically impossible to find the private key from the public key.
As the name suggests, **the private key must be kept private**. You should **never** share it with anyone. On the other hand, the public key can be accessible to anyone that wants to send you money (but keep in mind that this public key can then be linked to your data. You should use other generated public keys whenever possible).

<small className="figure">FIGURE 13: A private key that has been randomly generated and its associated public key computed from the asymetric key algorithm</small>
## Signatures inside Transactions inside Blocks
Now what's great with a **private key** is that you can sign a message that can be **authenticated** using only your **public key and signature**. Indeed, fig. 14 shows first a signature generated by our private key for the message "I like cake!". Notice that if we change the message, the signature changes.

<small className="figure">FIGURE 14: Signing some data with a private key</small>
Let's send our message and signature to someone. That person can verify the authenticity of the message simply by finding our public key and applying the *cryptographic verification algorithm* on the message and its signature.
Remember that the person **does not have access to the private key**, but using only the public key, the cryptographic algorithm will tell this person (see fig. 15): "Yes, this message has been written by the person that owns the private key, and the message has not been altered in any way".
Or, in case of attempted identity theft or corrupted message (see fig. 16): "No, the signature is invalid, meaning that either the message has been altered or it's not the person that owns the corresponding private key that signed this message".

<small className="figure">FIGURE 15: Thanks to the public key and the signature, anyone can verify that this data has indeed been sent by the holder of the private key associated with this public key...</small>

<small className="figure">FIGURE 16: ... Or inversely, that the data has been altered or does not come from the holder of the private key associated with this public key.</small>
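
The sign-then-verify flow described above can be tried with nothing but a hash function. The sketch below uses a Lamport one-time signature, which is a deliberately simple scheme chosen for illustration and not the algorithm used by Bitcoin or Tezos. All function names are hypothetical, and a Lamport key pair must only ever sign a single message.

```python
import hashlib
import secrets


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(sha256(zero), sha256(one)) for zero, one in private]
    return private, public


def message_bits(message: bytes):
    """The 256 bits of the message's SHA-256 digest."""
    digest = int.from_bytes(sha256(message), "big")
    return [(digest >> i) & 1 for i in range(256)]


def sign(private, message: bytes):
    """Reveal one secret per digest bit; this requires the private key."""
    return [private[i][bit] for i, bit in enumerate(message_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Check each revealed secret against the public key only."""
    bits = message_bits(message)
    return all(
        sha256(secret) == public[i][bit]
        for i, (bit, secret) in enumerate(zip(bits, signature))
    )


private_key, public_key = generate_keypair()
signature = sign(private_key, b"I like cake!")

print(verify(public_key, b"I like cake!", signature))  # True
print(verify(public_key, b"I like pie!", signature))   # False: message altered
```

The verifier never sees the private key as a whole, yet any change to the message, or a signature made with another key, causes `verify` to return `False`.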
## Transaction Data
Instead of using a simple string of characters in the "DATA" field, let's use transactions and public keys. So, from now on, note that instead of using names, our transactions will use _the **public key** of the sender_, and _the **public key** of the recipient_. Instead of "\$ 50 from Chris to Jane", we now have "\$ 50 from 0x4cf6... to 0x2f1f...".
The sender then signs the transaction with their private key. See fig. 17 as an example.

<small className="figure">FIGURE 17: We sign our transaction with our private key</small>
Miners can now verify that the private key owner has indeed sent a transaction and that the amount and recipient have not been altered by any third party (fig. 18). Otherwise, the signature would be invalid.

<small className="figure">FIGURE 18: Miners verify that this transaction has been sent by the owner of the private key</small>
<NotificationBar>
<p>
Note that transactions do not actually use public keys in the _from_ and _to_ fields. They use **addresses**, which are hashed versions of the public keys. Because a public key is a very long number, it is hashed and shortened to form the public address. That way, it is easier to read and more secure, as nobody can recover your public key from your address alone.
To sum up, the private key generates the public key, which, in turn, generates the public address.
</p>
</NotificationBar>
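
As a rough sketch of the public key to address step, the snippet below hashes a public key and keeps only part of the digest. The function name `toy_address`, the choice of SHA-256, the 20-byte length, and the `0x` prefix are all assumptions made for illustration; real blockchains each define their own hash functions and address encodings.

```python
import hashlib


def toy_address(public_key: bytes, length: int = 20) -> str:
    """Hash the public key and keep the first `length` bytes as the address."""
    digest = hashlib.sha256(public_key).digest()
    return "0x" + digest[:length].hex()


public_key = bytes.fromhex("04" + "ab" * 64)  # made-up 65-byte public key
address = toy_address(public_key)
print(address)
```

The address is much shorter than the key it comes from, and because hashing is one-way, the public key cannot be recomputed from the address.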
## Complete blockchain
Let's now modify our "insecure" blockchain diagram from fig. 10 and add each sender's signature to their transaction.
If a hacker tries to change anything, e.g., the amount of a transaction, two things happen:

- the block is no longer valid because the hash has changed, as seen previously
- the signature is no longer valid

The hacker could mine the block again to make it valid. However, they have no way of signing the transaction again without the sender's private key.
This is how transactions in the blockchain are protected: only the holder of the private key can produce a valid signature for their transactions.
To recap, here is a complete schema of a block, a blockchain, and a blockchain network.

<small className="figure">FIGURE 19: A complete block</small>

<small className="figure">FIGURE 20: A complete blockchain</small>

<small className="figure">FIGURE 21: A complete and distributed blockchain network</small>
## Conclusion
You can now understand why a PoW blockchain works so well as a secure system that does not need a bank or any other centralized entity. All you need is a random number to create a private key, and then a public key, to start receiving money. Note that public keys are pseudonymous, not anonymous: compare them to bank account numbers. They can still be associated with your official identity at some point. For instance, you will need to associate your public key or address with your identity documents if you want to exchange a cryptocurrency for fiat money and transfer it to your bank account.
## References
[1] https://blockstream.com/satellite/
[2] https://bitcoin.org/en/release/v0.12.0#wallet-pruning
[1] https://andersbrownworth.com/blockchain/
[8] https://academy.binance.com/en/articles/what-is-a-51-percent-attack
[9] https://medium.com/hackernoon/the-history-of-51-attacks-and-the-implications-for-bitcoin-ec1aa0f20b94
[10] https://builtin.com/blockchain/blockchain-applications
[11] https://www.blockchain.com/charts/difficulty
--------------------------------------------------------------------
# To be moved to P2P
## Peer-to-Peer network and shared ledger
Developers have a lot of power, but their code still has to be accepted and used. The Bitcoin P2P network, like most blockchains, has a mesh design spread all over the planet (and space[[1]](/blockchain-basics/main-properties#references)). The more nodes enforce the rules, the more distributed and secure the protocol is.
There are different types of nodes, but for the sake of simplicity, let's only quickly describe two categories: _Full nodes_ and _Lightweight nodes_.
- _Full nodes_ enforce the rules no matter what happens and validate the transactions. They _usually_ [[2]](/blockchain-basics/main-properties#references) also record all transactions in a distributed ledger. This ledger is shared by all the full nodes of a network.
- _Lightweight nodes_ are used on devices with limited storage, computing power, or connectivity (e.g., smartphones, tablets, IoT devices, etc.). They don't record transactions in the ledger but ask the full nodes for the information they need.
From now on, "node" refers to a *full node*. All these full nodes communicate with each other using a [*Gossip Protocol*](https://academy.bit2me.com/en/what-is-gossip-protocol/).
## Distributed
Any blockchain needs to be distributed to be secure; that is to say, all the valid blocks have to be replicated on enough network nodes.
Like everyone else, you can use your computer as a node and mine with it. To do this, you need to download all the blocks validated so far. For Bitcoin, this already represents more than 330 GB [[6]](/blockchain-basics/proof-of-work#references).
When any node in the world validates a new block (first to find a valid _NONCE_), **it is added to the blockchain of all the other nodes using a _gossip protocol_**[[7]](/blockchain-basics/proof-of-work#references), **so every node has an up-to-date blockchain**. There are many nodes in the world, and they all have a complete copy of the blockchain.
Let's consider what happens when a node decides to fraudulently modify a block. Fig. 8 shows a network of 3 nodes (_A_, _B_, and _C_), which all have a copy of the blockchain. If "_C_" decides to modify some data on the blockchain, this can be seen immediately by the other nodes (_A_ and _B_), because the hash of the last block in their chain (_000969..._) is different from the hash of the last block of "_C_" (_dec59db..._).

<small className="figure">FIGURE 8: The blockchain is identical on all the nodes of the network except when a hacker tries to modify it. We see here that node "<em>C</em>" tries to modify the data of block #2</small>
Even if node "_C_" *validated all of its blocks again* as shown in fig. 9, its final hash (_0004de..._) would still differ from that of the other nodes (_000969..._). There is no way to change the data of a block while preserving the same final hash as the rest of the network. The modified blockchain no longer matches the one held by the majority of nodes, so its blocks will become orphans and will not be integrated into the general ledger.

<small className="figure">FIGURE 9: Even if "<em>C</em>" re-mines all its blocks following its modification, the hash of the latest block still does not match the rest of the network.</small>
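
The comparison made by nodes _A_, _B_, and _C_ can be reproduced with a toy chain. In this minimal sketch, each block's hash covers its data and the previous block's hash; the nonce and the mining step are left out to keep it short, and the helper names and sample transactions are made up for the example.

```python
import hashlib


def block_hash(prev_hash: str, data: str) -> str:
    """Hash of a block: depends on its data and on the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()


def tip_hash(blocks) -> str:
    """Chain the blocks together and return the hash of the last one."""
    prev_hash = "0" * 64  # the genesis block has no predecessor
    for data in blocks:
        prev_hash = block_hash(prev_hash, data)
    return prev_hash


ledger = ["$100 coinbase to Chris", "$50 from Chris to Jane", "$20 from Jane to Chris"]

node_a = tip_hash(ledger)
node_b = tip_hash(ledger)

tampered = list(ledger)
tampered[1] = "$5000 from Chris to Jane"  # node C rewrites block #2
node_c = tip_hash(tampered)

print(node_a == node_b)  # True: honest nodes agree
print(node_a == node_c)  # False: the tampering shows in the last hash
```

Because each hash feeds into the next block, comparing only the hash of the last block is enough to detect a change made anywhere earlier in the chain.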
The only known way to corrupt Proof-of-Work is through the infamous "51% attack" [[8]](/blockchain-basics/proof-of-work#references), in which an attacker obtains more than 50% of the network's mining power, allowing them to rewrite history (the technical details are explained [here](https://hackernoon.com/ethereum-classic-attacked-how-does-the-51-attack-occur-a5f3fa5d852e)).
Fortunately, getting 51% of the world's mining power for a popular blockchain is very difficult. For instance, it would cost several billion dollars in the case of Bitcoin. However, for less popular blockchains (with fewer nodes), this is quite doable and happens quite regularly [[9]](/blockchain-basics/proof-of-work#references). Some blockchains are trying to solve this problem by implementing different consensus algorithms. This is the case for Tezos, as we will show in the following chapters.
--------------------------------------------------------------------
# Relates to Bitcoin or Proof of Work
## Intro
_Proof-of-work_ was the first fully functional blockchain consensus mechanism ever created. It is still in use by Bitcoin and many other blockchains. It requires its users to _mine_ to get a chance to earn a reward for validating blocks of transactions. In this chapter, we will look into the technical side of things and how mining works.
## Coinbase & Mining
You now understand how the blockchain can calculate each person's balance and whether or not to authorize their transactions. However, where does all this money come from in the first place? If we were to completely trace the blockchain all the way back to its first block, there would be a point where the money had to be created. If the users' balances are only calculated from transactions between users, there would be no creation of new tokens. Therefore there would be 0 bitcoin.
New bitcoins are generated by the blockchain itself whenever a new block is validated by Proof-of-Work: new tokens are created and given to the miner of the block. In fig. 12, assume that Chris is mining the block. You can see that \$100 is created from scratch and given to Chris to thank him for investing his computational power and electricity into mining a block. This is called the "_reward_" or "_coinbase_" ("*coinbase reward*" and "*coinbase transaction*" are also used). Chris can then spend this money in the next block as if he had received it from another user.

This process puts new tokens into circulation in the same way that a central bank can print new banknotes. The advantage of a blockchain over a central bank is that the process is entirely autonomous, decentralized, and unalterable. Inflation on Bitcoin is fully known in advance, and excessive inflation is not possible. More than 17 million bitcoins have already been minted, that is, offered as a coinbase. The maximum is 21 million, at which point the source code of Bitcoin shows that no more coinbase will ever be offered. The only source of income for miners will then be the transaction fees, but there is still some time: this is expected to happen around 2140, because the coinbase decreases on a fixed schedule of blocks while the mining difficulty [[11]](/blockchain-basics/proof-of-work#references) keeps the pace of block creation steady.

Indeed, the Bitcoin protocol states that a new block must be mined on average every 10 minutes. As the total computational power put into Bitcoin increases with more miners and technological advancements, the average time to mine a block naturally drops. Bitcoin compensates for this by increasing the mining difficulty. E.g., instead of requiring a _nonce_ such that the hash of the block starts with three zeros, it may require four zeros, making the whole process much longer and more demanding for the hardware.

<small className="figure">FIGURE 12: Block A gives a coinbase of $100 to Chris, which allows him to spend it in the next block</small>
<NotificationBar>
<p>
Note that the first block has no previous hash. Thus the <em>PREVH</em> is set to a series of zeros. The first block of any blockchain is called the <b>genesis block</b>.
</p>
</NotificationBar>
There is still one big issue in our blockchain schema so far. Please take a minute and try to identify it.
[...]
Did you find it? Consider how the transactions are authenticated.
[...]
Knowing that Chris has a positive balance of \$100, could Jane add the transaction "\$ 40 from Chris to Jane" herself in a new block without Chris ever giving his consent? At this point of the chapter, anyone seems to be able to spend anyone else's money!
## About Energy consumption
You also now know that mining is nothing more than repeatedly trying random _nonces_ to solve an arbitrary problem and obtain the coinbase reward. Everyone is basically in competition, and the more computational power and electricity you use, the better your chances. This, unfortunately, has a detrimental effect on the environment. Bitcoin mining has recently passed 100 TWh per year [[2]](/blockchain-basics/proof-of-work#references) in energy consumption, more than entire countries like Switzerland (56 TWh) or Finland (84 TWh). All this just to find an arbitrary number that does nothing except select one person in the world to be the next validator.

Isn't there a better alternative? Isn't there a consensus mechanism that would be more cooperative and less competitive? In the next chapter, we're going to see a few examples of the so-called *"next generation" consensus algorithms* (including the current consensus mechanism used by Tezos) that work just as well as _PoW_ but with much lower energy consumption.
### Introduction to Mining
The block validators are called the **miners**. They put valid transactions inside a block, and then try to validate that block. When the block is found to be valid, the miner who created it receives two types of rewards:
- A fee in bitcoin for each included transaction, **chosen** and sent by the **author** of that transaction
- And a **pre-determined** quantity of newly created bitcoins for the valid block found (called the _coinbase_)
The fees are **the sum of the transaction values** ($S_v$) minus **the sum of the amounts sent** ($S_a$).
For $T$ *transactions* in a block, let $v_i$ be the *value* of transaction $i$, and $a_i$ the *amount* actually sent:
$$
\text{Fees} = \sum_{i=1}^{T} v_i - \sum_{i=1}^{T} a_i
$$
Or more simply:
$$
\text{Fees} = S_v - S_a
$$
Senders choose the fee they want to pay. The more they give, the faster the transaction is included in a block. As block size is limited, miners prioritize transactions with the highest fees. A transaction could have zero fees, but it would take months or years until a miner decides to include it in a block (if ever).
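
The fee formula and the way miners favour well-paying transactions can be sketched as follows. Transactions are modelled as `(value, amount)` pairs of integers, and the block limit is simplified to a number of transactions; both are assumptions made for this example, as are the function names.

```python
def fee(transaction):
    """Fee of one transaction: its value minus the amount actually sent."""
    value, amount = transaction
    return value - amount


def block_fees(transactions):
    """Fees = S_v - S_a, the sum of values minus the sum of amounts sent."""
    total_value = sum(value for value, _ in transactions)
    total_sent = sum(amount for _, amount in transactions)
    return total_value - total_sent


def select_transactions(pending, capacity):
    """Keep the `capacity` pending transactions that pay the highest fees."""
    return sorted(pending, key=fee, reverse=True)[:capacity]


pending = [(100, 98), (50, 50), (75, 70), (20, 19)]
block = select_transactions(pending, capacity=3)

print(block)              # [(75, 70), (100, 98), (20, 19)]
print(block_fees(block))  # 8
```

The zero-fee transaction `(50, 50)` is the one left out, which is why such a transaction may wait a long time before being included.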
The block reward is sent through a particular transaction called a "_Coinbase Transaction_" directly to the miner. It's always the first transaction of a validated ("*mined*") block.
To get these two rewards, a miner basically plays a lottery, a game of chance. The game is to find a hash that, read as a binary number, is lower than a specified value (called the "_Target_"), using the hash function _SHA256_. A miner can't guess the result of this function in advance. He **must** try values (by [*Brute Force*](https://en.wikipedia.org/wiki/Brute-force_attack)): he *hashes* the header of his candidate block, which contains the hash of the previous block, (twice) together with a number called the *nonce* (details in the ["*Proof-of-Work*"](/blockchain-basics/proof-of-work) chapter).
The more values a miner tries (the more lottery tickets he has), the more energy he uses.
When a miner finds a valid number, he finally produces a valid block. His node spreads this new block to the network. The final *nonce* is then included in the block so that anyone can verify it.
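
The lottery can be played at a very small scale with the sketch below. It is a simplification: the header is an arbitrary byte string rather than a real Bitcoin block header, the target is set very high so that a nonce is found after a few thousand tries, and the function names are hypothetical.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces one by one until the hash, read as a number, is below target."""
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "big"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
    return None  # no valid nonce in the searched range


TARGET = 2**244  # a valid hash needs 12 leading zero bits (3 hex zeros)
header = b"previous block hash + transactions"

nonce, digest = mine(header, TARGET)
print(nonce, digest.hex())
```

Finding the nonce takes thousands of hash computations on average, but checking it takes a single call to `double_sha256`, which is what lets every other node verify the block cheaply.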
If two miners find a valid number simultaneously (within lag and network propagation delays), then two valid blocks are possible. Miners are then split into two groups:
- those mining from the block of the first miner,
- and those mining from the block of the second miner
If someone finds a block following the second miner's block, then these two blocks form "the longest chain" (actually, the chain with *the most work*[[3]](/blockchain-basics/main-properties#references)) and become the "winners". The first miner's block is then abandoned.
More details on that in the [next chapter on Proof-of-Work](/blockchain-basics/proof-of-work).
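
A minimal sketch of this "most work" rule is given below. A chain is represented simply as the list of targets its blocks had to meet, and the work of a block is estimated as `2**256 // (target + 1)`, the expected number of hashes needed to get under the target. This representation and the function names are assumptions made for the example.

```python
def block_work(target: int) -> int:
    """Expected number of hash attempts needed to get below `target`."""
    return 2**256 // (target + 1)


def chain_work(targets) -> int:
    """Total work of a chain, given the target of each of its blocks."""
    return sum(block_work(target) for target in targets)


def best_chain(chains):
    """Pick the chain that accumulated the most work."""
    return max(chains, key=chain_work)


TARGET = 2**244
shared_history = [TARGET] * 5

first_branch = shared_history + [TARGET]           # first miner's block only
second_branch = shared_history + [TARGET, TARGET]  # second miner's block plus one

print(best_chain([first_branch, second_branch]) is second_branch)  # True
```

When all blocks share the same target, the chain with the most work is also the longest one, which is why the two expressions are often used interchangeably.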
## Consensus: Nakamoto and the Proof-of-Work's account-units issuance
## Account Unit and Economy Basics
Miners use their computational power and electricity to mine. That's why they are rewarded for it. The rewards are in the account unit. Bitcoin uses bitcoins, while Ethereum uses ethers.
To get these rewards, they must _work_, and then _prove they have worked_. In Bitcoin, they work by using up electricity and computing time. So to make this activity profitable, miners have to find efficient hardware and low-cost electricity.
Over time, as more people took up this activity, miners have grouped together to use more and more powerful machines. This increases the number of attempts (the number of lottery tickets) the global network makes to find a valid block. Because the function used is a **hash** function (_SHA256_), this capacity of the network is called "**hashing power**".
While the network gains more hashing power, constraints stay the same. Each block still has to appear around every 10 minutes. To maintain this and adapt, the protocol calculates the _Difficulty_. If the hashing power is too high, the _Difficulty_ increases. If the hashing power is too low, the _Difficulty_ decreases.
The _Difficulty_ modifies the binary _Target_. To increase the difficulty, the _Target_ becomes smaller and smaller. This means that this binary number has more and more leading zeros.
A new _Difficulty_ is calculated every 2016 blocks (~2 weeks).
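
The adjustment can be sketched as a simple proportional rule: compare the time the last 2016 blocks actually took with the two weeks they should have taken, and scale the _Target_ accordingly. The factor-of-4 limit on each adjustment is a detail of Bitcoin that is not covered in this chapter, and the function name `retarget` is hypothetical.

```python
RETARGET_INTERVAL = 2016    # blocks between two adjustments
TARGET_BLOCK_TIME = 10 * 60  # seconds
EXPECTED_TIMESPAN = RETARGET_INTERVAL * TARGET_BLOCK_TIME  # two weeks


def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the target by how long the last 2016 blocks actually took."""
    # Assumption: each adjustment is limited to a factor of 4 either way.
    clamped = max(EXPECTED_TIMESPAN // 4, min(actual_timespan, EXPECTED_TIMESPAN * 4))
    return old_target * clamped // EXPECTED_TIMESPAN


old_target = 2**224

# Blocks arrived twice as fast as planned: the target is halved (harder).
print(retarget(old_target, EXPECTED_TIMESPAN // 2) == old_target // 2)  # True

# Blocks arrived twice as slowly as planned: the target doubles (easier).
print(retarget(old_target, EXPECTED_TIMESPAN * 2) == old_target * 2)    # True
```

A smaller target means fewer hashes qualify, so more attempts are needed on average and the block time drifts back toward 10 minutes.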
Economically, it becomes more and more challenging to get rewards. This is a sign that more and more people are trying to obtain bitcoins, so the *demand* increases.
Bitcoin's value relies on the simple free market of *supply* and *demand*. But its model is intrinsically **deflationary**: the harder bitcoins are to mine, the higher their price tends to be.
The total *theoretical* *supply* of bitcoins is limited: it is pre-determined and hardcoded in the protocol at 21 million [[4]](/blockchain-basics/main-properties#references).
### What is a consensus?
"**Consensus**" comes from Latin *cōnsēnsus* ("agreement, accordance, unanimity").
It is the point where independent actors reach a common agreement on a decision.
In IT, a consensus algorithm is a computer program allowing users to reach common **agreements** on the states of data in a distributed network.
For example, consensus algorithms are used in distributed computing, distributed databases, etc.
### Agreements and *Deflation*
There are two more elements we need to introduce to have a better general picture of Bitcoin. The *Nakamoto Consensus* is both the engine and the binder of the Bitcoin components and relies on them. In the above sections, you learned about the Bitcoin open-source development, the distributed ledger and its network, and some basics about the mining and economics of Bitcoin. In the next chapter, you'll learn technical details about "*Proof-of-Work*". First, we need to know about the "*Halving*" and the "*MAD Property*":
Relying only on supply and demand, Bitcoin wouldn't be deflationary.
Another rule is coded in the Bitcoin protocol that increases the scarcity of bitcoins. Its name is the _Halving_. Every 210,000 blocks (~4 years), the block reward is simply cut in half.
At the very beginning (2008 / 2009), the block reward was 50 bitcoins. These days (2021), the reward is 6.25 bitcoins and will stay the same until 2024[[5]](/blockchain-basics/main-properties#references).
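
The halving schedule is simple enough to compute directly. The sketch below assumes that amounts are counted in the smallest unit of bitcoin (one hundred-millionth of a bitcoin, often called a satoshi) and that each halving is an integer division by two; the function names are hypothetical.

```python
HALVING_INTERVAL = 210_000          # blocks between two halvings
INITIAL_REWARD = 50 * 100_000_000   # 50 bitcoins, in hundred-millionths


def block_reward(height: int) -> int:
    """Block reward at a given block height, halved every 210,000 blocks."""
    return INITIAL_REWARD >> (height // HALVING_INTERVAL)


def total_supply() -> int:
    """Sum the rewards of every era until the reward reaches zero."""
    total, era = 0, 0
    while INITIAL_REWARD >> era > 0:
        total += HALVING_INTERVAL * (INITIAL_REWARD >> era)
        era += 1
    return total


print(block_reward(0) / 1e8)        # 50.0
print(block_reward(700_000) / 1e8)  # 6.25 (after three halvings)
print(total_supply() / 1e8)         # 20999999.9769
```

Because each halving rounds down to a whole number of units, the sum stops slightly short of 21 million, which is where the more precise figure quoted in the references comes from.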
The block reward is how new currency is issued, so the _Halving_ has a strong economic impact. This event has usually been followed by a phase of increasing prices, known as a "Bull Market"[[5]](/blockchain-basics/main-properties#references). It is also an event that brings the community together and drives adoption, which in turn makes the network stronger and the price higher[[6]](/blockchain-basics/main-properties#references).
The numbers chosen by Satoshi Nakamoto for the total supply and the _Halving_ are inspired by _gold mining_. The more you dig to find gold, the less there is, and the harder it is to dig. That's precisely why the block validators are called "miners".
To recap, the _Nakamoto Consensus_, which Bitcoin is based on, is driven by:
- Decentralization
- Proof-of-Work and mining economics
- A probabilistic solution to the [Byzantine Generals Problem](https://en.wikipedia.org/wiki/Byzantine_fault) (a quick word on that below)
- The _MAD_ property (defined below)
In the Bitcoin mesh network, the **rewards** make it possible to tolerate bad actors as long as they control less than 50% ($\frac{1}{2}$) of the mining power. This is a form of "Byzantine Fault Tolerance" (BFT). In that respect, Bitcoin's probabilistic solution tolerates more than the classical deterministic solutions, which require fewer than one-third ($\frac{1}{3}$) of bad actors.
The _**M**utual **A**ssured **D**estruction_ (**_MAD_**) property reinforces this BFT solution:
_It is more profitable to earn bitcoins by participating in the protocol than to attack it. If you want to attack it, you'd have to invest an unreasonable amount of resources. Even in the improbable case that your attack is successful, you'd still lose money on your investment..._
You would also face the community, which can detect the attack and adapt.
In conclusion, the distributed network's parameters allow everyone to agree asynchronously on the state of the data. Anyone can also verify the rules, or even help write them, thanks to the open-source framework. Proof-of-Work lets good actors become miners, prove their work, and get rewarded for securing transactions. The probabilistic solution to the Byzantine Generals Problem and the MAD property ensure the robustness of the network. The protocol evolves and adapts through the Halving and the Difficulty to mimic gold mining and ensure deflation.
## What have we learned so far?
This chapter described some of the pillars of the Bitcoin protocol and how they are articulated around the Nakamoto Consensus. You now understand that the Nakamoto Consensus integrates the Proof-of-Work consensus algorithm and a form of *social* consensus based on simple economic principles.
In the next chapter, we'll dig into the Proof-of-Work consensus mechanism to understand how it works in detail.
[3] https://learnmeabitcoin.com/technical/longest-chain
[4] More precisely 20,999,999.9769: https://en.bitcoin.it/wiki/Controlled_supply
[5] http://bitcoinhalvingdates.com/
[6] https://www.bitcoinhalving.com/
[2] https://digiconomist.net/bitcoin-energy-consumption
[3] https://en.wikipedia.org/wiki/List_of_countries_by_electricity_consumption
[4] https://ubuntu.com/tutorials/how-to-verify-ubuntu#5-verify-the-sha256-checksum
[5] https://www.bitmain.com/
[6] https://www.statista.com/statistics/647523/worldwide-bitcoin-blockchain-size/
[7] https://academy.bit2me.com/en/what-is-gossip-protocol/
## Open-Source
Most, if not all, public blockchain projects are developed as open source. Anyone can verify the code, correct it, and make proposals. This openness is fundamental in an essentially trustless environment. It is common to hear community members say: "*Don't trust, verify!*".
The most committed blockchain developers also go by the catchphrase "[*Code is law*](https://en.wikipedia.org/wiki/Code_and_Other_Laws_of_Cyberspace)", meaning that the strict code defines not only the validation rules of all the transactions and their interactions but also people's conduct. This code is also used to create the software that runs on the network and how data is recorded, explored, etc.
There are different implementations of the Bitcoin protocol for various operating systems and devices. A large variety of programming languages is used (e.g., _C++, Python, Java, Go, Scala_...).
The original implementation from _Satoshi Nakamoto_ is in C++ and is called [_Bitcoin Core_](https://bitcoincore.org/). Most of the nodes of the network use this version.
The openness of the code, the permissionless access to the network, the free software, all bring intoxicating freedom and appeal to the community. Note, however, that this induces a transfer of responsibility to the user. Not all developers in the community have good intentions, and many blockchain applications are pure scams. To prevent this, we encourage users to check the sources of applications and the blockchain they use.
-->