Introduction to Blockchain and Its Applications

# Introduction to Blockchain and Its Applications ## 1. Cryptography and Blockchain Technology The fundamentals of ==information security== and ==cryptography== that undergird blockchain technology - information security ：protection of data from unauthorized access or modification - Cryptography：study of mathematical techniques used to achieve information security objectives when facing adversarial attacks ### Security Objectives - CIA triad : Three main objectives of information security - confidentiality - integrity - availability - Two additional security objectives: - authentication - accountability #### Confidentiality Confidentiality extends to both ==data confidentiality== and ==user confidentiality (or user privacy)== ##### Data Confidentiality - ==Shield data== from all unauthorized parties **is achieved through data encryption** - Shield data from unauthorized parties who are **differentiated on the basis of access rights** :::spoiler Example 1. A patient’s hospital records will contain different types of data, such as identifying information, test results, diagnoses, treatments, prescriptions and medical insurance coverage 2. Hospital employees should only have access to the records on a need-to-know basis 3. The records management system will therefore need an authorization protocol that gives users different levels of access based on their identity ::: ##### User Privacy There is reason to expect that a user remains **unidentifiable** - ==Ensure untraceability== : Conceal the routes of goods, money, data and other operations, so that an observer **cannot follow the trail to deduce identifying information about a user** - ==Ensure unlinkability== : **Conceal the connection between multiple pieces of data from the same user, so that they appear unrelated to an observer** > Largely remains **an area of theoretical concern in academia** but is also gaining more attention as big data, Internet-of-Things (IoT) and the usage of blockchain become more pervasive #### Integrity ==Data integrity== is concerned with **ensuring that data are not tampered**. Though often conflated with confidentiality, **integrity is a distinct objective**. - False assumption 1. Encryption is sufficient to ensure data integrity - Since encryption conceals the real content of a message, the content cannot be modified in any meaningful way 2. Instead, it is possible to manipulate a piece of encrypted data in ways that simply invalidate its content - **Recipient has no idea** if the encrypted data are received with errors or were maliciously modified - So, need tools specific to integrity protection to secure data from unauthorized modifications - Data integrity is achieved through the use of - hash function - message authentication codes - digital signature #### Availability Data and online services are of no use if they are not available when authorized users need them Is closely associated with a malicious activity known as a Denial-of-Service (DoS) attack In IoT networks, a DoS attack has evolved to a Distributed DoS (DDoS) attack :::spoiler DDoS - An attacker first injects malicious software (or malware) into any vulnerable devices connected to the Internet - Once infected with malware, devices become bots that can be remotely controlled to send bogus requests to the target server ::: Availability is usually addressed through good system design, such as - intrusion detection (or) - intrusion prevention systems #### Authentication Authentication extends to both **data authentication** and **user authentication** ##### Data authentication - Verify that the **origin of a piece** of data is what it claims to be - Message authentication codes - digital signatures ##### User authentication - Verify that a user is **whom** they claim to be - Authentication factors to a login system, based on - what you know : username and password - what you have : access card - who you are : biometric information - Message authentication codes and digital signature, where users are required to demonstrate something they know - the secret/private key #### Accountability - Thwart a user from falsely denying that they have sent or received a particular communication, Similar to the concept of non-repudiation - Accountability goes a step further by creating a digital artefact that **makes it impossible for the sender to deny having sent the data** - Draw an undeniable link between the data and the users who have processed it, ensuring accountability for actions performed on the data - Digital signature ``` mermaid graph TD; SecurityObjectives-->CIA_triad_and_extension; CIA_triad_and_extension-->Confidentiality; CIA_triad_and_extension-->integrity; CIA_triad_and_extension-->availability; CIA_triad_and_extension---->authentication; CIA_triad_and_extension---->accountability; Confidentiality-->dataConfidentiality; Confidentiality-->userPrivacy dataConfidentiality-->dataShield userPrivacy-->untracability userPrivacy-->unlinkability integrity-->dataIntegrity dataIntegrity--->hashFuntion authentication-->dataAuthentication; authentication-->userAuthentication; dataAuthentication-->messageAuthenticationCodes; dataAuthentication-->digitalSignature; userAuthentication-->messageAuthenticationCodes; userAuthentication-->digitalSignature; digitalSignature-->hashFuntion; accountability-->digitalSignature; ``` --- ### Attack #### Active attack The attacker performs aggressive and often blatant actions with the aim of affecting or altering a system’s operations :::spoiler E.g. - e-mail phishing, modifying data in transit between users, mounting a DDoS attack - mounting a brute-force attack (trial and error) to guess a username and password combination ::: - **easy to detect** - **difficult to defend** - By the time an attack is detected, the damage **may already have been done or underway** #### Passive attack - The attacker uses **covert methods** to tap into a network and eavesdrop on or record communications - The goal is to gain access to a system or steal information surreptitiously :::spoiler E.g. - surveillance - man-in-the-middle attacks - keystroke logging ::: - **difficult to detect** - **easy to prevent** - One of the most straightforward methods is to encrypt communication between users --- ### Cryptography - A cryptographic scheme is an overall description of how a security system works - ==Schemes consist of one or more algorithms==, which are sets of instructions defining a sequence of operations to be carried out by a computer :::spoiler E.g. an encryption scheme may consist of - a secret key generation algorithm - an encryption algorithm - a decryption algorithm ::: - A function is a process or relation between values that takes some input and produces some output - almost all cryptographic techniques can be broken given **enough time and computing power** - A cryptographic scheme is computationally secure if no efficient attacker is able to break it in a reasonable amount of time - We can think of computational security as a practical ==balance between security and efficiency== - All computationally secure cryptographic schemes rely on the ==provision of pseudorandomness== - The distribution of a string should satisfy statistical tests of randomness, even though it is in fact **not truly random**, E.g., Ciphertexts - Pseudorandomness prevents a ciphertext from inadvertently disclosing information about the statistical distributions of the secret key or the original message - An attacker **cannot distinguish** one ciphertext from another --- ### BlockChain - Bitcoin is a peer-to-peer payment system, and “a chain of blocks” as the structure **for recording** - Bitcoin transactions in an **immutable** and **transparent manner** - Before Bitcoin, without a trusted central authority (such as a bank) to oversee transactions, there was no practical way to thwart three possible acts by malicious users: - Spending without authorization (i.e., without ownership of the account) - Spending without having enough balance - Double-spending :::spoiler i.e. sending the same amount of digital currencies to more than one receiver without deduction in the account balance ::: - Bitcoin blockchain was the first sound design of an open and secure **Distributed Ledger Technology (DLT)** that thwarts these three malicious :::info #### Distributed Ledger Technology (DLT) - DLT is a method of record-keeping whereby every person involved in the record-keeping possesses a copy of the ledger, Whenever a new record is created, it is updated in each and every user’s ledger - DLT is straightforward to implement in a trusted setting, which assumes either a trusted central authority with no self-interest, or that all users will behave honestly by: - Not sending bogus or meaningless transactions that will consume unnecessary computation and storage resources - Updating the ledger in an honest and timely manner whenever a new transaction comes in - Not making unauthorized deletions, insertions, or modifications to transaction details on the ledger - Including only valid transactions on the ledger and rejecting all invalid transactions - Cryptography records data on the transactions in a secure manner and ensures that it is costly for a user to cheat or perform unauthorized actions - Economic incentives reward honest and trustworthy users who help to maintain the integrity of the ledger ::: --- ### Cryptographic Hash Function Hash function : a method of mapping input of arbitrary length onto an output of fixed length - We call this output a hash value, or “hash” or “message digest” - A hash function, denoted as Hash(), takes as input a message m of arbitrary length and produces a hash value of fixed length l - $H = Hash(m)$ - Also write this process as - $Hash() : \{0,1\}^{∗} \rightarrow \{0,1\}^𝑙$ - Cryptographic hash functions can be used for - file integrity protection - generation of one-time passwords - digital signature computation, and so on :::warning - A blockchain is an immutable ledger - This immutability is a result of the **use of distributed ledger technology coupled with hash functions** - Do not allow any modification to a block and its transactions to go undetected ::: #### Properties of Hash Function ```mermaid graph LR p[Properties of Hash Function]-->Determinstic p-->v(Variable-length input, fixed-length output) p-->a(Avalanche effect) p-->pi1(Pre-image resistance) p-->pi2(Second pre-image resistance) p-->c(Collision resistance) ``` ##### Deterministic Same input always produces the same output ##### Variable-length input, fixed-length output Give an input of any length, Hash() always produces output of a fixed length ##### Avalanche effect A single bit flip in the input results in at least 50% bits flips in the hash value ##### Pre-image resistance Given only a hash value, it is computationally infeasible to find what the input is ##### Second pre-image resistance Give an input and its hash value, it is computationally infeasible to find another different input that produces the same hash value ##### Collision resistance It is computationally infeasible to find two different inputs that produce the same hash value :::warning - Pre-image resistance(or ine-wayness) Given h=Hash($m_1$), it is infeasible to find out the value of $m_1$ without trying on overage $2^l$-1 times, where l is the length of the hash value in bits - Second pre-image resistance Given $h_1$=Hash($m_1$), it is computationally infeasible to find another message $m_2$, where $m_2$\neq$m_1$ but Hash($m_2$)=Hash($m_1$) - Collision resistance It is computationally infeasible for an attacker to find two distinct inputs $m_3$ and $m_4$m such that Hash($m_3$)=Hash($m_4$) ::: - Many attacks against hash functions exploit the mathematical underpinnings of collision resistance based on the “birthday problem” :::info ##### Birthday problem - We are tasked to find two people who have the same birthday out of a roomful of people - Assuming that all 365 days in a year are equally probable for a birthday (and ignoring leap years), we need at most only 23 people in the room to have a 50% probability of finding two people who share a birthday - **It may seem counterintuitive that a relatively small set produces such a high probability of collision** ::: - The length of the hash value is also closely related to security #### Summary of "Cryptography Hash Function" section - Hash functions form the basis of a myriad of cryptographic techniques,including - message authentication codes - digital signatures - one-time passwords - checksums - Hash function is also used to protect the integrity of transactions and blocks - This is built upon the distributed ledger technology --- ### Merkle Hash Tree ![Merkle Tree](https://imgur.com/J0dBGtP.jpg) - The construction of a Merkle Hash Tree allows **efficient and secure verification of data** - ==Any changes in the data “P”, “Q”, “R” or “S”(leaves) would always be reflected in the Hash_Root== - If the Merkle Root, Hash_Root, is published and immutable to changes, then one can verify whether the data “P” is included in the Merkle Hash Tree by asking the Prover to supply proofs consisting of {ℎ 𝑄𝑄, ℎ 2𝐵𝐵} - The tuple {$ℎ_𝑄$, $ℎ2_𝐵$} is called the Merkle Path for data "P" - The Merkle Path for data “P” together with the hash of “P” is then sufficient for the Verifier to reconstruct its own Merkle Hash Tree and compare the resulting Merkle Root with the immutable Hash_Root - If the hash values match, then the Verifier can be ascertained that the data “P” is included in the Merkle Hash Tree - If “P” is not included in the Merkle Hash Tree but the Prover lies that it is, the Prover needs to find a **collision** in the Merkle Root hash ![Merkle Tree](https://imgur.com/J0dBGtP.jpg) Suppose the Verifier checks if data “N” is included in the Merkle Hash Tree shown in Figure 1.5 To lie that “N” is in the Merkle Hash Tree (instead of “P”), the Prover must supply proofs that are in **one** of the following forms: 1. {$h_1$, $h_{2B}$}, where $h_1$ is a hash value that will result in the same $h_{2A}$ when computed with $h_{N}$, - i.e., Hash($h_{N}$ ‖ $h_{1}$) is the same as Hash($h_{P}$ ‖ $h_{Q}$) 2. {$h_{Q}$, $h_{2}$}, where $h_{2}$ is a hash value that will result in the same Hash_Root when computed with the Hash($h_{N}$ ‖ $h_{Q}$), i.e., Hash(Hash($h_{N}$ ‖ $h_{Q}$)‖ $h_{2}$) is the same as Hash($h_{2A}$ ‖ $h_{2B}$) 3. {$h_{3}$, $h_{4}$}, where $h_{3}$ and $h_{4}$ are both hash values that when combined with $h_{N}$ will result in the same Hash_Root that has been published and immutable - To supply either one of the proofs above, the Prover will have to find a collision in the Merkle Root (Hash_Root), **an act which is computationally infeasible due to the security properties of a hash function** ### Hash Function in the Bitcoin Blockchain - [But how does bitcoin actually work?](https://www.youtube.com/watch?v=bBC-nXj3Ng4&t=39s) ![Bitcoin block header structure](https://imgur.com/UG9viGX.jpg) ![after hash](https://imgur.com/bQHEffL.jpg) - In the ***i*** th block, denoted as $𝐵_𝑖$ , there is a list of transactions - These transactions are integrity-protected using a Merkle Hash Tree, and the Merkle Root is part of block $𝐵_𝑖$’s header - the ==proof-of-work (PoW) consensus algorithm== in Bitcoin requires miners to vary the 32-bit nonce and iteratively calculate the block header hash until the resulting hash satisfies the difficulty target - The Bitcoin PoW’s difficulty target is set such that a valid block header hash can only be found in, on average, **ten minutes** - Without loss of generality, we refer to the block header hash that satisfies the difficulty target as the **valid block header hash** - The transaction would then depend on which block is accepted by at least **51%** of the users - The block which is accepted by majority users and therefore belongs in the longer chain will be considered as the final state of the ledger - The discarded block is commonly called a stale block - In Bitcoin, transaction finality is characterized by a concept called ==the six-block confirmation== - A transaction will become **computationally difficult to reverse when it is buried at least six blocks deep in the blockchain** --- ### Digital Signature - Digital signature achieves - user accountability - data authentication - integrity protection - In cryptography, a digital signature is a **pseudorandom string** that is created via arithmetic computation and secured based on a computationally hard problem - Signing requires - signer’s message - private key - Verifying requires the verifier to perform the arithmetic comparison using - signer’s public key - message - digital signature ![digital signature flow](https://imgur.com/vExPZTb.jpg) - The correctness requirement of a digital signature is as follows - Given a key pair (sk, pk) and the signature $Sig_m$ $\leftarrow$ $Sig_{sk}$ , the verification algorithm $Verify_{pk}$($Sig_m$ , $m$) will always output 1 - In other words, a digital signature produced using the private key sk corresponding to the public key pk **will always be successfully verified** using the public key pk [brief introduction to Digital Signature](https://www.youtube.com/watch?v=uS1ZIAsvT5w) --- ### RSA Digital Signature (Textbook) [breif introduction to RSA](https://www.youtube.com/watch?v=D_kMadCtKp8) :::success - Key Generation: - On input of a security parameter n, the KeyGen.RSA($1^𝑛$) function outputs a prime p and a prime q of n bits - Computes N = pq - An integer e that satisfies the relation gcd(e, φ(N)) = 1 - An integer d that satisfies the relation ed = 1 mod φ(N) $(sk=d, pk=<N,e>)$ $\leftarrow$ $KeyGen.RSA$$($$1^n$$)$ > Euler functionφ gcd(m,n)=1 $\rightarrow$ φ(mn) = φ(m)φ(n) = (m-1)(n-1) - Signing: - Sign.RSA() takes as input the integer d (the private signing key sk) and a message to be signed m $Sig^e_{m'}$ $mod N = m'$ - The message m is less than the value of N for the verification to succeed - Verification: - Verify.RSA() takes as input the tuple〈$Sig_{m'}$, $m'$〉and the integer e (the public verifying key pk), and produces 1 (“true”) if and only if $Sig^e_{m'}$ $mod N = m'$ - The correctness requirement of the RSA digital signature scheme is $Sig^e_{m} mod N = (m^d)^{e} mod N = m^{1+kφ(n)} mod N$ - The security of the RSA digital signature is based on the **RSA factorization problem** - By referring to the Key Generation algorithm, the RSA factorization problemstates that **given a very large number N = pq, where p and q are both large prime numbers, there is no known efficient method to find out its two prime factors p and q** ::: - The reason that this RSA digital signature scheme in this section is called a “textbook” scheme is because it is insecure for several reasons : - In one form of attack known as a *no-message attack*, an attacker can arbitrarily choose a public key e of a legitimate user, generate a pseudorandom value as the signature $Sig$, compute $m = Sig^{e} mod N$ and produce the tuple$〈Sig, m〉$ as a supposedly valid signature of the user on the message $m$ - One solution to the vulnerability is to hash the message before signing it - In this case, the security of the solution now relies on the security of the underlying hash function: $Sig_m=Hash(m)^{d} mod N$ - Actual implementation of the RSA digital signature is the **RSA Probabilistic Signature Scheme (PSS), which introduces a certain amount of pseudo randomness to the message before signing it, so that a different signature will be generated each time** --- ### Flow of Block Publish ```mermaid graph TD t[trader]-->pi(pay info.)-->ds(digital signature) t-->|public key|ds ds-->h[hash]-->bc(boardcast) m[miner]------>pb(choose the parent-block)-->in(into new-block) bc-->mr(merkle root) bc1(boardcast1)-->mr bc2(boardcast2)-->mr bcN(boardcastN)-->mr mr-->in in-->h1(hash ?) m-->n(nonce) n-->in h1-->d{meet the diffculty target} d-->|YES|p[publish] d-->|NO|n dt(diffculty target)-->in ts(timestamp)-->in style t fill:#8db,stroke:#333,stroke-width:2px style m fill:#8db,stroke:#333,stroke-width:2px style p fill:#8db,stroke:#333,stroke-width:2px ``` --- ### Elloptic Curve Cryptography - ==Elliptic Curve Digital Signature Algorithm (ECDSA)== **is adopted in Bitcoin and widely used in the blockchain system** - It is the elliptic curve analogue of the Digital Signature Algorithm (DSA), based on the form of cryptography that is called the Elliptic Curve Cryptography (ECC) - ECDSA can be considered as an alternative, with **enhanced security, of the first-generation public key cryptography methods such as RSA** Aspect|ECC|RSA| -|-|-| Time|ECC takes full exponential time|RSAtakes sub exponential time Security and key size|Same level of security with smaller key sizes|Same level of security with larger key sizes| Data size|Smaller|Larger| Encrypted message|Function of key size and data size; encrypted message is smaller in ECC|Function of key size and data size; encrypted message is larger in RSA Computational power|Smaller|Larger > $Source$: Khalique $et al.$(2010). - ECC is **more secure** than the RSA - The probability of the ECC algorithm being solved or the risks of ECC being broken are much lower than that of the RSA - Using ECDSA will provide the same level of security as RSA but with smaller keys (160-bit key length in ECC compared with 1024-bit key length in RSA) - faster signature generation - less data for the network - less omputing power needed --- ### Pay to Public Key Hash (P2PKH) - The method called Pay to Public Key Hash(P2PKH) is how hash and digital signature are used in Bitcoin ![1.16](https://imgur.com/SgB7HsP.jpg) - In Figure 1.16, each transaction may contain - More than one input ***(e.g., 1 btc and 2 btc as inputs to make up for a 3-btc spend)*** - More than one output ***(e.g., 1 btc as input and 0.4 btc, 0.6 btc to more than one recipients)*** - Multiple inputs and multiple outputs - **Each transaction will also have a ==unique transaction hash value== based on its content** - The hash value is recorded in the block Merkle Hash Tree #### Example ![transaction flow](https://imgur.com/PMZ3tG0.jpg) ##### R1 ![R1](https://imgur.com/I48mIAf.jpg) ##### R2 ![R2](https://imgur.com/sjJuDwx.jpg) ##### R3 ![R3](https://imgur.com/eAa0c8p.jpg) ##### R4 ![R4](https://imgur.com/9Arn79g.jpg) :::danger When Bob intends to spend the 5 btc he has gotten from Alice, he **must specify the source** of the 5 btc in the “**prev_out**” field and provide his proof of ownership of the 5 btc **by stating his public key and digital signature on this transaction** ::: - In order for miners to validate whether Bob has ownership of the 5 btc, they will do the following checks: 1. Extract Bob’s public key from the “scriptSig” field in the Bob-to-Charlie transaction 2. Calculate the hash of Bob’s public key obtained in Step 1 3. Find the transaction specified in the “prev_out” field in the Bob-to-Charlie transaction, *i.e., the Alice-to-Bob transaction* 4. Verify that the hash calculated in Step 2 is equal to the recipient’s public key hash specified in the “scriptPubKey” field of the Alice-to-Bob transaction 5. If they are equal, use the public key obtained in Step1 to verify Bob’s digital signature in the “scriptSig” field for the Bob-to-Charlie transaction 6. If verification succeeds, conclude that Bob has ownership of the 5 btc and thus can spend the 5 btc 7. Add the Bob-to-Charlie transaction to a block, and run the PoW algorithm :::spoiler *Notice* - The fourth point to note in this process is that Bob’s public key is not disclosed until he creates the Bob-to-Charlie transaction and announces his intention to spend the 5 btc in his wallet - The usage of public key hash to specify the receiver utilizes properties of the hash function - An attacker could not pretend to be Bob because the public key hash reveals nothing about the pre-image - Only Bob can provide the correct public key that hashes to the public key hash specified by Alice - Finally, once the miners verified that the hash of Bob’s public key is equal to the public key hash specified by Alice, the miners will use Bob’s public key to verify his digital signature on the Bob-to-Charlie transaction - When the verification succeeds, miners conclude that Bob has ownership of the 5btc in the walletThe relationship between a user’s private key, public key and wallet address - For security purposes, private keys are truly random strings - From the private key, a user’s public key is calculated using a one-way(irreversible) and predetermined manner - The public key is then hashed and encoded to form the user’s bitcoin wallet address - As such, the conclusion that “Bob has ownership of the wallet containing the 5 btc if his digital signature on the transaction can be verified using his public key” is based on the fact that Bob can only provide the (correct) public key if and only if he has the private key to create the digital signature - Consequently, only Bob can give Alice the (correct) public key hash - The act of specifying the receiver’s public key hash in a bitcoin transaction is a mechanism called “Pay to Public Key Hash (P2PKH)” ::: ### Public Key Infrastructrue :::info - In a digital signature, the signature and message are verified using the sender’s *(i.e., signer)* public key to prove that the message originates from the signer - A public key is widely disseminated - it is a pseudorandom number that by itself does not contain any identifiable information about its owner - A potential security loophole — ==how can a verifier be sure that the public key belongs to the right signer?== - Without a method of verification, an attacker could exploit this loophole by **intercepting the public key and replacing it with his own**, thereby compromising the objective of data **authentication** and **accountability** in the digital signature - There is a need for us to ensure that a public key belongs to a specific user - **Public key infrastructure (PKI)** solves this problem by providing a method for **associating** a public key with a specific user in a secure manner ::: --- - The PKI framework comprises - certificates - certification authorities - certificate revocation ![PKI](https://imgur.com/JrCvCp0.jpg) #### Certificate A certificate is a digital document that serves as proof that a particular public key belongs to a specific user (the subject) - Include - version number - serial number - type of digital signature scheme used - issuer’s name - validity period - subject’s name - subject’s public key - issuer’s signature - Aside from users, certificates can also state a website’s information *Ex: Facebook’s certificate* #### Certification Authority To obtain a digital certificate, we must approach a third-party certification authority (CA,also known as the issuer) - Certification takes the form of the CA’s digital signature appended at the end of the certificate - CAs are usually organized according to geographical distribution - *For example, Netrust is a CA that covers the Southeast Asia region and is the only accredited CA in Singapore* - We know which CAs to trust as one CA vouches for another - This takes the form of a hierarchical relationship, with a root CA at the top vouching for intermediate CAs that issue certificates to end users #### Certificate Revocation 　　A list of such root CAs is hard-coded into our browsers, Whenever we install a browser and visit a website, our browsers trace the website’s certification path back to the root CA *Examples of root CAs include Comodo, Symantec, GoDaddy, GlobalSign, DigiCert and Verizon* - ==Certificates are revoked== - **when they expire** - **or if the user’s credentials change** - **or if the user’s private key is lost or compromised** - **To avoid inadvertently sending confidential messages to a compromised user**, we **maintain repositories of revoked certificates** to perform checks on the status of a certificate - One way to do this is for **root CAs to maintain a certificate revocation list (CRL) that users can check against** - However, over time, the CRL may grow to such a magnitude that it becomes either difficult to maintain or inefficient to perform a search - Alternatively, **the online certificate status protocol (OCSP) uses IP to obtain the status of a certificate** - This is a much more efficient method that is supported in most browsers used today #### Summary of PKI - In the Bitcoin blockchain, all transactions are linked to public key hashes, which on their own do not provide any identifying information pertaining to the wallet’s owner - That said, it can be possible to prove a user’s ownership of a wallet given certain information, such as her possession of a private key corresponding to the public wallet address - So we need the PKI ### Privacy User privacy in a blockchain rests on the untraceability and unlinkability of transactions :::info - Untraceability means we should be able to conceal the details of a transaction so that an observer cannot follow the trail - Unlinkability, in general, means that two events occurring under the observation of an attacker should appear unrelated to the observer ::: - blockchain technology is pseudonymous, as opposed to anonymous - By simply observing the blockchain, there is no identifying information linked to the wallet addresses or transactions - **However, we are able to trace the source of every transaction, and therefore, able to obtain a graphical visualization of transaction paths** - Through prior knowledge or social engineering, **it is possible to link an identity to a wallet address** - ==We need security schemes in place to protect user privacy== ![before](https://imgur.com/rGhZKNn.jpg) ![after](https://imgur.com/UdTlZT4.jpg) - An observer can deduce, with certain probability, that addresses 2, 3 and 4 belong to the same user - In fact, such an attack is similar to a dusting attack, :::info Dusting attack - an attacker sends users tiny amounts of cryptocurrencies to their wallets - After that, the attacker performs analysis of these wallet addresses in an attempt to identify which ones belong to the same user ::: - Understanding user privacy as a form of confidentiality might lead us to consider encryption as a solution - The two main types of data in a Bitcoin transaction are **the transaction amount** and **the addresses of the sender and receiver** - Immediately, some problems are evident with encrypting this information - Encryption alone does not allow blockchain to function properly or securely as a decentralized P2P payment system, and encryption alone does not ensure untraceability and unlinkability - *For example, encrypting transaction data or wallet addresses does not provide untraceability **if the same wallet is reused*** - Furthermore, encrypting transaction data and wallet addresses **without providing additional information will impede miners in the effort of validating transactions** *(e.g., check for double-spending and proof of ownership)* - Blockchain developers have managed to develop algorithms that ensure user privacy while addressing these concerns - Some possible methods: - ==CoinJoin== - Stealth addresses - Ring signature - Ring Confidential Transaction (RingCT) in Monero - ==zero-knowledge proofs== ### CoinJoin ### Zero-knowledge proofs ## 2. Consensus for Blockchain and Distributed Ledger Technologies These rules are the consensus algorithms programmed into the blockchain protocol A good consensus protocol needs to maintain a consistent ledger - It should keep the network up to date, fix errors and ignore malicious nodes In a blockchain network, it also needs to ensure that the transactions put on the ledger are valid For consensus algorithms to achieve the objective to maintain a valid blockchain ledger, it needs to be able to do the following: - If bad actors try to broadcast invalid transactions, they should be ignored by the good nodes and not included in the ledger - When bad nodes try to mine a block with invalid or fraudulent transactions, the majority of the network should agree not mine on top of it To achieve this, the blockchain nodes need to do the following: - Be able to keep track of historical transactions - Authenticate and validate current broadcasted transactions - Reach an agreement on what goes in the ledger with the other nodes - Receive appropriate incentives to “do the right thing” ### Fault Tolerance The ability of a system to operate in the event of failure of one or some of its components #### network fault tolerance - A network fails when it loses connectivity fully or partially - In the event of partial loss in connectivity, segregated parts of the network should be able to continue operating and sync up again when the network is reconnected - The linear nature of blockchain networks does not allow this to happen and hard forks will result - Structures like **Directed Acyclic Graph (DAG)** address this and will be discussed later #### node failure - Byzantine faults include the following: - Omission failures (e.g., crash failures, failing to receive a request, or failing to send a response) - Commission failures (e.g., processing a request incorrectly, corrupting local state, and/or sending an incorrect or inconsistent response to a request) - 𝑛 ≥ 3𝑚 + 1 - where 𝑛𝑛 is the number of generals in the system and 𝑚𝑚 is the number of traitors (faults) In general, there are two fundamental stages to a BFT system: 1. Agreement - Nodes agree on what should be done - Each consensus algorithm has its way for the majority of the nodes to reach an agreement - Once the outcome is agreed upon, it moves on to the execution stage and puts the outcome on the ledger 2. Execution #### Consensus for Trustless Blockchains Some general guidelines to consider are as follows: 1. Doing the right thing gets rewarded 2. Rewards from continuing to do the right thing should outweigh the potential net benefits from doing bad things — the costs of doing bad things should be high - Proof of Work - Proof of Stake - Delegated Proof of Stake - Proof of Weight - Proof of Believability (PoB) - Proof of Space (PoSpace) - Proof of Importance (PoI) - Proof of Burn #### Consensus for Trusted Blockchains In trusted blockchains, all nodes are known, and thus malicious acts can be identified Since the nodes have been vetted before being allowed on the network, they are generally assumed to be trusted to validate transactions Thus, consensus for trusted blockchains is ==focused on preventing crash faults rather than Byzantine faults in general== - Proof of Authority (PoA) - Proof of Reputation - Proof of Elapsed Time (PoET) - Practical Byzantine Fault Tolerance (PBFT) #### Hybrid Blockchain Networks - Delayed Proof of Work (dPoW) - Directed Acyclic Graphs (DAGs) - Tangle - Hashgraph #### Summary - Consensus algorithms are important to ensure consistency and accuracy on a blockchain or distributed ledger - Developing an understanding of how the types of consensus algorithms work would give us an appreciation of blockchain-based applications, assets, and investments, and allow us to make informed decisions - The choice of consensus depends on what the network implementer wants to achieve - A decentralized public system that needs to be self-governing - An enterprise consortium blockchain that needs to handle sensitive transactions - With the multitude of blockchain solutions, any blockchain network may also be required to communicate with several others - Interoperability will become a key consideration ## 3. Introduction to Blockchain Smart Contracts ## Case Study ```mermaid graph TD Voting-->C(Conventional) C-->paper_polling C-->mechanical_ddevice Voting-->D(Digital) D-->E-voting D-->I-voting ```