Query 1 (Heartbeat) ================== ### Target Throughput: **32,000** ### Q1 Checkpoint due `Sunday, October 18` ### Q1 Final due `Sunday, October 25` START-PANEL:"info" #### Learning Objectives This project will encompass the following learning objectives: 1. Compare web frameworks given a set of requirements 2. Explore tradeoff between EC2 cost and performance 3. Explore and compare the differences between Application and Network load balancers 4. Identify performance bottlenecks and optimize computationally expensive tasks 5. Develop automated deployment with Terraform 6. Tune web framework parameters END-PANEL Query 1 is a ramp-up task which makes sure your team has a working web service. Your team should build your web-tier and respond to incoming requests. These incoming requests are simple, and do not need any effort to design and deploy a database. Query 1 is generally used as a heartbeat mechanism to test if your web-tier is working. Besides, this can also be used to test the performance and whether your server can handle varied traffic load. In Q1, you are required to verify and update a data structure. Gently walk through the writeup to get familiarized with the specific requirements. START-PANEL:"danger" You should have at least one 600-second submission on TPZ which reaches the target throughput to get full marks for this query. 1. We require you to carry out a performance comparison of different web frameworks in Query 1 in order to let you develop a deeper understanding of the pros and cons of web frameworks. 2. You should have a submission record with your implementation on at least two frameworks. 3. **You must write a terraform script to auto-deploy your web services and submit it with your code.** The script should automatically install packages needed to run your server and do all the needed configurations before launching your server. Please note down the distribution you choose in the comment of the script. 4. You can assume that we will run your script along with your code, and since we need you to submit your code, you should compile your submitted code in your script. Don't directly submit an executable and run it in your script! 5. You need to compare different deployment configurations (instance types & number). 6. **Read the report template carefully** before getting your hands dirty in the project so that you can have a better understanding of what you need to do and deliver in this phase. END-PANEL *** #### **Trust issues** Money, colorful pieces of paper for dining, housing, and paying for classes that have deadlines every weekend. When we have stacked up more of them, we store them into banks where they become 0s and 1s and you can pay other people without ending up with a pocket of rattling change. Life is good. But, you worry about the worst-case scenarios: the time will come when banks make bad investments and burn up all their clients' money. Worst yet, just when someone is glad that they have saved some cash under the mattress, this unfortunate person's government could collapse and render the cash worthless because after all, they are pieces of paper. #### **What? Go back to the stone age, then?** Why not? Not everyone accepts the value on a banknote imposed by governments, but a gold bar is recognized everywhere, right? No, you have a vision. You want to send and receive money anywhere, anytime, without a single entity to decide how much you can send and what fees they are going to charge you. You don't even want anyone to know who you are. You want to be free and anonymous. #### **Go decentralized** Completely ignoring the fact that an unknown author Satoshi Nakamoto has done exactly the same [thing](https://bitcoin.org/bitcoin.pdf) more than a decade ago, your brilliant mind came up with an idea about a decentralized digital currency founded on the ubiquity of the Internet, modern cryptography and game theory. Instead of relying on or be subject to the authority of a single entity, this currency system should exist in every corner of the Internet, so that it is almost impossible to be shut down by some grumpy corporations or government agencies. For this awesome idea, you gave it a totally original and creative name: **CloudCoin**, or **CC**. #### **Distributed ledger** When transferring your ever-increasing tuition to CMU, the banks verify your identities, check your deposits, add some transaction fee and send the funds to CMU's account. The most important data a bank owns are your identity, how much money you have in your pocket, and your transaction history. In the **decentralized** CloudCoin, every participant in the network should be able to verify the validity of transactions. Therefore, you decided that every participant will have the same copy of the ledger (so that the system is immune to node failure). In the ledger, every transaction made since the beginning of time is stored so that when a new transaction is made, it is possible to check if the sender has accumulated enough deposits to carry out the new transaction by traversing through all historical transactions. #### **Don't you touch my money!** When someone makes a new transaction, the related information (including the sender, the recipient, the amount of money to transfer, etc) is broadcast to the entire network since every node needs to update its ledger. But hey, why don't I just swap the sender-recipient pair, broadcast this transaction to get some money out of CMU's pocket? Apparently, there is a solution to this issue: every transaction must be **digitally signed** by the sender to prove that the transaction is initiated upon the sender's consent. Asymmetric cryptography is born for this purpose of creating digital signatures. Given a pair of private and public keys and a message (plaintext), below two sets of encrypt/decrypt actions are supported: ciphertext = encrypt(plaintext, key=private_key) plaintext = decrypt(ciphertext, key=public_key) and ciphertext = encrypt(plaintext, key=public_key) plaintext = decrypt(ciphertext, key=private_key) That is, a message encrypted with one key can be decrypted by the other. For example, If Alice encrypts a message using her private key and sends it to Bob, Bob can decrypt the message with Alice's public key. Also, Bob can be absolutely sure that the message is encrypted by Alice even if Alice later claims to have never sent it (to be a bit pedantic, we call this property: non-repudiation). Since you can't revoke the messages you encrypted with your private key, you need to keep your private key secret (maybe not too [secret](https://www.washingtonpost.com/business/2019/02/04/cryptocurrency-company-owes-customers-million-it-cant-repay-because-owner-died-with-only-password/?noredirect=on&utm_term=.143de76ccc5a)). #### **The accountant** As every transaction is broadcast to every node in the CloudCoin network, it becomes rather difficult to keep all nodes synchronized. So, you are thinking of a way to elect a node every once in a while, to act as the temporary accountant to perpetuate outstanding transactions into the ledger and all other nodes will then adopt this version of the ledger. For the honest work done by this accountant, new CloudCoins are rewarded. However, this is not like a Kubernetes cluster where nodes are civilized and orchestrated. When there is monetary incentives involved, you know you can no longer use naïve methods like “asking each node to sleep for a random period of time and choosing the one that wakes up first“ to elect an accountant. This is no election, but a competition. Somewhere in your memory, you found something called cryptographic hash functions that may come handy to solve this issue. It has some very nice properties, including but not limited to: 1. With `x`, computing `hash(x)` is fast. 2. For value `x`, `hash(x)` is unpredictable. Even a small change of `x` will result in completely different `hash(x')`. 3. Given `hash(x)`, there is no better way than trying every possible input if you want to find `x`. (Although strategies have been found to generate collisions for some older cryptographic hash functions like MD5 and SHA-1, attacking of more recent functions like SHA-256 is still as difficult as waking up for a morning class). For example, if you have a string "cloud", its SHA-256 hash value in hexadecimal is: 56681010b753e1abe52c449d0aab291b28f1808a3a91b6baeaa726883baad4b0 Can you find a string, when appended to `cloud`, that generates a hash value that starts with a `0`? After some trial-and-error, you found that `cloudisthebest!` has a hash value that starts with a `0`: 0f24aff93ac78d0e65b09d806295c636e08f0419ffc45f9dfa82a1918f0bb843 But what about starting with two 0s? Thirty 0s? The only way is to brute-force all different inputs and see if one of the results meets the objective. If you are lucky, you can find the target string quickly, if not, you may be stuck for a long time searching. Similarly, for CloudCoin, you set a collision target for all contestants and ask them to come up with a value that meets a hash target. You have tuned the difficulty of the target so that on average, in every 10 minutes, one node in the network will be able to generate a successful collision and broadcast its block of transactions to the network. Since generating this target requires a non-trivial amount of computation, this target string is also called **Proof of Work (PoW)**. In a nutshell, the node works really hard on a computational problem, makes a contribution to the bookkeeping of the ledger, and gets rewarded. This reward is the only way to generate new CloudCoins. In this sense, the accountant node looks like someone digging in a gold mine and put new gold into circulation. Therefore, you began to call these nodes **"miners"**, and the attempt to find a hash collision **"mining"**. #### **Buyer's remorse** Say, someone named Albert has bought a nice new headset. He made the transaction to the vendor, and after a few days his package arrived. While enjoying it, the sense of regret is crawling upon him: “It is too luxurious for me to spend hundreds of dollars. Can I just… keep the headset and get my money back?” If blocks in the current ledger (ordered by the time of creation) are B1, B2, B3, and assume Albert's transaction is logged in B2. Since we have established non-repudiation property of transactions, Albert's only hope is to remove the transaction from B2, borrow a nice computer to mine the block again, and broadcast this modified block back to the network. (Ironically, the electric bill to do the mining may be more expensive than the headset) Obviously, in your design, such a thing should never take place. In the CloudCoin protocol, when creating a block, the hash value of the previous block must be included to compute the hash of the current block. If Albert wants to change B2 and convince the network that this new B2 is valid, the new B2's hash value must be the same as the one B3 used to compute B3's hash. We are talking about the collision of the entire hash value, not just for some starting with 0s. With SHA-256 and today's computing power, our Sun would explode before you can get such a collision, seriously. ![chained structure](https://raw.githubusercontent.com/pluralsight/guides/master/images/8cd8b94f-d05f-41e8-a0f1-70853f390094.png) **Figure 1.1:** Example of the blockchain structure #### *The breaker of chains (Optional material)* Knowing he cannot just modify a block and get his money back, Albert went crazy: “I'll destroy your stupid chain!” Chances are, within a very short period of time, more than one miners can be successful to create a new block and make a broadcast. In this case, we will have conflicting chains (like a “fork”). Which chain of transactions in the ledger should the network adopt? CloudCoin exercises full democracy: the majority decides which chain is accepted. If a collection of nodes N1 works on top of new block B1, a larger collection of nodes N2 works on top of new block B2. Since there are more miners in N2, it is more likely that another new block is created by some node in N2. Now N2's chain becomes the longer chain; the new block then gets broadcast to the network hoping no one else is broadcasting a different block at the same time. According to the CloudCoin protocol, nodes will believe in the longest chain, since they represent a higher computing effort. Normally, when you create a transaction, you can wait until the block that contains your transaction is appended by some more blocks to be sure that your transaction is stored in the widely accepted chain. If you wait for 6 extra blocks to be appended, you need roughly 1 hour to confirm that your transaction is “delivered”. If Albert wants to tamper with the system and make his fake chain be the accepted one, he will need to collect a [huge](https://www.investopedia.com/terms/1/51-attack.asp) amount of resources to make sure that he can generate new blocks faster than the entire network combined. As more participants join the network, this becomes highly unlikely to happen, it is just not financially wise to do so. *** ### **Your Task** #### *"What? I thought this is just a reading assignment"* Talk is cheap. Now that you have the blueprint of the CloudCoin protocol. You need to **write a client** to be used by all nodes. As described in the dreadfully long introduction above, a node can participate in the network just to make transactions, or when feeling ambitious, also assume the position of a miner that creates new blocks and earn some CloudCoins for itself. In Query 1, we will be sending many requests to your client server to test its performance, correctness and stability. Each request consists of a simplified blockchain and several outstanding transactions. Your task is to verify if the blockchain is legal and add the new transactions into a new block that is to be appended to the blockchain. ![Request format](https://s3.amazonaws.com/cmucc-public/webcontent/twitter-phase-1/Q1RequestFormat.png) **Figure 1.2:** Request format The above figure shows the format of the requests. Each request is **zlib compressed and encoded by URL-safe Base64** (not default base64). The request itself is in JSON format, here's an example: { "chain": [ { "all_tx": [{ "recv": 895456882897, "amt": 500000000, "time": "1582520400000000000", "hash": "4b277860" }], "pow": "0", "id": 0, "hash": "07c98747", "target": "1" }, { "all_tx": [ { "sig": 1523500375459, "recv": 831361201829, "fee": 2408, "amt": 126848946, "time": "1582520454597521976", "send": 895456882897, "hash": "c0473abd" }, { "recv": 621452032379, "amt": 500000000, "time": "1582521002184738591", "hash": "ab56f1d8" } ], "pow": "202", "id": 1, "hash": "0055fd15", "target": "01" }, { "all_tx": [ { "sig": 829022340937, "recv": 905790126919, "fee": 78125, "amt": 4876921, "time": "1582521009246242025", "send": 831361201829, "hash": "46b61f8e" }, { "sig": 295281186908, "recv": 1097844002039, "fee": 0, "amt": 83725981, "time": "1582521016852310220", "send": 895456882897, "hash": "b6c1b10f" }, { "recv": 905790126919, "amt": 250000000, "time": "1582521603026667063", "hash": "b0750555" } ], "pow": "12", "id": 2, "hash": "00288a38", "target": "0a" } ], "new_target": "007", "new_tx": [ { "sig": 160392705122, "recv": 658672873303, "fee": 3536, "amt": 34263741, "time": "1582521636327155516", "send": 831361201829, "hash": "1fb48c71" }, { "recv": 895456882897, "amt": 34263741, "time": "1582521645744862608" } ] } In the request JSON, some important fields are: 1. `chain`: The chain is a list consisting of multiple chained blocks, each block has the following fields: * `id`: Integer ID starting from 0, increment by one for each block. * `all_tx`: A list of transactions logged into the block. * `pow`: Proof of work. The format will be discussed later. * `hash`: Hex string of the block hash. * `target`: Hex string of the hash target. 2. `all_tx`: Inside each block, there is a transaction list. Each standard transaction consists of the following fields. * `time`: A string representation of long integer UNIX timestamp in nanoseconds. Transactions are ordered in monotonically increasing timestamps. CloudCoin doesn't allow transactions with identical timestamps. * `send`: Sender's account as a long integer. CloudCoins are withdrawn from the sender's account. The account number is also the public key of the sender. * `recv`: Recipient's account as a long integer. CloudCoins are deposited into the recipient's account. The account number is also the public key of the recipient. * `amt`: Transaction amount, a non-negative integer specifying the amount sent from the sender to the recipient. * `fee`: Transaction fee, a non-negative integer specifying the amount of extra coins that the sender is willing to reward the miner. The transaction fee incentivizes miners to include the transaction into the new block. * `hash`: Hex string of the transaction hash. * `sig`: The RSA signature of the transaction hash. The format will be discussed later. Besides, at the very end of this transaction list, there will always be a reward transaction, which is added by the miner when creating a new block, to reward the miner for the mining effort. This transaction requires no sender's account, since the rewarded coins are newly "minted". The transaction fee and the signature are not present either. The amount of the reward is calculated from the block's ID, which will be explained later. 3. `new_tx`: A list of new transactions for you to process. The transactions are also ordered in monotonically increasing timestamp and must come later than any logged transactions. A new transaction is created either by the miner or by some other account. For a new transaction created by some other account, it has the same format as the standard transaction described above. For a new transaction created by the miner, only the timestamp, recipient and transaction amount are provided. 4. `new_target`: Hex string of the new hash target, which is the hash target of the new block to be created by the miner. #### **Specifications of the transaction hash and signature.** 1. For a transaction like the one below (not all fields are shown): { "send": 15213, "recv": 15619, "amt": 528491, "fee": 24601, "time": "1552533255926535897" } We define our own hash function `CCHash` which returns the string of the first **8** hex characters of the `SHA-256` result. e.g. SHA-256("ccisawesome!") = "c67183496ab12d39128de202de0622e750877089c31e0a08df052166b0e50e4c" CCHash("ccisawesome!") = "c6718349" (NOTE: always use lower case alphabets) The hash value of a transaction is computed as: CCHash("timestamp|sender|recipient|amount|fee") e.g. The hash value of the above transaction is: CCHash("1552533255926535897|15213|15619|528491|24601") = "0884c25f" Note that you should follow the order of the fields in the `|` delimited string input. For reward transactions, missing fields are replaced by empty strings (nothing in between two `|` characters). 2. An example RSA procedure works as follows: 1. We use a tuple of two integers `(n, e)` to denote public key; similarly, private key is denoted by `(n, d)`. Notice that `n` is shared. Assume that we have the public key `(91, 5)`, and the private key `(91, 29)`. We want to encrypt the message `M = 23` with the public key. 2. We can get the ciphertext `C = (M ^ e) % n`. Therefore `C = (23 ^ 5) % 91 = 4`. 3. When we want to decrypt `C`, we use the same procedure: `M = (C ^ d) % n`. We can recover `M = (4 ^ 29) % 91 = 23`. Your implementation of the RSA procedure will take the **integer value of the transaction hash** and your private key as input, and output a signature as a long integer. As a result, other CloudCoin users will be able to verify the authenticity of your transaction, by taking the signature and your public key as input, and compare the RSA output with the correct transaction hash. The sender's or recipient's account in transaction is exactly their `e` in public key. We set the value of `n` as a constant number `1561906343821`, which means all accounts in all requests are using it as the modulus number. Your team's account is `1097844002039`. Here's your team's key pair info: { "n": 1561906343821, "e": 1097844002039, "d": 343710770439 } (This is roughly how your .pem file works. You will be working with RSA keys in this project. Other than the encryption/decryption process, you do not need to worry about other details, e.g. how key pairs are generated. Interested readers can read related materials [here](https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Key_generation). What you should beware of, is to make sure that you understand the relationship between signing/verifying and encryption/decryption.) #### **Specifications of the block hash and PoW.** 1. The hash of each block is generated in the following format: CCHash(SHA-256("block_id|previous_block_hash|tx1_hash|tx2_hash|tx3_hash...") + PoW) The SHA-256 function returns the full-length hex string of the SHA-256 hash. We do **not** require a `|` before between the SHA-256 result and the PoW string. \* For the first block, it has a previous block hash of `00000000`. 2. Mining is defined as trying to find a string `PoW` so that the above operation generates a block hash that is **lexicographically smaller** than the block's hash target. `PoW` can be any string that meets the goal. #### **Specifications of the rewards to miner.** When creating a new block, a miner will get rewards from two sources. 1. Reward transaction. The amount of reward is calculated from the block's ID, which starts at `500000000 CC` and is halved (and **round down**) for every two blocks, i.e., `500000000 CC` for blocks 0 and 1, `250000000 CC` for blocks 2 and 3, `125000000 CC` for blocks 4 and 5, and so on. 2. Transaction fees. The miner also receives the transaction fees of each standard transaction that is logged into the mined block. At some point in the future, the reward amount will eventually come down to zero, then miners will completely rely on the transaction fees for rewards. *** ### Request Format We will be sending HTTP GET requests to your web service endpoint in below format: http://<your_dns_endpoint>/q1?cc=<base64_encoded_request> For example, the request for the JSON shown earlier looks like this: /q1?cc=eJyFk9tum0AQht9lr7mYw85heZWqigBDbClxqxg1lSK_ewcCBddJuldoF3a--f7hLXXH5nRO9be31Dw9PYy_58eXvvuVai-SRd3Ji1WpeR5TLbCsKo2n5z7VCcVJCDJsK1Xp2FyOcZhbMnOFdP1epZ8_XmNrOj0dUg1_XwLrilu2OBibl8d-nG5N1-qG6HJ6TDUKcRCwBVepVkpGViRAp9gb-oCiDL4AI6lnL1nvgKc7TAiLaVS-9OfDvx0vfB1k46Y9zEjvNZUwxx1MbOU_ZhCA0OMGl4KbmaYVHfDgOzMEtLjBzQ2IDAeUvRv4WE60D0ScobCtbgqIFQgHBVc35kiyMGc3LYT3xIWyUg4g2dTcaF7z1VZx8H4GmimoCDmia5kieKdAKOY5JiSMrRiwIDgbSfF7BlSPtDFagq_jabXDFmHYxXPb9lyH5JN4FBhIVQ2Ut3haMAn1sosH13Rolw65N-w36TTzN-f-9WHbg2m45639NEfpQgaCRKsqFVcjN2bgxRQL69IEZ1K2fCdLWZkMgxf168BwaLN3hjtZH_zln9bJYjm7ksI0uNc_g4cQeA== ### Response Format You should be able to identify potential attacks to the blockchain. If you allow bad guys to profit, this entire system will lose its credibility. For example, what could go wrong if we have two identical transactions in the blockchain? If you find the request somewhat problematic, reply: TEAM_ID,TEAM_AWS_ACCOUNT_ID\nINVALID The second line must start with `INVALID`, but you can append some debug message after it for your own convenience, as long as the message does not start a new line. Of course, don't push yourself too hard to check for every piece of detail, we won't be giving you requests that misspell “hash” into “cash”. If you can get 100% correctness, statistically, you should have covered all our malicious cases. In addition, correctness is converted to integer for grading. Higher than 0.5 correctness will be rounded up. For example, 99.5% will be rounded up to 100%, but 99.49% will be rounded down to 99%. If the blockchain looks good, your web service will then **act as a miner that creates a new block that contains all the new transactions and a reward transaction**. For those new transactions with only timestamp, recipient and amount provided, they are transactions created by your team, so you need to fill in the other required fields to make this transaction valid. The money will come out of **your team's account**. When creating the new block, you need to make two assumptions. * Assume you are too thrifty to pay any transaction fee (meaning `fee = 0`, not empty). * For the timestamp of the reward transaction, assume your block is created exactly **10 minutes after** the previous block. You should append the new block to the chain in the request JSON, compress the JSON string with zlib and encode it by URL-safe Base64, and include the Base64 string in your response together with your team's info. TEAM_ID,TEAM_AWS_ACCOUNT_ID\nYourResponseInBase64 For the request shown in the JSON format above, one of the possible responses can be: TheLannisters,123400001234 eJyFk9tum0AUAP9ln3k4lz2X5VeqqgIMsaUkrWLUVor87z12IIDtuDwhFnZnZ5b31O2bw2uqv72n5vn5x_j3cvvWd79T7UWyqDt5sSo1L2OqBaarSuPhpU91QnESggzLlaq0b477GMwtmblCOn2v0q-ff-LRefSwSzV8vgTWFbdsMTA2b0_9eJ41naoN0fHwlGoU4iBgC65SzZSMrEiATvFs6AOKMvgEjKSevWS9AT7PYUJYTGPlY_-6u97xxNdBNm7a3QXpY00lzDEHE1v5jxkEIPSYwaXgYqZpRQfc-coMAU1ucHEDIsMOZe0G7suJ7QMRZyhss5sCYgXCQcHZjTmSTMzZTQvhLXGhrJQDSBY1G81zX20VB-8vQBcKKkKO6FrOCT4oEIp5jhMSxmYMmBCcjaT4LQOqR22MLcHjPK122CIMqzzbbV_WIfkijwIDqaqB8pKnBZNQL6s8ONehVR1yb9g3dZr7RzfWKWQgSDR7UXE1cmMGnrSwsE7EnEnZ8o0ZZWUyDDjUx3VwaLN3hksdjD-DYqOhFbG6-5tv6nzJkMVydiUF_2S4yjwfdJIWh66s8ly9-LBP5L_XB_uGuHhe9QmTUyBeBUIr0nabQGDx1ekff489yA== **Note:** It's OK to keep fields other than `chain` in your updated JSON, like `new_target` and `new_tx`, but removing them can help to reduce your response size. If you want to get a sense of what the updated JSON looks like, or to have a check whether your response can be correctly parsed by our grader, you can refer to the reference server util function at [http://reference.theproject.zone/utils/decode](http://reference.theproject.zone/utils/decode). *** Now you have earned some CloudCoins! Go get some [pizza!](https://www.businessinsider.com/bitcoin-pizza-10000-100-million-2017-11) [1]: https://en.wikipedia.org/wiki/QR_code [2]: http://www.swisseduc.ch/informatik/theoretische_informatik/qr_codes/docs/qr_standard.pdf [3]: https://s3.amazonaws.com/cmucc-public/f17/team-project-images/212px-QR_Code_Damaged.jpg [4]: https://s3.amazonaws.com/cmucc-public/f17/team-project-images/2_150_150DPI_ty_oerny_08_2011.jpg [5]: https://s3.amazonaws.com/cmucc-public/f17/team-project-images/Patterns.png [6]: https://s3.amazonaws.com/cmucc-public/f17/team-project-images/Detection.png [7]: https://s3.amazonaws.com/cmucc-public/f17/team-project-images/Version.jpg [8]: http://www.thonky.com/qr-code-tutorial/data-masking [9]: https://s3.amazonaws.com/cmucc-public/f18/team-project-images/Zigzag.png [10]: https://s3.amazonaws.com/cmucc-public/f17/team-project-images/Mask.png [11]: https://s3.amazonaws.com/cmucc-public/f18/team-project-images/logistic_map.png