HTTP and Web Programming
===

Topics Covered: HTTP, Client-Server Model, Web APIs

## What is HTTP?

HTTP (Hypertext Transfer Protocol) is an **application layer** protocol designed for transferring information over the web between networked devices. It is primarily built on TCP (HTTP/3 instead runs over QUIC, which is built on UDP; this will be discussed further later in the course). Like any protocol, it's a standard agreed upon by networked devices on the Internet. It has a specific message format, which every sender and receiver adheres to.

![image](https://hackmd.io/_uploads/BJVMsrCg0.png)
![image](https://hackmd.io/_uploads/SJg-GL0eR.png)

This diagram, from Professor Zhang's slide deck, is especially useful for visualizing the format. You can think of the headers as providing information about the client's message (the request) and how it should be handled, while the body contains data the server can use.

Generally speaking, the web works as an interaction between a client device and a server device. Both are just computers connected to a network; there's nothing special about a server. The client sends the server a request, the server processes the request and (usually) performs some action, then the server sends the client a response. Both the request and the response are sent over HTTP. TCP provides a reliable, ordered byte stream between the two, while UDP is closer to send-and-forget, so TCP makes more sense for now.

#### HTTP Methods

HTTP methods specify the type of action the client wants from the server. Here are the main ones:

- `GET`
    - Asks the server to retrieve some resource and send it over to the client. This usually corresponds to a `SELECT` operation against a database.
    - If the resource is not available, the server responds with a 404 status code (more on this later).
- `POST`
    - Asks the server to create a new entry. An example is someone creating a new user profile on Instagram.
    - Along with the POST method, you will normally send a payload inside the request body containing relevant information (username, hashed password, etc.).
    - This would correspond to an `INSERT` operation against a database.
- `PATCH`
    - PATCH and PUT often get confused. PATCH contains information about how to modify an existing resource.
    - For example, if I wanted to change my profile picture on some social media platform, that should be a PATCH request containing the new profile picture or some encoding of how to change it (in the request body, of course).
    - Editing a message on Discord is another example.
    - This would be an `UPDATE` on a database.
- `PUT`
    - Unlike PATCH, which describes a change, PUT carries the new object itself, which the server either adds or uses to replace an existing item.
    - For example, adding a reaction to a message on Discord would be a PUT.
    - We have a new bit of data (the emoji to react with) which we want to attach to an existing relation on the database.
    - This would be an `UPDATE`, but it could also be an `INSERT`, depending on how the database is structured.
- `DELETE`
    - Pretty straightforward: we want to delete something that exists on the server.
    - An example is deleting a Discord message.
    - Usually no need for anything in the request body. Why?
    - This would be a `DELETE FROM` on the database.

There are others, such as `HEAD` and `OPTIONS`, which you normally won't encounter.

## HTTP Innovations

Obviously, HTTP did not come from nowhere. Different versions introduced different capabilities, which you should be familiar with at this stage in the course.

Some Definitions:

- **RTT**: Round Trip Time
    - RTT = 2 * propagation time
    - total time = 2 * RTT + transmission time (to fetch one object over a new TCP connection: 1 RTT for connection setup, 1 RTT for the request/response)

### HTTP/1.0

HTTP/1.0 uses non-persistent connections, which means you need a new TCP connection for each request/response pair. So if you wanted to retrieve an image from a server, assuming that image fits in one packet, it would take 2 RTT, neglecting transmission time.
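
As a quick check of the definitions above, here is a minimal Python sketch of the single-object fetch time over a non-persistent connection. The function name and the example delays are made up for illustration; they are not from the course material.

```python
def fetch_time_one_object(propagation_ms: float, transmission_ms: float = 0.0) -> float:
    """Time in ms to fetch one object over a brand-new TCP connection."""
    rtt = 2 * propagation_ms      # RTT = 2 * propagation time
    tcp_setup = rtt               # 1 RTT to set up the TCP connection
    request_response = rtt        # 1 RTT for the HTTP request and its response
    return tcp_setup + request_response + transmission_ms


# Hypothetical numbers: 25 ms one-way propagation delay.
print(fetch_time_one_object(25.0))        # transmission time neglected
print(fetch_time_one_object(25.0, 5.0))   # with 5 ms of transmission time
```

With a 25 ms propagation delay the RTT is 50 ms, so the fetch costs 2 RTT = 100 ms before any transmission time is added.
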
Now consider the structure of an HTML document (the standard webpage format).

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My Webpage</title>
  </head>
  <body>
    <img src="/path/to/image.jpg" alt="alt text">
    <script src="/path/to/javascript.js"></script>
  </body>
</html>
```

In this case, we have our original HTML document, and this HTML document requires two extra requests: an image and a JavaScript source file, both of which the client fetches from the server. So our sequence is as follows in HTTP/1.0:

```sequence
Client->Server: GET /index.html
Server-->Client: 200 OK
Client->Server: GET /path/to/image.jpg
Server-->Client: 200 OK
Client->Server: GET /path/to/javascript.js
Server-->Client: 200 OK
```

Each object therefore costs 1 RTT for its request/response plus 1 RTT to set up its TCP connection, and the same is true of the HTML document itself:

$$
T_{total} = 2 * t_{RTT} + 2n * t_{RTT}
$$

for an HTML document that refers to `n` objects. Note, you must **first retrieve the HTML document, since you need the entire thing before you can start retrieving the other objects**.

### HTTP/1.1

HTTP/1.1 introduced persistent connections: the TCP connection stays open, so you no longer need a new one for each request/response pair. The sequence is the same, but the connection setup is paid only once (1 RTT), followed by 1 RTT for the HTML document and 1 RTT for each of the `n` objects:

$$
T_{total} = 2 * t_{RTT} + n * t_{RTT}
$$

### Parallel Connections

This isn't strictly part of HTTP, but because HTTP is a stateless protocol, it is easy to run several connections between a client and server in parallel (there are limited data dependencies). However, doing so introduces additional overhead on the operating system (POSIX threads are expensive) and contention for the limited bandwidth available. If your OS can provide you with 8 parallel threads (8 physical cores), then you can perform 8 request/response pairs in parallel. However, you **cannot** parallelize the fetching of `index.html`, since it is a prerequisite for the others.
For `m` threads and `n` objects,

$$
T_{total, HTTP/1.0} = 2 * t_{RTT} + \bigg\lceil \dfrac{n}{m} \bigg\rceil * 2 * t_{RTT}
$$

$$
T_{total, HTTP/1.1} = 2 * t_{RTT} + \bigg\lceil \dfrac{n}{m} \bigg\rceil * t_{RTT}
$$

### Aside: Pipelining

This one is difficult to provide any exact calculation for. Essentially, you send many requests back-to-back on one connection without waiting for the response to the previous one, but they are not handled completely in parallel. This is possible only over HTTP/1.1 (why?). Theoretically, it approximates simultaneous requests, so it can be treated as 1 RTT for all objects (not each). In reality there is some overhead, and the server can be overloaded, so the response time may be longer.

## Web APIs

Web APIs leverage the structure of HTTP to offer services to clients. For example, programmers can interact with Discord through a script if they create and register an application and access Discord's [API](https://discord.com/developers/docs/reference). This is how Discord bots and webhooks work. Take a look at the [channel](https://discord.com/developers/docs/resources/channel) documentation.

If you have the time, try [creating a bot application](https://discord.com/developers/docs/quick-start/getting-started) (everything I'm describing is free). Write a Python script to do the following using the authentication information described in their documentation (the `requests` library is pretty good).

- In a channel, try making a `POST` request to publish a new message.
- Use `PUT` to add a reaction.
- `GET` the message.
- Then `DELETE` the reaction.

Isn't HTTP neat?

## Extra Notes from Discussions:

- HTTP is **stateless**
    - Cookies enable stateful communication between client and server, but by default the protocol is not stateful
    - TCP is stateful, and HTTP usually uses TCP, but the HTTP layer is **independent** of the layers below it
    - Keeping it stateless makes the protocol more flexible for application developers
- TBD
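
To make the point about statelessness and cookies concrete, here is a minimal sketch using Python's standard `http.cookies` module. The helper names and the `session_id` cookie are hypothetical and chosen only for illustration; a real server would also set security attributes, and a real client would store cookies per site.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(session_id: str) -> str:
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id   # hypothetical cookie name
    cookie["session_id"]["path"] = "/"
    return cookie["session_id"].OutputString()


def make_cookie_header(set_cookie_value: str) -> str:
    """Client side: turn a Set-Cookie value into a Cookie request header value."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_value)
    # Only name=value pairs are sent back; attributes such as Path are not.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in cookie.items())


# The server attaches this to its response...
set_cookie = make_set_cookie_header("abc123")
print("Set-Cookie:", set_cookie)

# ...and the client must repeat it on every later request, because the
# protocol itself remembers nothing between requests.
print("Cookie:", make_cookie_header(set_cookie))
```

The state lives in the headers that the client resends with each request, not in HTTP itself, which is why the protocol can stay stateless while applications still track sessions.
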