
# 2. The HTTP protocol

###### tags: `Web Applications`

The web is *decentralized* and *collaborative*: anyone with a suitable connection and access can contribute resources to it or fetch resources from it. To communicate across the web, we use the ***HTTP*** protocol.

## What is HTTP?

%% Obsidian comment
According to [IBM](https://www.ibm.com/docs/en/cics-ts/5.6?topic=standards-hypertext-transfer-protocol-http11), HTTP is defined as:
>\[…\] an application-level protocol for distributed, collaborative, hypermedia information systems.
%%

It is an application-layer protocol (higher level than TCP) originally used for distributing HTML documents. Over the years it has developed to carry video and many other types of media.

The HTTP protocol in a client-server environment works as follows:
* The client sends a **request message** to the server, asking for access to, or an action on, a particular **resource**.
* The server responds by sending back a **response message**, which typically contains the resources that were requested by the client. (This may not occur for a number of reasons, e.g., forbidden access or unavailable content.)

```mermaid
graph LR;
%% Comment
%% Create the nodes
client(Client);
server(Server)

%% Connect the two nodes
client -- Request --> server
server -- Response --> client
```

### Different HTTP versions

There are several HTTP versions, namely:
* HTTP/1.1: the most widely used version since the late 1990s.
* HTTP/2: newer version with performance improvements.
* HTTP/3: newest version, standardized in 2022, which runs on top of `QUIC`.

>These versions are not mutually exclusive: all of them are in use today, and browsers still support HTTP/1.1.

#### HTTP/2

HTTP/2 addresses some of the flaws that HTTP/1.1 had. It optimizes the transport of messages and thus speeds up the web experience. It has **flow control** and **stream prioritization**, and it is able to **multiplex** requests and responses over a single connection. Other new additions are **server push** and **header compression**.
Frames are encoded in a **binary format**.

#### HTTP/3

HTTP/3 works on top of QUIC (which integrates TLS and implements retransmission) over **UDP**, instead of working over TLS and TCP. QUIC ensures that all messages sent over HTTP/3 are encrypted through **TLS**.

>QUIC reimplements important TCP features, such as reliable delivery, in a faster manner on top of UDP; that is why UDP is used.

Comparison between the `HTTP/2` and `HTTP/3` stacks:

| HTTP/2 | HTTP/3 |
|-----|----|
|*TLS*|QUIC (+ TLS)|
|TCP|UDP|
|IP|IP|

* Note: TLS is optional for HTTP/2 in the specification, although browsers only use HTTP/2 over TLS.

### HTTP methods

A request message specifies a **method**. The principal methods used by applications on the Web are:
* `GET`: used to fetch resources (webpages, images, etc.).
* `POST`: used to interact with the web resource. For this reason, `POST` requests usually carry data in their body.

> There are other methods, such as `CONNECT`, `DELETE` and `OPTIONS`.

#### GET requests

* They are sent by browsers when a URL is entered or a hyperlink is clicked.
* They are usually cached to speed up processing and free up resources.
* They are *supposed* to be safe (i.e., not to have side effects, as they do not contain extra data).

The last two properties mean that developers need to be careful when designing websites: an operation with side effects that is exposed through `GET` might behave in unintended ways, for example because its response is served from a cache or the request is repeated. For such operations, the `POST` method should be used instead.

#### POST requests

* They are produced by browsers when forms are submitted.
* They are *not* cached.
* They *can* be unsafe.

<span style="color:gray">Binance API `POST` request data example (`timestamp` is the current Unix time in milliseconds, computed in Python as `int(round(time.time() * 1000))`; the value shown is illustrative):</span>

```json
{
  "fromCoin": "EUR",
  "toCoin": "USD",
  "requestCoin": "ETH",
  "requestAmount": 10,
  "timestamp": 1672531200000
}
```

### Structure of an HTTP request

It consists of:
* Resource URL (without scheme or authority).
* Method.
* Headers.
* Body (only for certain methods).

Headers provide additional data on the handling of the request. The body contains the data that is to be processed by the server.

#### Body

* It is not used in `GET` requests.
* It contains the data (for `POST` requests).
* It typically comes with request headers that describe it (e.g., `Content-Type` and `Content-Length`).

### Structure of an HTTP response

It consists of:
* **Status**: numerical code that shows the result of the request.
* **Headers**: HTTP data that browsers and servers exchange.
* **Body**: content of the response.

**Headers** provide additional data on the handling of the response. The body is the encoding of the answer to the request, e.g., a web page.

### HTTP status codes

In order to inform both clients and users about events such as errors in the page, HTTP uses status codes. (The `X`'s stand for digits.)

- `1XX`: codes starting with **1** are **informational**.
- `2XX`: codes starting with **2** indicate a **successful** operation, e.g. the `200 OK` status code.
- `3XX`: codes starting with **3** indicate a **redirection** to another location, e.g. `303 SEE OTHER`.
- `4XX`: codes starting with **4** signal **errors** in the client's request, e.g. the famous `404 NOT FOUND` error, or the `403 FORBIDDEN` code.
- `5XX`: codes starting with **5** signal **errors** in the server, e.g. `500 INTERNAL SERVER ERROR`.

## URIs

As defined in [[1. Introduction to the Web]], a URI is the address used on the web for naming resources.

#### Structure of a URI

$$\mathtt{\underbrace{https}_{scheme}://\underbrace{www.youtube.com}_{authority}/\underbrace{watch}_{path}\underbrace{?v=4ozip0cgoho}_{query}}$$

* **Scheme**: identifies the protocol used to access the resource. The schemes used for HTTP are `http` and `https`.
* **Authority**: the IP address or domain name ([see DNS](DNS)) of the server we are connecting to.
* **Path**: the element that identifies a resource within a given scheme and authority. It is often organized hierarchically and delimited by slashes `/.../.../.../`.
* **Query**: the query is *not* hierarchical and, in combination with the path, completes the information necessary to access the resource. The query is introduced by a question mark `?`.
* **Fragment identifier**: it identifies a specific part of a resource, e.g., the sidebar or an image. It is introduced by a hash sign `#`; the example above does not contain one.
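
The components listed above can be inspected programmatically. The following is a minimal sketch using Python's standard `urllib.parse` module on the example URL from this section; the `#comments` fragment is a hypothetical addition, included only so that every component is non-empty.

```python
from urllib.parse import urlsplit, parse_qs

# Example URL from this section, plus a hypothetical "#comments" fragment
# (the fragment is an assumption added for illustration only).
url = "https://www.youtube.com/watch?v=4ozip0cgoho#comments"

parts = urlsplit(url)

print(parts.scheme)    # scheme
print(parts.netloc)    # authority
print(parts.path)      # path
print(parts.query)     # query, without the leading "?"
print(parts.fragment)  # fragment identifier, without the leading "#"

# The query is a flat collection of key=value pairs, not a hierarchy.
params = parse_qs(parts.query)
print(params)
```

Note that `urlsplit` strips the `?` and `#` delimiters, so the query is reported as `v=4ozip0cgoho` and the fragment as `comments`.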
URIs can only be made up of [ASCII](ASCII) characters (letters, digits and some other symbols). Some of them are *reserved* characters, which means that you cannot use them unless they are playing the role they were reserved for. For example, `/` can only be used to delimit the path and its segments and cannot appear inside a domain name.

If we want to use a non-ASCII character such as `á`, it has to be percent-encoded: the character is converted to one or more octets (normally its UTF-8 encoding), and each octet is written as the symbol `%` followed by two hexadecimal digits. For example, `á` becomes `%C3%A1`.

## Cookies

Cookies help **keep track of the state** of the current webpage or web application.

>They are used because HTTP is a *stateless* protocol, which means that every request is independent of any other. This is troublesome when we want the server to preserve state, e.g., when authenticating in a web application and wanting to perform session tracking.

Cookies are **path-specific**, meaning that a cookie is only sent for resources inside the path for which it was generated.

> Cookies have an **expiration date** and, when marked `HttpOnly`, they cannot be accessed through JavaScript.

### Common usage of cookies

* Session tracking.
* Saving user settings and preferences (client side).
* User tracking and analysis (user consent is required under EU law).

## HTTPS (`HTTP` with `TLS`)

HTTPS is `HTTP` over `TLS`: a **cryptographically secure** (*uses encryption*) version of HTTP.

**Transport Layer Security** (TLS) provides *confidentiality*, *integrity* and *authentication* to an HTTP session. TLS can be used with many other protocols.

Using TLS, data is encrypted with a symmetric key, and the communicating parties are authenticated through a PKI (public-key infrastructure). TLS also detects unintended or malicious modification of data.

TLS has its origins in `SSL` (Secure Sockets Layer).
### Establishing HTTPS connections

In order to exchange data through HTTPS, a TLS session must first be established. This '*connection establishment*' is called the **Handshake**.

The **Handshake** is made up of three steps:
1. **Negotiation**: the initial exchange between the client and the server, in which they choose a TLS version and a specific encryption algorithm.
2. **Key exchange**: As #TODO:link symmetric encryption is used, the server and the client agree on a symmetric key through asymmetric encryption after the negotiation.
3. **Authentication**: The client can verify the identity of the server through its TLS certificate. As certificates are signed by [[Digital signatures and PKI#Certificate Authorities (CA) |Certificate Authorities (CA)]], the server must send its certificate together with a certificate chain that leads to a CA trusted by the user.

>[!info]
>More on this topic on [[Digital signatures and PKI]].

#### Session resumption

TLS can also resume previous sessions when a client reconnects to the same server. This is done by reusing previously exchanged keys in a **pre-shared key** (**PSK**) handshake.
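
To make the handshake more concrete, here is a rough sketch based on Python's standard `ssl` and `socket` modules. The helper names `make_client_context` and `tls_session_info` are hypothetical and were chosen for this note; the sketch only reports what the library negotiated and is not meant as a description of how browsers implement the handshake.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with Python's default settings."""
    # create_default_context() loads the system's trusted CA certificates
    # and enables certificate and hostname verification (authentication).
    return ssl.create_default_context()


def tls_session_info(host: str, port: int = 443) -> dict:
    """Perform a TLS handshake with `host` and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # wrap_socket() runs the handshake: negotiation, key exchange and
        # verification of the server certificate against the trusted CAs.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return {
                "tls_version": tls_sock.version(),
                "cipher_suite": tls_sock.cipher()[0],
                "session_reused": tls_sock.session_reused,
            }


# These checks run offline: they only inspect the default client settings.
context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # the certificate must validate
print(context.check_hostname)                    # the name must match the certificate
```

Calling `tls_session_info` with a host name of your choice requires network access. The call raises `ssl.SSLCertVerificationError` when the authentication step fails, for instance because the certificate chain does not lead to a trusted CA.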