HTTP and Web Programming

Topics Covered: HTTP, Client-Server Model, Web APIs

What is HTTP?

HTTP (Hypertext Transfer Protocol) is an application-layer protocol for transferring information over the web between networked devices. It is primarily built on TCP; the most recent version, HTTP/3, runs over QUIC, which uses UDP (this will be discussed further later in the course).

Like any protocol, it is a standard agreed upon by networked devices on the Internet. It has a specific message format, which every sender and receiver adheres to.

[Figure: HTTP message format diagrams from Professor Zhang's slide deck (images not available)]

The diagram from Professor Zhang's slide deck is especially useful for visualizing the format. Think of the headers as providing information about the client message (request) and how it should be handled, while the body contains data the server can use.

Generally speaking, the web works as an interaction between a client device and a server device. Both are just computers connected to a network; there's nothing special about a server. The client sends the server a request, the server processes the request and (usually) performs some action, then the server sends the client a response. The request and response are both sent over HTTP. TCP provides a reliable, ordered stream between the two, while UDP is closer to send-and-forget, so TCP makes more sense for now.
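To make the message format concrete, here is a minimal sketch of what a request and a response look like as text: a start line, then headers, then a blank line, then the body. The helper names `build_request` and `parse_response` are made up for illustration, and the sketch ignores many real-world details (binary bodies, repeated headers, chunked encoding).

```python
def build_request(method, path, host, body=""):
    """Serialize a minimal HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # Content-Length counts bytes, not characters.
        lines.append(f"Content-Length: {len(body.encode())}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw):
    """Split a raw response string into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), headers, body
```

For example, `build_request("GET", "/index.html", "example.com")` produces a request line, a `Host` header, and the blank line that ends the header section, with an empty body.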

HTTP Methods

HTTP Methods specify the type of action the client wants from the server. Here are the main ones:

GET
- Asks the server to retrieve some resource and send it over to the client. This is usually a SELECT operation against a database.
- If the resource is not available, the server responds with a 404 status code (more on this later).
POST
- Asks the server to create a new entry. This could be like someone creating a new user profile on Instagram.
- Along with the POST method in the request line, you will normally send a payload inside the request body containing relevant information (username, hashed password, etc.).
- This would correspond to an INSERT operation against a database.
PATCH
- PATCH and PUT are often confused. PATCH carries a partial update: information about how to modify an existing resource.
- For example, if I wanted to change my profile picture on some social media platform, that should be a PATCH request containing the new profile picture or some encoding of how to change it (in the request body, of course).
- Editing a message on Discord is another example.
- This would be an UPDATE on a database.
PUT
- Unlike PATCH, PUT carries the complete new object, which the server stores at the given URL, either creating it or replacing what was already there.
- For example, adding a reaction to a message on Discord would be a PUT.
- We have a new bit of data (the emoji to react with) which we want to attach to an existing relation on the database.
- This would be an UPDATE, but it could also be an INSERT, depending on how the database is structured.
DELETE
- Pretty straightforward: we want to delete something that exists on the server.
- For example, deleting a Discord message.
- Usually no need for anything in the request body. Why?
- This would be a DELETE FROM on the database.

There are others such as HEAD and OPTIONS which you normally won't encounter.
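The mapping between methods and database operations described above can be sketched with Python's built-in `sqlite3` module. This is a toy dispatcher, not a real web server: the `messages` table, the `handle` function, and the choice of status codes are all assumptions made for illustration.

```python
import sqlite3

# In-memory database with a single hypothetical table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")


def handle(method, message_id=None, content=None):
    """Map one HTTP method to one SQL statement; return (status_code, data)."""
    if method == "POST":  # create a new entry -> INSERT
        cur = db.execute("INSERT INTO messages (content) VALUES (?)", (content,))
        return 201, cur.lastrowid
    if method == "GET":  # retrieve a resource -> SELECT
        row = db.execute(
            "SELECT content FROM messages WHERE id = ?", (message_id,)
        ).fetchone()
        return (200, row[0]) if row else (404, None)
    if method == "PATCH":  # modify an existing resource -> UPDATE
        cur = db.execute(
            "UPDATE messages SET content = ? WHERE id = ?", (content, message_id)
        )
        return (200, content) if cur.rowcount else (404, None)
    if method == "PUT":  # store at a known id -> INSERT or overwrite
        db.execute(
            "INSERT OR REPLACE INTO messages (id, content) VALUES (?, ?)",
            (message_id, content),
        )
        return 200, content
    if method == "DELETE":  # remove a resource -> DELETE FROM
        cur = db.execute("DELETE FROM messages WHERE id = ?", (message_id,))
        return (204, None) if cur.rowcount else (404, None)
    return 405, None  # method not supported by this sketch
```

On a fresh database, `handle("POST", content="hello")` returns `(201, 1)`, and a later `handle("GET", 1)` returns `(200, "hello")`. Note that PUT succeeds whether or not the row already exists, while PATCH returns 404 for a missing row.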

HTTP Innovations

Obviously, HTTP did not come from nowhere. There are some capabilities associated with different versions which you should be familiar with at this stage in the course.

Some Definitions:

RTT: Round Trip Time
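- RTT = 2 * propagation time (one propagation delay in each direction)
- total time to fetch one object over a new TCP connection = 2 * RTT + transmission time (one RTT to set up the TCP connection, one RTT for the request/response)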
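As a quick worked example of these two definitions, the sketch below plugs in made-up numbers (25 ms one-way propagation delay, 10 ms transmission time). The function names are hypothetical, and times are kept in whole milliseconds to avoid floating-point noise.

```python
def rtt_ms(propagation_ms):
    """Round trip time: one propagation delay in each direction."""
    return 2 * propagation_ms


def single_fetch_ms(propagation_ms, transmission_ms):
    """Time to fetch one object over a fresh TCP connection: 2 RTT plus transmission time."""
    return 2 * rtt_ms(propagation_ms) + transmission_ms


# Assumed numbers, for illustration only.
print(rtt_ms(25))               # 50
print(single_fetch_ms(25, 10))  # 110
```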

HTTP/1.0

HTTP/1.0 uses non-persistent connections, which means you need a new TCP connection for each request/response pair. So if you wanted to retrieve an image from a server, assuming that image fits in one packet, it would take 2 RTT, neglecting transmission time. However, note the structure of an HTML document (the standard format for webpages).

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>My Webpage</title>
    </head>
    <body>
        <img src="/path/to/image.jpg" alt="alt text">
        <script src="/path/to/javascript.js"></script>
    </body>
</html>

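In this case, we have our original HTML document, and this HTML document requires two extra requests: an image and a JavaScript source file, which the client fetches from the server. So our sequence in HTTP/1.0 is as follows:

- Open a TCP connection (1 RTT), then request and receive the HTML document (1 RTT).
- Open a new TCP connection (1 RTT), then request and receive the image (1 RTT).
- Open a new TCP connection (1 RTT), then request and receive the JavaScript file (1 RTT).

Therefore, each object takes 1 RTT for its request/response, plus another RTT to set up the TCP connection for that request/response.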

T_{total} = 2 \cdot t_{RTT} + 2n \cdot t_{RTT}

Here n is the number of objects the HTML document refers to. Note that you must first retrieve the HTML document, since you need the entire thing before you can start retrieving the other objects.

HTTP/1.1

HTTP/1.1 introduced persistent connections: you no longer need a new TCP connection for each request/response pair. The sequence is the same, except that the connection is set up only once, so each additional object costs 1 RTT instead of 2.

T_{total} = 2 \cdot t_{RTT} + n \cdot t_{RTT}
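The two formulas are easy to compare side by side. The sketch below counts RTTs for a page that refers to n objects, neglecting transmission time as the notes do; the function names are made up for illustration.

```python
def http10_rtts(n):
    """HTTP/1.0 (non-persistent): 2 RTT for the HTML, then 2 RTT per object."""
    return 2 + 2 * n


def http11_rtts(n):
    """HTTP/1.1 (persistent): 2 RTT for the HTML, then 1 RTT per object."""
    return 2 + n


# The example page above refers to n = 2 objects (an image and a script).
print(http10_rtts(2))  # 6
print(http11_rtts(2))  # 4
```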

Parallel Connections

This isn't strictly part of HTTP, but because HTTP is a stateless protocol, requests have limited data dependencies on one another, which makes it easy to run several connections between a client and server in parallel. However, doing so introduces additional overhead on the operating system (POSIX threads are expensive) and contention for the limited bandwidth available. If your OS can provide you with 8 parallel threads (8 physical cores), then you can perform 8 request/response exchanges in parallel. However, you cannot parallelize the fetching of index.html, since it is a prerequisite for the others.

For m threads and n objects,

T_{total, HTTP/1.0} = 2 \cdot t_{RTT} + \lceil \frac{n}{m} \rceil \cdot 2 \cdot t_{RTT}

T_{total, HTTP/1.1} = 2 \cdot t_{RTT} + \lceil \frac{n}{m} \rceil \cdot t_{RTT}
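The ceiling term is the number of "rounds" needed when only m objects can be fetched at a time. A small sketch of both formulas, again counting RTTs and neglecting transmission time (function names are hypothetical):

```python
import math


def parallel_http10_rtts(n, m):
    """2 RTT for the HTML, then ceil(n/m) rounds costing 2 RTT each."""
    return 2 + math.ceil(n / m) * 2


def parallel_http11_rtts(n, m):
    """2 RTT for the HTML, then ceil(n/m) rounds costing 1 RTT each."""
    return 2 + math.ceil(n / m)


# 10 objects over 8 parallel threads need ceil(10/8) = 2 rounds.
print(parallel_http10_rtts(10, 8))  # 6
print(parallel_http11_rtts(10, 8))  # 4
```

With m = 1 these reduce to the earlier single-connection formulas.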

Aside: Pipelining

This one is difficult to give an exact calculation for. Essentially, you send many requests one after another without waiting for the response to the previous one, but they are not sent completely in parallel. This is possible only over HTTP/1.1 (why?). In theory, it approximates simultaneous requests, so it can be treated as 1 RTT for all objects (not each). In reality there is some overhead, and the server can be overloaded, so responses may take longer to come back.
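One way to picture pipelining is to look at what the client writes to the connection: several complete requests back to back, before reading any response. The sketch below only builds that outgoing byte buffer and does not open a socket; `pipeline` and the example paths and host are made up for illustration.

```python
def pipeline(paths, host):
    """Concatenate one GET request per path into a single outgoing byte buffer."""
    chunks = [f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n" for path in paths]
    return "".join(chunks).encode("ascii")


# Both requests would be written to the same persistent connection at once;
# the server is expected to answer them in the order they were sent.
buffer = pipeline(["/path/to/image.jpg", "/path/to/javascript.js"], "example.com")
print(buffer.count(b"GET "))  # 2
```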

Web APIs

Web APIs leverage the structure of the HTTP protocol to offer services to clients. For example, programmers can interact with Discord through a script if they create and register an application and access Discord's API. This is how Discord bots and webhooks work.

Take a look at the channel section of Discord's API documentation. If you have the time, try creating a bot application (everything I'm describing is free). Then write a Python script that does the following, using the authentication information described in their documentation (the requests library is pretty good for this):

- In a channel, try making a POST request to publish a new message.
- Use PUT to add a reaction.
- GET the message.
- Then DELETE the reaction.
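A possible starting point for the exercise is sketched below. The endpoint paths, the `v10` API version, and the `Bot <token>` authorization scheme are written from memory of Discord's HTTP API documentation, so treat them as assumptions and check them against the current docs. The function names, `token`, and `channel_id` are placeholders. The third-party `requests` library is imported inside `run_demo` so the URL helpers can be tried without it installed.

```python
from urllib.parse import quote

API_BASE = "https://discord.com/api/v10"  # assumed API version; verify in the docs


def messages_url(channel_id):
    return f"{API_BASE}/channels/{channel_id}/messages"


def message_url(channel_id, message_id):
    return f"{messages_url(channel_id)}/{message_id}"


def own_reaction_url(channel_id, message_id, emoji):
    # A Unicode emoji has to be percent-encoded before it goes in the URL path.
    return f"{message_url(channel_id, message_id)}/reactions/{quote(emoji)}/@me"


def run_demo(token, channel_id):
    """POST a message, PUT a reaction, GET the message, DELETE the reaction."""
    import requests  # third-party: pip install requests

    headers = {"Authorization": f"Bot {token}"}

    r = requests.post(messages_url(channel_id), headers=headers,
                      json={"content": "Hello over HTTP"}, timeout=10)
    r.raise_for_status()
    message_id = r.json()["id"]

    reaction = own_reaction_url(channel_id, message_id, "👍")
    requests.put(reaction, headers=headers, timeout=10).raise_for_status()

    r = requests.get(message_url(channel_id, message_id), headers=headers, timeout=10)
    r.raise_for_status()
    print(r.json()["content"])

    requests.delete(reaction, headers=headers, timeout=10).raise_for_status()
```

Calling `run_demo` with your bot token and a channel ID the bot can post in performs the four steps in order. Real scripts should also watch for rate-limit responses, which this sketch ignores.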

Isn't HTTP neat?

Extra Notes from Discussions:

HTTP is stateless
- Cookies enable stateful communication between client and server, but by default the protocol is not stateful
- TCP is stateful, and HTTP usually uses TCP, but the HTTP protocol layer is independent of the layers below it
- Keeping it stateless makes the protocol more flexible for application developers
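The cookie point can be illustrated with a toy "server" function. Each call to `handle_request` is independent, as in HTTP; the only thing linking two requests is the cookie the client sends back. This sketch uses the standard library's `http.cookies` module to parse the header. The `sid` cookie name, the `SESSIONS` dictionary, and the counter-style session IDs are assumptions for illustration (real servers use random tokens).

```python
from http.cookies import SimpleCookie

SESSIONS = {}  # server-side state, keyed by session id


def handle_request(headers):
    """Handle one request given its headers; return (response_headers, body)."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    sid = cookie["sid"].value if "sid" in cookie else None

    response_headers = {}
    if sid not in SESSIONS:
        # Unknown client: start a session and ask the client to remember it.
        sid = str(len(SESSIONS) + 1)  # toy id, not secure
        SESSIONS[sid] = 0
        response_headers["Set-Cookie"] = f"sid={sid}"

    SESSIONS[sid] += 1
    return response_headers, f"visit number {SESSIONS[sid]}"
```

A first call with no headers returns a `Set-Cookie` header and the body `visit number 1`. If the client echoes that value back in a `Cookie` header on its next request, the body becomes `visit number 2`; without the cookie, the server treats it as a brand-new client.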
