# Web Exploitation Intro

## Club Resources

* [Practice Problems](https://ctf.tjcsec.club)
* [Codespaces Desktop](https://github.com/dianalin2/desktop)
* [Shell Commands List](https://hackmd.io/@tjcsc/cmd)

## Definition List

Client
: The "user" side. Typically a web browser such as Google Chrome or Firefox.

Server
: The remote machine that sends information about the site that clients visit.

## How does communication work?

In order for a web page to be displayed on your computer, your computer (the client) must communicate with the server. For web pages, this communication is done through **Hypertext Transfer Protocol (HTTP)**. Note that HTTP is NOT a piece of software. It is just a specified way for computers to communicate with each other. Other pieces of software, such as browsers and web servers, implement the protocol so that they can talk to each other.

Communication happens through **requests** and **responses**. First, the client sends a request to the server to ask for a specific page. The server then processes the request and sends a response, which includes the content of the web page.

### HTTP Requests

Let's deconstruct what a request consists of by looking at a sample request. A web browser may send the following lines to a server to ask the server for http://www.example.com/my-page:

```http
GET /my-page HTTP/1.1
Host: www.example.com
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
```

The first line is what we call the **request line**. The `GET` on the request line is what we call a **request method**. The request method indicates what action the client wants the server to take, although following the specified action is at the discretion of the server. Some examples of request methods are:

- GET: Retrieve data but do not modify it. This is usually the method that is used when you type something into your web browser.
- POST: Submit data to the server. This often results in a change to the data stored on the server.
- PUT: Replace data on the server.
- DELETE: Delete data on the server.

The request method is followed by the path. In this case, the path is `/my-page`. It asks the server for the content that is served at the specified path.

`HTTP/1.1` is the version of HTTP that the client is using. HTTP/2 and HTTP/3 are also available and widely used by browsers. They are *much* more difficult to read and decipher, so our sample uses HTTP/1.1, but they have pretty much all the same features as HTTP/1.1 with some improvements in speed.

After the request line, we provide **request headers**. They contain further information about the client or about what we want the page to contain. For example, we provide the `Host` header to let the server know that we accessed it through the domain[^1] `www.example.com`. We also provide the `Accept-Language` header, which lets the server know that we want our page in English.

**Request bodies** are often specified after the request headers, but bodies aren't used with GET requests[^2]. They are, however, commonly used with other request methods, such as POST and PUT. The body can contain whatever information the user wants to put in it, as long as the server can understand it. Here is an example of a POST request with a request body:

```http
POST / HTTP/1.1
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 26

username=hi&password=wowow
```

Note that the request body is separated from the request headers by an empty line.

An important header to know is `Content-Type`. HTML forms, by default, submit `application/x-www-form-urlencoded` data, which is a fancy way to say they will submit form fields in the `x=value1&y=value2&z=value3` format. Some servers will only take other `Content-Type`s, such as `application/json`, so be aware of that.
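To make the pieces of the POST request above more concrete, here is a minimal sketch that assembles the same request text by hand in Python. The function name `build_post_request` is made up for this illustration, and real clients (browsers, curl, the `requests` module) do all of this for you. It only shows how the form fields turn into the body and where `Content-Length` comes from.

```python
from urllib.parse import urlencode

def build_post_request(host, path, fields):
    """Build the text of a simple HTTP/1.1 POST request (illustration only)."""
    # Encode the form fields in the x=value1&y=value2 format
    body = urlencode(fields)
    lines = [
        f"POST {path} HTTP/1.1",                            # request line
        f"Host: {host}",                                    # request headers
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode())}",            # body size in bytes
        "",                                                 # empty line separates headers from body
        body,                                               # request body
    ]
    # HTTP/1.1 ends each line with a carriage return and a newline
    return "\r\n".join(lines)

request_text = build_post_request(
    "www.example.com", "/", {"username": "hi", "password": "wowow"}
)
print(request_text)
```

The `Content-Length` value is 26 here because `username=hi&password=wowow` is 26 bytes long, which matches the sample request.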
### HTTP Responses

After a request is sent, the server processes the request and makes a response. A sample response is as follows:

```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 111
Connection: close

<html>
  <head>
    <title>An Example Page</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>
```

The first line of a response is the **status line**, which specifies the HTTP version used and the **status** of the response. That is to say: "Did the server understand the request properly?" The number (`200`) is the **status code**. The text next to it (`OK`) is the **reason phrase**, which gives a short description of the status code. Some common statuses are shown below:

* 200 OK: Everything went perfectly and a proper response was sent back.
* 302 Found: The server is telling the client to redirect to a different URL, which is given in the `Location` header.
* 404 Not Found: The server can't find the data that the user is looking for.
* 500 Internal Server Error: The server ran into an error trying to process the request.

After that, the **response headers** are stated. They use the same format as the request headers.

Lastly, the **response body** is stated. This is what is displayed on the page when you access a website.

## Client-Side

Information gets sent to the client, but what exactly does the client do with said information in order to display a web page? We saw earlier that servers usually respond with code, so how does that translate into the pretty image that we usually see?

The browser interprets the received "code" and figures out what to do with it. There are three major languages that are interpreted by the browser:

- Hypertext Markup Language (HTML) tells the browser what to display. It describes the structure of the web page.
- Cascading Style Sheets (CSS) tell the browser how to make the page look nice. Most of the time, CSS doesn't interfere with how you *can* interact with a page but, instead, makes it more pleasing to look at.
- JavaScript makes the page interactive. Without JavaScript, sites would be pretty boring: the most interactivity you could have is a nice-looking form. However, JavaScript lets you create advanced web pages, games, and much more.

When you think of "code," JavaScript is probably closest to what you think of. HTML and CSS don't really do anything by themselves except describe how things look. JavaScript lets you do a whole slew of things.

### Cookies

How does information persist on the client side? How do you stay logged in to Facebook after you close the browser? HTML and CSS don't do anything, and JavaScript only knows things starting from when the page was loaded.

**Cookies** are pretty well-known nowadays. They are the most common form in which information is stored on the client side. The browser stores a small piece of data (the "cookie") that is sent to the server with every request. This lets servers easily track information, including but not limited to information about who we are.

To edit cookies in Google Chrome, open the Inspect Element window (Ctrl+Shift+C) and select the Application tab. On Firefox, you may need to open the Inspect Element window and select the Storage tab.

![Google Chrome Screenshot](https://hackmd.io/_uploads/SymrPu6lp.png)

## Server-Side

### Static Sites

Static sites are the simplest type of site. The web server basically treats a folder and its files/subfolders as the content of the site. When a user connects to the server, the server looks in the folder to find a corresponding file. A sample filesystem for a site that uses NGINX (a web server) to serve files is shown below:

![NGINX Folder Diagram](https://i.imgur.com/5r9gSkY.png)

When a user connects to https://example.com/images/img1.png, the web server looks inside the images folder to find img1.png.

The file index.html is slightly special. When a client connects to https://example.com with no path afterwards, index.html is served.
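The lookup that a static site performs can be sketched in a few lines of Python. This is only a rough illustration, not how NGINX is actually implemented: the function name `resolve_static_path` and the root folder `/var/www/html` are assumptions made for this example, and a real web server also handles missing files, subfolders, and security checks.

```python
from pathlib import PurePosixPath

def resolve_static_path(url_path, root="/var/www/html"):
    """Map the path from a URL to the file a static server would look for."""
    # With no path after the domain, index.html is served
    if url_path in ("", "/"):
        url_path = "/index.html"
    # Drop the leading slash so the path is joined inside the root folder
    return str(PurePosixPath(root) / url_path.lstrip("/"))

print(resolve_static_path("/images/img1.png"))  # /var/www/html/images/img1.png
print(resolve_static_path("/"))                 # /var/www/html/index.html
```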
### Dynamic Sites

Over time, websites have become more powerful. Earlier, I wrote that, after the server finishes processing the request, it sends a response. HTTP doesn't specify *how* the server processes the request, so it can do things other than just send the same file.

A dynamic site can respond with different content when a different client connects. Twitter does it, Google does it, and Microsoft does it.

A very commonly used framework for developing dynamic sites is Flask, which lets programmers easily create dynamic sites in Python. We have a challenge on [ctf.tjcsec.club](https://ctf.tjcsec.club) that requires you to read a Flask site.

## Connecting to Web Servers

The most obvious way to connect to a web server is to use your web browser. However, there are some other ways to do so as well.

### Terminal

To connect to a server in the terminal and print the output, you can use the curl utility:

```sh
curl https://www.example.com -X POST -H "Content-Type: application/x-www-form-urlencoded" -d "data=data"
```

The request method is specified with the `-X` option. Headers can be specified with one or more `-H` options. The request body can be specified with the `-d` option.

To connect to a server and easily download the response body, you can also use `wget`:

```sh
wget https://www.example.com
```

### Python

You can also connect to web servers in your favorite programming language. My (second) favorite language to code in is Python, so we'll go over that. It also just so happens that Python is very useful for solving computer security challenges, so that makes it even more important to know.

No worries if Python is new to you! If you're having difficulties with reading or writing code in Python, feel free to look for resources online or contact an officer.

After you [install Python](https://www.python.org/downloads/) (this is already installed in Codespaces), you can open a Python shell in your terminal by typing `python3`.
You can also create a new Python script by creating a file called `<filename>.py` (where you replace `<filename>` with an actual filename).

If you haven't worked with Python before, watch [this 59.001 second video](https://youtu.be/fabelAs_m08) to get a very brief overview of the basics. Don't worry too much if you don't understand what a slice is. Once again, **this is a very brief overview**. You won't understand everything if you only watch this video. Coding and fixing the problems you encounter is likely more helpful. Additionally, officers are more than happy to help if you encounter issues.

The video did not cover how to import modules. In order to connect to a web server, we will need a module called `requests`. To use it, type `import requests`.

A sample script that uses the `requests` module is provided below with comments (preceded by `#`) to help with understanding:

```python=
import requests # allow us to use the requests module

response = requests.get('https://example.com') # connect to https://example.com and save the response in the variable called response
print(response.status_code) # print the status code to the console

# if the response is OK and "<!doctype html>" is in the response body, do something
# note that, in Python, indentation indicates that the following code belongs to the if statement
if response.status_code == 200 and '<!doctype html>' in response.text:
    print(response.text) # print the response body to the console

print("we're done!")
```

Another sample script that uses loops is provided below:

```python=
import requests # allow us to use the requests module

# this loop runs from 0 to 10, exclusive
# this means that it will start at https://example.com/0 and end at https://example.com/9
for i in range(10):
    # connect to https://example.com/<number> ten times
    # save the response in the variable called response
    response = requests.get('https://example.com/' + str(i))
    print(response.status_code) # print the status code to the console

    # if the response is OK and "<!doctype html>" is in the response body, do something
    if response.status_code == 200 and '<!doctype html>' in response.text:
        print(response.text) # print the response body to the console

print("we're done!")
```

## Conclusions

We went over a lot of content this block that might seem very low-level. That is true! You don't really need to know in depth how servers and clients communicate.

As always, if you have any questions, feel free to contact us by:

- Asking for help during a club block
- Creating a ticket on our [Discord server](https://tjcsec.club/discord)
- DMing an officer

Happy hacking!

[^1]: A domain is a string of text that is mapped to an IP address, which is akin to a "home address" for a computer. A domain lets the user type in a human-friendly name in order to access a web page.

[^2]: GET requests often specify parameters in the URL after a `?` character. This makes URLs look something like `https://www.example.com/?text1=hi&text2=bye`. This should be non-confidential information because it is stored in the browser history and easily seen in the search bar.

[^3]: A boolean is a true/false value. In Python, these true/false values are explicitly typed as `True` and `False`.