---
title: 'HTML'
disqus: hackmd
tags: cheatsheet
---

:::spoiler TOC
[TOC]
:::

# Tools
- [jsbin](https://jsbin.com/?html,output): quick HTML testing
- [Living Standard - 8 Web application APIs](https://wicg.github.io/controls-list/html-output/multipage/webappapis.html#webappapis)
  > The multipage and one-page versions differ in content
- [shazzer](https://shazzer.co.uk/): browser behavior fuzzer
- [dom-explorer](https://yeswehack.github.io/Dom-Explorer/)

# Parsing
## Code point definitions
- https://infra.spec.whatwg.org/#code-points
  > An ASCII tab or newline is U+0009 TAB, U+000A LF, or U+000D CR.
  > ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE

## Content-Types that can be parsed as script
- [BlackFan/content-type-research](https://github.com/BlackFan/content-type-research/blob/master/XSS.md): Content-Types that could trigger XSS (2020)
- [PortSwigger XSS content types](https://portswigger.net/web-security/cross-site-scripting/cheat-sheet#content-types)

## HTML entity
- Lookup table: https://html.spec.whatwg.org/multipage/named-characters.html
- Conversion tool: https://mothereff.in/html-entities
- Any HTML attribute value can be encoded with HTML entities
  - `&amp;` `&#38;` `&#x26;`
  - `<iframe src='&lt;script&gt;alert(1)&lt;/script&gt;'>`
- HTML entities are not decoded inside `script` and `style`, but they are decoded in a `script` under `svg` or `math`, because the parser switches to foreign-content (XML-like) parsing there
- Decoding is defined in the [character reference state](https://html.spec.whatwg.org/multipage/parsing.html#character-reference-state); only the [RCDATA state](https://html.spec.whatwg.org/multipage/parsing.html#rcdata-state), the [Data state](https://html.spec.whatwg.org/multipage/parsing.html#data-state), and attribute values decode character references

## HTML attribute
- The value can be encoded with HTML entities
- [Attribute Name State](https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state): an attribute name may contain `'"<`
- [Before attribute value state](https://html.spec.whatwg.org/multipage/parsing.html#before-attribute-value-state): any `\x09\x0a\x0c\x20` between `=` and the value is ignored

## Protocols
> protocol, scheme
- https://github.com/chromium/chromium/blob/63b085a78684914ec6ceefca8c46fe837ad5e007/docs/special_case_urls.md
- Rough parsing order on navigation
  - (...HTML entity decode)(?)
  - Strip junk tabs and newlines, then parse the scheme
  - URL-decode everything after the leading `[scheme]:`
- element src
  - Under the hood it is resolved through the `fetch` API
  - Depending on the agent implementation, some elements are prefetched before the element is appended, e.g. svg and img in Chrome
  - https://fetch.spec.whatwg.org/#scheme-fetch
- Defense
  - Parse with `new URL` first, and only then filter
- common
  - The scheme is case-insensitive
  - `//` and `\\` resolve against the current page's scheme
    - Chrome's normalization implementation: https://github.com/chromium/chromium/blob/4ea9fb6208b67451a55f9a2fa2eaba8fd7b61bae/url/url_util.cc#L250C6-L250C20
  - Junk bytes (<= 0x20) can be inserted before and after the URL
    > https://github.com/chromium/chromium/blob/0359c8efed03242bc8565845cd3d4b91d3b18f77/url/url_parse_internal.h#L47
  - Junk \r \n \t can be inserted at any position
    > [ concept basic url parser
    > 2.If input contains any ASCII tab or newline, invalid-URL-unit validation error.
    > 3.Remove all ASCII tab or newline from input.](https://url.spec.whatwg.org/#concept-basic-url-parser)
    - `location.href="\tjAva\tScript:%0a\ta\tl\te\tr\tt(1)"`
      > https://github.com/chromium/chromium/blob/0359c8efed03242bc8565845cd3d4b91d3b18f77/url/url_parse_internal.h#L47
  - TODO: review [basic URL parser algorithm](https://url.spec.whatwg.org/#concept-basic-url-parser)
- [`javascript:`, javascript url](https://html.spec.whatwg.org/#the-javascript:-url-special-case)
  - [spec: evaluate a javascript URL](https://html.spec.whatwg.org/#evaluate-a-javascript:-url)
  - [The pseudo-protocol (`javascript:`) is URL-decoded](https://www.leavesongs.com/PENETRATION/xss-from-my-blog.html)
    ```
    <a href="javascript:console.log('&percnt;26')">aaaa </a>
    <!-- HTML entities are decoded first, then the pseudo-protocol body is URL-decoded, so this prints '&' -->
    ```
  - If the evaluation returns a string, that string replaces the current page, so HTML can be injected
    > [6.Let newDocument be the result of evaluating a javascript: URL given targetNavigable, url, and initiatorOrigin.
    > 7.If newDocument is null, then return.](https://html.spec.whatwg.org/#the-javascript:-url-special-case)
    >
    > [9.If evaluationStatus is a normal completion, and evaluationStatus.[[Value]] is a String, then set result to evaluationStatus.[[Value]].
    > 10.Otherwise, return null.](https://html.spec.whatwg.org/#evaluate-a-javascript:-url)
    - `location.href='javascript:"<svg/onload=alert(window.origin)>"'`
    - A non-string result is forced to null
- [`data:`, data url](https://fetch.spec.whatwg.org/#data-urls)
  - Everything before the first `,` is the MIME type ([mediatype] + [";base64"]) and the rest is the encodedBody; only the encodedBody is percent-decoded into the body
    ```
    RFC 2397
    dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
    mediatype := [ type "/" subtype ] *( ";" parameter )
    data := *urlchar
    parameter := attribute "=" value
    ```
  - TODO: `Let stringBody be the isomorphic decode of body.`
- [`cid:`, content-ID](https://www.ietf.org/rfc/rfc2392.txt)
  - Avoids URL encoding
    ```
    <a id=aaa href='<iframe></iframe>'>
    aaa.href === 'http://web/%3Ciframe%3E%3C/iframe%3E'
    <a id=aaa href='cid:<iframe></iframe>'>
    aaa.href === 'cid:<iframe></iframe>'
    ```
  - DOMPurify allows `(?:(?:f|ht)tps?|mailto|tel|callto|sms|cid|xmpp)` by default; of these, values under `ftps|tel|callto|sms|cid|xmpp` are not URL-encoded when read back

## HTML Quirk
- TODO

## execution order
- While the document is still being parsed, `document.write` writes right after the current script; once parsing has finished, it overwrites the document
  ```
  <p>a</p>
  <script>
  const a= ()=>(document.write('aaaa'))
  a()
  </script>
  <p>b</p>
  ```
- [`<svg><svg/onload>` fires before insertion](https://blog.huli.tw/2022/02/08/what-i-learned-from-dicectf-2022/#%E9%A0%90%E6%9C%9F%E5%A4%96%E8%A7%A3%E6%B3%95)
  > Guess: because svg switches to XML-like (foreign content) parsing, the second-level svg triggers tree construction right after it is tokenized

  ```
  <div id=x></div>
  <div id=y>hello</div>
  <script>
  x.innerHTML = '<svg><svg onload=alert(window.y.innerText)>'
  y.innerText = 'updated'
  </script>
  <!-- fires before y.innerText is assigned -->
  ```

# HTML element behaviors
- [HTML Parser spec](https://html.spec.whatwg.org/multipage/parsing.html)
  - It is split into two parts, the tokenizer and tree construction. The tokenizer tokenizes the input text and emits tokens; after each emit, tree construction takes over and turns the token into an
    element, inserts it into the DOM at the position the algorithm dictates, and that in turn affects the tokenizer from then on; the two stages interact with each other

## meta
- Redirect
  - `<meta http-equiv="refresh" content="5; url=https://www.fooish.com">`
  - Can redirect to a blob URL
    - The origin stays the same and the CSP is inherited
    - https://blog.huli.tw/2023/09/23/hitcon-seccon-ctf-2023-writeup/#canvas-4-solves
    - https://bugs.chromium.org/p/chromium/issues/detail?id=933171

## comment
- open: `<!--`, close: `-->` or `--!>`
  - [incorrectly-closed-comment](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment)
- [bogus comment state](https://html.spec.whatwg.org/multipage/parsing.html#bogus-comment-state): when the markup starts with `<?`, everything up to the first `>` is treated as a comment

## select
- https://blog.huli.tw/2021/11/14/intigriti-xss-1021/
- Illegal tags inside select are stripped
  ```
  <select>
    <div> jizzzz </div>
  </select>
  ```
  becomes
  ```
  <select>
  </select>
  ```

## table
- https://blog.huli.tw/2021/11/14/intigriti-xss-1021/
- https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-intable
- Elements inside a table that do not belong in a table trigger [foster parenting](https://html.spec.whatwg.org/multipage/parsing.html#foster-parent) and are pushed out in front of the table
  ```
  <table>
    <div> jizzzz </div>
  </table>
  ```
  becomes
  ```
  <div> jizzzz </div>
  <table>
  </table>
  ```

## a
- `download`
  - Sets the download filename
    ```
    <a download=qwe> // filename=qwe.txt
    <a download=pwn.exe> // filename=pwn.exe
    ```
  - data URI + download
    - Create arbitrary download content: `<a href='data:,jizzz' download>`
    - Alles CTF 2021 ALLES!Chat
      - puppeteer can trigger the download directly
- `href`
  - When `a` is converted to a string, the `href` content is used
    ```
    <a href='ji:alert(1)' id=aaa>
    aaa + '' // 'ji:alert(1)'
    <a href=' alert(1)' id=aaa>
    aaa + '' // '$SCHEME://$HOST/alert(1)'
    ```

## link
- Only Firefox implements the [link header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link)?
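
The `href` stringification noted in the `a` section above can be reproduced outside a browser, because the `href` getter resolves the raw attribute through the URL parser. A minimal sketch, assuming Node.js (whose global `URL` follows the WHATWG URL Standard); the base address `https://example.com/dir/page.html` and the helper `hrefOf` are hypothetical stand-ins for the document base URL and for `aaa + ''`:

```javascript
// Hypothetical document base URL; in a browser this would be document.baseURI.
const base = 'https://example.com/dir/page.html';

// Hypothetical helper approximating `aaa + ''` for <a id=aaa href=...>:
// the raw attribute value is resolved against the base URL.
function hrefOf(rawHref) {
  return new URL(rawHref, base).href;
}

// An absolute URL with an unknown scheme is kept verbatim.
const unknownScheme = hrefOf('ji:alert(1)');

// The leading space is stripped, then the rest is resolved as a relative path.
const relativePath = hrefOf(' alert(1)');

console.log(unknownScheme); // ji:alert(1)
console.log(relativePath);  // https://example.com/dir/alert(1)
```

This only models URL resolution; it does not model attribute parsing or any DOM behavior.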
## iframe
:::spoiler ref
- [x] https://blog.huli.tw/2022/04/07/iframe-and-window-open/
:::

- [spec](https://html.spec.whatwg.org/multipage/iframe-embed-object.html#the-iframe-element)
- Properties accessible cross-origin
  > A JavaScript property name P is a cross-origin accessible window property name if it is "window", "self", "location", "close", "closed", "focus", "blur", "frames", "length", "top", "opener", "parent", "postMessage", or an array index property name. https://html.spec.whatwg.org/multipage/browsers.html#crossoriginproperties-(-o-)
  - The location of any iframe under a cross-origin page can be changed
    - postMessage data can be stolen
      > Always specify an exact target origin, not *, when you use postMessage to send data to other windows. A malicious site can change the location of the window without your knowledge, and therefore it can intercept the data sent using postMessage.[ref](https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage#security_concerns)
    - The window name of a cross-origin iframe can be stolen
      - Setting the location to 'about:blank' makes it same-origin
      - iframe xss leak data in `window.name` -> parent set iframe location to `about:blank` -> read `contentWindow.name` in parent
      - [Bypassing CSP with dangling iframes](https://portswigger.net/research/bypassing-csp-with-dangling-iframes)
  - The top location can be redirected; Chrome blocks this by default when the frame is cross-origin
- `src`
  - Supports `data:`, `javascript:`
    - With `data:` the origin is set to null
    - `javascript:` requires same origin
- `srcdoc`
  - = src + data URI
  - origin === parent.origin
- `csp`
  - Sets the csp attribute
  - Making the policy stricter can disable some scripts
- `sandbox`
  - Sets the iframe origin to null by default and disables many features
  - Anything opened from the sandbox inherits the sandbox restrictions
    - The newly opened page's origin also becomes null, which may bypass same-origin checks, e.g.
      postMessage
- `credentialless`
  - https://blog.slonser.info/posts/make-self-xss-great-again/
  - self-XSS

## form
- Submit a JSON-like body
  ```
  <form method='POST' enctype='text/plain'>
    <input name='{"k":"v' value='"}'>
  </form>
  <!-- body sent: {"k":"v="} -->
  ```
- newline normalization
  - https://blog.whatwg.org/newline-normalizations-in-form-submission
  - A `\n` in a form value is actually transmitted as `\r\n`
    - The length seen by the frontend and the backend will differ

## script
:::spoiler refs
- [x] [script type 知多少? (How much do you know about script type?)](https://blog.huli.tw/2022/04/24/script-type/)
- [ ] [\[LINE CTF 2021\] Haribote-Secure-Note](https://gist.github.com/mdsnins/d8028c47212342ecadd9af5ec10f53f9)
:::

- Parsable `Content-Type`
  - by [spec](https://mimesniff.spec.whatwg.org/#javascript-mime-type)
  - by [implementation](https://chromium.googlesource.com/chromium/src.git/+/refs/tags/103.0.5012.1/third_party/blink/renderer/core/script/script_loader.cc#184) and [implementation](https://chromium.googlesource.com/chromium/src.git/+/refs/tags/103.0.5012.1/third_party/blink/renderer/core/script/script_loader.cc#247), Chromium build 103.0.5012.1
    - [legacy supported](https://chromium.googlesource.com/chromium/src.git/+/refs/tags/103.0.5012.1/third_party/blink/common/mime_util/mime_util.cc), build 103.0.5012.1

    :::spoiler legacy supported list
    ```
    "application/ecmascript",
    "application/javascript",
    "application/x-ecmascript",
    "application/x-javascript",
    "text/ecmascript",
    "text/javascript",
    "text/javascript1.0",
    "text/javascript1.1",
    "text/javascript1.2",
    "text/javascript1.3",
    "text/javascript1.4",
    "text/javascript1.5",
    "text/jscript",
    "text/livescript",
    "text/x-ecmascript",
    "text/x-javascript",
    ```
    :::
  - `application/webbundle`: TODO
- Values accepted by the script `type` attribute
  - The Content-Type values above
  - `webbundle`
  - `module`
  - `importmap`
  - `speculationrules`
- [Script data double escape start state](https://html.spec.whatwg.org/multipage/parsing.html#scriptTag)
  - Normally a `<script>` is closed by the first `</script>` it meets, and the content in between is executed
  - During tokenization, when the script data (the content) hits `<!--<script>`, this state treats the next `</script>` as part of the content as well, and only then looks for a `</script>` to close the element, so the content can span across a `script`. Also, [the content's `<!--` and
    `<script` must be balanced for the content to be executed](https://html.spec.whatwg.org/multipage/parsing.html#scriptEndTag)
    ```
    <script>
    alert('exec1')
    <!--<script>
    /*
    </script>
    -->
    <script>
    */
    alert('exec2')
    </script>
    ```
  - Below is what actually gets executed. Browsers treat `<!--` and `-->` as single-line comments in script; the `-->` here is only there for balancing and has no commenting effect, because it already sits inside `/* */` and is part of that comment
    ```
    alert('exec1')
    <!--<script>
    /*
    </script>
    -->
    <script>
    */
    alert('exec2')
    ```
- script under svg
  - The text of a normal `script` cannot use HTML entities, but inside `svg` it can, which can be used to bypass a WAF
    ```
    <body>
      <svg>
        <script>
          &#x61;lert("works!");
        </script>
      </svg>
      <script>
        &#x61;lert("not workQQ");
      </script>
    </body>
    ```
  - [chromium issue](https://bugs.chromium.org/p/chromium/issues/detail?id=114641)

## svg
- svg switches the parser to foreign-content (XML-like) parsing
  - `script` decodes HTML entities
- `<foreignObject>` can contain html
  ```
  // from huli
  <foreignObject>
    <iframe srcdoc="&lt;script&gt;alert(document.domain)&lt;/script&gt;"></iframe>
  </foreignObject>
  ```

## label
- Can be bound to [`<button>, <input>, <meter>, <output>, <progress>, <select>, <textarea>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Content_categories#labelable)
- Combine with querySelector behavior to hijack on-event handlers

## input
- [HTML Type Override](https://docs.google.com/presentation/d/1jW0o1YO3FNXlXVkAziM_wSGQqRdLP2kmfoBb6mF1bGY/edit#slide=id.g2f056d28156_1_204)
  - `<input type=image src=x onerror=alert(7122)>`

# HTML attribute
## popover
- https://portswigger.net/research/exploiting-xss-in-hidden-inputs-and-meta-tags

# Dangling markup attack
> ref: maple3124
- Blocked by Chrome
  - https://chromestatus.com/feature/5735596811091968
- Not blocked by Firefox

# mxss
- [Write-up of DOMPurify 2.0.0 bypass using mutation XSS](https://research.securitum.com/dompurify-bypass-using-mxss/)
  - svg changes the content model: `<svg></p><style><a id="</style><img src=1 onerror=alert(1)>">`

# TODO
- https://jorianwoltjer.com/blog/p/ctf/intigriti-xss-challenge/0725
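
Going back to the defense note in the protocol section (parse with `new URL` first, then filter), a small sketch can show why the order matters. It assumes Node.js, whose global `URL` follows the WHATWG URL Standard; the allowlist `ALLOWED_PROTOCOLS`, the helper `isSafeHref`, and the base `https://example.com/` are hypothetical and only for illustration:

```javascript
// Hypothetical allowlist; adjust to what the application really needs.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Hypothetical helper: parse first, then filter on the normalized protocol.
function isSafeHref(raw, base = 'https://example.com/') {
  let url;
  try {
    url = new URL(raw, base);
  } catch {
    return false; // unparsable input is rejected
  }
  return ALLOWED_PROTOCOLS.has(url.protocol);
}

// The payload from the notes: tabs inside the scheme, mixed case, %0a in the body.
const payload = '\tjAva\tScript:%0a\ta\tl\te\tr\tt(1)';

// A naive prefix check misses the payload because of the leading tab.
const naiveBlocked = payload.toLowerCase().startsWith('javascript:');

// The URL parser strips tabs/newlines and lowercases the scheme.
const parsedProtocol = new URL(payload).protocol;

console.log(naiveBlocked);           // false
console.log(parsedProtocol);         // javascript:
console.log(isSafeHref(payload));    // false
console.log(isSafeHref('/profile')); // true
```

Filtering on `url.protocol` after parsing means the check sees the same normalized scheme the browser would act on, rather than the raw string.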