---
title: 'HTML'
disqus: hackmd
tags: cheatsheet
---
:::spoiler TOC
[TOC]
:::
# Tools
- [jsbin](https://jsbin.com/?html,output): quickly test HTML
- [Living Standard - 8 Web application APIs](https://wicg.github.io/controls-list/html-output/multipage/webappapis.html#webappapis)
> The multipage and one-page versions differ in content
- [shazzer](https://shazzer.co.uk/): browser behavior fuzzer
- [dom-explorer](https://yeswehack.github.io/Dom-Explorer/)
# Parsing
## Code point definitions
- https://infra.spec.whatwg.org/#code-points
> An ASCII tab or newline is U+0009 TAB, U+000A LF, or U+000D CR.
> ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE
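
The two code point classes quoted above can be sketched as regular expressions. This is a minimal sketch; the helper names `stripTabOrNewline` and `stripAsciiWhitespace` are made up for illustration and are not spec or library APIs.

```javascript
// Code point classes from the Infra Standard, written as regular expressions.
const ASCII_TAB_OR_NEWLINE = /[\t\n\r]/g;                     // U+0009, U+000A, U+000D
const ASCII_WHITESPACE_EDGES = /^[\t\n\f\r ]+|[\t\n\f\r ]+$/g; // adds U+000C and U+0020

// Hypothetical helper: remove every ASCII tab or newline, as the URL parser does.
function stripTabOrNewline(input) {
  return input.replace(ASCII_TAB_OR_NEWLINE, "");
}

// Hypothetical helper: strip leading and trailing ASCII whitespace only.
function stripAsciiWhitespace(input) {
  return input.replace(ASCII_WHITESPACE_EDGES, "");
}

console.log(JSON.stringify(stripTabOrNewline("ja\tva\nscr\ript:")));  // "javascript:"
console.log(JSON.stringify(stripAsciiWhitespace("\f text/html \n"))); // "text/html"
```

Note that `String.prototype.trim()` and `\s` are wider than ASCII whitespace (they also match U+000B and U+00A0, for example), so a filter built on them may disagree with the parser.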
## Content types that can be parsed as script
- [BlackFan/content-type-research](https://github.com/BlackFan/content-type-research/blob/master/XSS.md): Content-Types that could trigger XSS as of 2020
- [PortSwigger XSS Content types](https://portswigger.net/web-security/cross-site-scripting/cheat-sheet#content-types)
## HTML entity
- Lookup table: https://html.spec.whatwg.org/multipage/named-characters.html
- Conversion tool: https://mothereff.in/html-entities
- Any HTML attribute value can be HTML-entity encoded
- `&amp;` `&#38;` `&#x26;`
- `<iframe src='<script>alert(1)</script>'>`
- HTML entities are not decoded inside `script` or `style`, but they are decoded inside a `script` under `svg` or `math`, because the content is then parsed with XML-like rules
- Decoding is defined in the [character reference state](https://html.spec.whatwg.org/multipage/parsing.html#character-reference-state); only the [RCDATA state](https://html.spec.whatwg.org/multipage/parsing.html#rcdata-state), the [Data state](https://html.spec.whatwg.org/multipage/parsing.html#data-state) and attribute values decode character references
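
Since attribute values are entity-decoded before use, any payload can be rewritten as numeric character references. A minimal sketch follows; `toNumericEntities` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: encode every character as a numeric character reference.
function toNumericEntities(text, { hex = false } = {}) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    return hex ? `&#x${codePoint.toString(16)};` : `&#${codePoint};`;
  }).join("");
}

const payload = "javascript:alert(1)";
// The HTML parser decodes the attribute value first, then the URL is parsed.
console.log(`<a href="${toNumericEntities(payload)}">decimal</a>`);
console.log(`<a href="${toNumericEntities(payload, { hex: true })}">hex</a>`);
```

The same encoded string placed inside a plain `<script>` body would stay undecoded, which is the asymmetry noted above.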
## HTML attribute
- The value can be HTML-entity encoded
- [Attribute Name State](https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state): an attribute name may contain `'"<`
- [Before attribute value state](https://html.spec.whatwg.org/multipage/parsing.html#before-attribute-value-state): `\x09\x0a\x0c\x20` between `=` and the value are ignored
## Protocols
> protocol, scheme
- https://github.com/chromium/chromium/blob/63b085a78684914ec6ceefca8c46fe837ad5e007/docs/special_case_urls.md
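- Rough parsing order on navigation
- (...HTML entity decode)(?)
- Strip junk tab/newline characters, then parse the scheme
- URL-decode the part after the leading `[scheme]:`
- element src
- Under the hood, the URL is resolved by calling the `fetch` API
- Depending on the user agent, some elements are prefetched before the element is appended, e.g. svg and img in Chrome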
- https://fetch.spec.whatwg.org/#scheme-fetch
- Defense
- Parse with `new URL` first, then filter on the parsed result
- common
- scheme case-insensitive
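- `//` and `\\` resolve to the current page's scheme
- Chrome's canonicalization implementation https://github.com/chromium/chromium/blob/4ea9fb6208b67451a55f9a2fa2eaba8fd7b61bae/url/url_util.cc#L250C6-L250C20
- Junk characters (<=0x20) can be placed before and after the URL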
> https://github.com/chromium/chromium/blob/0359c8efed03242bc8565845cd3d4b91d3b18f77/url/url_parse_internal.h#L47
- Junk `\r`, `\n`, `\t` can be inserted at any position
> [ concept basic url parser
> 2.If input contains any ASCII tab or newline, invalid-URL-unit validation error.
> 3.Remove all ASCII tab or newline from input.](https://url.spec.whatwg.org/#concept-basic-url-parser)
- `location.href="\tjAva\tScript:%0a\ta\tl\te\tr\tt(1)"`
> https://github.com/chromium/chromium/blob/0359c8efed03242bc8565845cd3d4b91d3b18f77/url/url_parse_internal.h#L47
- TODO: review [basic URL parser algorithm](https://url.spec.whatwg.org/#concept-basic-url-parser)
- [`javascript:`, javascript url](https://html.spec.whatwg.org/#the-javascript:-url-special-case)
- [spec: evaluate a javascript URL](https://html.spec.whatwg.org/#evaluate-a-javascript:-url)
- [The `javascript:` pseudo-protocol URL-decodes its body](https://www.leavesongs.com/PENETRATION/xss-from-my-blog.html)
```
<a href="javascript:console.log('%26')">aaaa </a>
<!-- HTML entities are decoded first, then the pseudo-protocol body is URL-decoded, so this prints '&' -->
```
- If the evaluation result is a string, it replaces the current page, so HTML can be injected
> [6.Let newDocument be the result of evaluating a javascript: URL given targetNavigable, url, and initiatorOrigin.
> 7.If newDocument is null, then return.](https://html.spec.whatwg.org/#the-javascript:-url-special-case)
>
> [9.If evaluationStatus is a normal completion, and evaluationStatus.[[Value]] is a String, then set result to evaluationStatus.[[Value]].
> 10.Otherwise, return null.](https://html.spec.whatwg.org/#evaluate-a-javascript:-url)
- `location.href='javascript:"<svg/onload=alert(window.origin)>"'`
- A non-string result is forced to null, so no new document is created
- [`data:`, data url](https://fetch.spec.whatwg.org/#data-urls)
- Everything before the first `,` is the MIME type ([mediatype] + [";base64"]); the rest is encodedBody. Only encodedBody is percent-decoded into body
```
RFC 2397
dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
mediatype := [ type "/" subtype ] *( ";" parameter )
data := *urlchar
parameter := attribute "=" value
```
- TODO: `Let stringBody be the isomorphic decode of body.`
- [`cid:`, content-ID](https://www.ietf.org/rfc/rfc2392.txt)
- Prevents URL encoding
```
<a id=aaa href='<iframe></iframe>'>
aaa.href === 'http://web/%3Ciframe%3E%3C/iframe%3E'
<a id=aaa href='cid:<iframe></iframe>'>
aaa.href === 'cid:<iframe></iframe>'
```
- DOMPurify allows `(?:(?:f|ht)tps?|mailto|tel|callto|sms|cid|xmpp)` by default; of these, values using `ftps|tel|callto|sms|cid|xmpp` are not URL-encoded when read back
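
The "parse with `new URL`, then filter" defense noted above can be sketched as follows. The allowlist, the `sanitizeUrl` name and the `https://example.test/` base are assumptions for illustration only; Node's `URL` follows the same WHATWG parser as browsers, so the normalization steps listed in this section show up in the parsed result.

```javascript
// Assumed allowlist for this sketch; adjust to what the application needs.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

// Hypothetical helper: returns the normalized URL string, or null if rejected.
// baseUrl is an assumed stand-in for the page the link will be rendered on.
function sanitizeUrl(input, baseUrl = "https://example.test/") {
  let url;
  try {
    url = new URL(input, baseUrl);
  } catch {
    return null; // not parsable at all
  }
  return ALLOWED_PROTOCOLS.has(url.protocol) ? url.href : null;
}

// Leading junk (<= 0x20), mixed case, and embedded tab/newline all at once.
const tricky = "\u0001 \tjAva\tScr\nipt:alert(1)";
console.log(tricky.startsWith("javascript:")); // false: a prefix check misses it
console.log(new URL(tricky).protocol);         // javascript:
console.log(sanitizeUrl(tricky));              // null
console.log(sanitizeUrl("//evil.test/x"));     // https://evil.test/x
console.log(sanitizeUrl("\\\\evil.test/x"));   // https://evil.test/x
```

Filtering on `url.protocol` after parsing means the check sees the scheme the way the browser will, instead of the raw string.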
## HTML Quirk
- TODO
## execution order
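- Before the document has finished parsing, `document.write` writes at the current position (right after the script); after parsing has finished, it overwrites the document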
```
<p>a</p>
<script>
const a= ()=>(document.write('aaaa'))
a()
</script>
<p>b</p>
```
- [`<svg><svg/onload>` fires before insertion](https://blog.huli.tw/2022/02/08/what-i-learned-from-dicectf-2022/#%E9%A0%90%E6%9C%9F%E5%A4%96%E8%A7%A3%E6%B3%95)
> Guess: after svg switches the parser to XML-style parsing, the nested svg triggers tree construction right after it is tokenized
```
<div id=x></div>
<div id=y>hello</div>
<script>
x.innerHTML = '<svg><svg onload=alert(window.y.innerText)>'
y.innerText = 'updated'
</script>
<!-- fires before y.innerText is assigned -->
```
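# HTML element characteristics
- [HTML Parser spec](https://html.spec.whatwg.org/multipage/parsing.html)
- Split into two parts, the tokenizer and tree construction. The tokenizer tokenizes the input text and emits tokens; after each emitted token, tree construction turns the token into an element and inserts it into the DOM at the position the algorithm dictates, which in turn affects the subsequent tokenizer state. The two stages interact with each other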
## meta
- Redirect
- `<meta http-equiv="refresh" content="5; url=https://www.fooish.com">`
- Can redirect to a blob URL
- The origin stays the same and the CSP is inherited
- https://blog.huli.tw/2023/09/23/hitcon-seccon-ctf-2023-writeup/#canvas-4-solves
- https://bugs.chromium.org/p/chromium/issues/detail?id=933171
## comment
- open: `<!--`, close: `-->` or `--!>`
- [incorrectly-closed-comment](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment)
- [bogus comment state](https://html.spec.whatwg.org/multipage/parsing.html#bogus-comment-state): input starting with `<?` is treated as a comment up to the first `>`
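
The two comment terminators matter when a filter only looks for `-->`. Below is a simplified model of where the tokenizer ends a comment; `commentLength` is a hypothetical helper, and the model only covers `<!--` comments, not bogus comments.

```javascript
// Hypothetical helper: given the text right after "<!--", return how many
// characters belong to the comment including its terminator, or -1 if the
// comment never closes.
function commentLength(afterOpen) {
  // "<!-->" and "<!--->" close immediately (abrupt closing of an empty comment).
  const abrupt = /^-?>/.exec(afterOpen);
  if (abrupt) return abrupt[0].length;
  // Otherwise the first "-->" or "--!>" ends the comment.
  const end = /--!?>/.exec(afterOpen);
  return end ? end.index + end[0].length : -1;
}

const html = "<!-- x --!><img src=x onerror=alert(1)> -->";
// What a matcher that only knows "-->" believes is outside the comment:
const naiveOutside = html.replace(/<!--[\s\S]*?-->/g, "");
// What the parser actually leaves outside the comment:
const body = html.slice("<!--".length);
const parserOutside = body.slice(commentLength(body));

console.log(JSON.stringify(naiveOutside));  // ""
console.log(JSON.stringify(parserOutside)); // "<img src=x onerror=alert(1)> -->"
```

A filter that treats the whole string as one harmless comment would let the `img` through to the browser.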
## select
- https://blog.huli.tw/2021/11/14/intigriti-xss-1021/
- Tags that are not allowed inside select are dropped; their text content is kept
```
<select>
<div> jizzzz </div>
</select>
```
becomes
```
<select>
 jizzzz 
</select>
```
## table
- https://blog.huli.tw/2021/11/14/intigriti-xss-1021/
- https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-intable
- Elements inside a table that do not belong to the table trigger [foster parenting](https://html.spec.whatwg.org/multipage/parsing.html#foster-parent) and are moved in front of the table
```
<table>
<div> jizzzz </div>
</table>
```
becomes
```
<div> jizzzz </div>
<table>
</table>
```
## a
- `download`
- Sets the download filename
```
<a download=qwe> // filename=qwe.txt
<a download=pwn.exe> // filename=pwn.exe
```
- data uri + download
- Craft arbitrary download content: `<a href='data:,jizzz' download>`
- Alles CTF 2021 ALLES!Chat
- puppeteer can trigger the download directly
- `href`
- When an `a` element is converted to a string, its `href` is used
```
<a href='ji:alert(1)' id=aaa>
aaa + '' // 'ji:alert(1)'
<a href=' alert(1)' id=aaa>
aaa + '' // '$SCHEME://$HOST/alert(1)'
```
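
The stringification above follows URL parsing of the `href` attribute against the page URL, so it can be approximated outside the browser. This is a sketch under assumptions: `base` stands in for `document.baseURI`, `anchorToString` is a made-up name, and a real anchor returns the raw attribute when parsing fails, whereas this helper would throw.

```javascript
// Assumed page URL, standing in for document.baseURI.
const base = "http://web/";

// Hypothetical helper approximating `String(anchorElement)` for an href value.
function anchorToString(hrefAttribute) {
  return new URL(hrefAttribute, base).href;
}

console.log(anchorToString("ji:alert(1)"));           // ji:alert(1)
console.log(anchorToString(" alert(1)"));             // http://web/alert(1)
console.log(anchorToString("<iframe></iframe>"));     // http://web/%3Ciframe%3E%3C/iframe%3E
console.log(anchorToString("cid:<iframe></iframe>")); // cid:<iframe></iframe>
```

The last two lines match the `cid:` note in the protocols section: with a non-special scheme the path is opaque, so `<` and `>` survive unencoded.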
## link
- Only Firefox implements the [link header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link)?
## iframe
:::spoiler ref
- [x] https://blog.huli.tw/2022/04/07/iframe-and-window-open/
:::
- [spec](https://html.spec.whatwg.org/multipage/iframe-embed-object.html#the-iframe-element)
- Properties accessible cross-origin
> A JavaScript property name P is a cross-origin accessible window property name if it is "window", "self", "location", "close", "closed", "focus", "blur", "frames", "length", "top", "opener", "parent", "postMessage", or an array index property name.
https://html.spec.whatwg.org/multipage/browsers.html#crossoriginproperties-(-o-)
- Can change the location of any iframe, even under a different origin
- Can steal postMessage data
> Always specify an exact target origin, not *, when you use postMessage to send data to other windows. A malicious site can change the location of the window without your knowledge, and therefore it can intercept the data sent using postMessage.[ref](https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage#security_concerns)
- Can steal a cross-origin iframe's window name
- Once its location is set to 'about:blank', the iframe becomes same-origin
- iframe xss leak data in `window.name` -> parent set iframe location to `about:blank` -> read `contentWindow.name` in parent
- [Bypassing CSP with dangling iframes](https://portswigger.net/research/bypassing-csp-with-dangling-iframes)
- Can redirect the top location; Chrome blocks this by default when cross-origin
- `src`
- Supports `data:`, `javascript:`
- With `data:`, the origin is set to null
- `javascript:` requires same origin
- `srcdoc`
- = src + data URI
- origin === parent.origin
- `csp`
- Sets the csp attribute
- Making the policy stricter can disable some scripts
- `sandbox`
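- Sets the iframe origin to null by default and disables many features
- Anything opened from the sandbox inherits the sandbox restrictions
- The newly opened page's origin also becomes null, which may bypass same-origin checks, e.g. postMessage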
- `credentialless`
- https://blog.slonser.info/posts/make-self-xss-great-again/
- self-XSS
## form
- Submit a JSON-like body
```
<form method='POST' enctype='text/plain'>
<input name='{"k":"v' value='"}'>
</form>
```
- newline normalization
- https://blog.whatwg.org/newline-normalizations-in-form-submission
- A `\n` in a form value is actually transmitted as `\r\n`
- The length therefore differs between frontend and backend
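
Both points in this section can be sketched together: `text/plain` serializes each field as `name=value` followed by CRLF, and newlines in values are normalized to CRLF. The helper names below are hypothetical, and the field uses a name/value pair chosen so the resulting body is valid JSON.

```javascript
// Normalize a lone CR or lone LF to CRLF, as form submission does.
function normalizeNewlines(value) {
  return value.replace(/\r\n|\r|\n/g, "\r\n");
}

// Hypothetical helper: build a text/plain form body from [name, value] pairs.
function encodeTextPlain(entries) {
  return entries
    .map(([name, value]) => `${normalizeNewlines(name)}=${normalizeNewlines(value)}\r\n`)
    .join("");
}

// name='{"k":"v' and value='"}' join around the "=" into a JSON document.
const body = encodeTextPlain([['{"k":"v', '"}']]);
console.log(JSON.stringify(body)); // "{\"k\":\"v=\"}\r\n"
console.log(JSON.parse(body));     // { k: 'v=' }

// Length mismatch: 11 characters typed, 12 characters received.
const typed = "line1\nline2";
console.log(typed.length, normalizeNewlines(typed).length); // 11 12
```

The trailing CRLF is JSON whitespace, so a lenient backend that parses the raw body as JSON accepts it.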
## script
:::spoiler refs
- [x] [script type 知多少?](https://blog.huli.tw/2022/04/24/script-type/)
- [ ] [\[LINE CTF 2021\] Haribote-Secure-Note](https://gist.github.com/mdsnins/d8028c47212342ecadd9af5ec10f53f9)
:::
- `Content-Type` values accepted as script
- by [spec](https://mimesniff.spec.whatwg.org/#javascript-mime-type)
- by [implement](https://chromium.googlesource.com/chromium/src.git/+/refs/tags/103.0.5012.1/third_party/blink/renderer/core/script/script_loader.cc#184) and [implement](https://chromium.googlesource.com/chromium/src.git/+/refs/tags/103.0.5012.1/third_party/blink/renderer/core/script/script_loader.cc#247) Chromium build 103.0.5012.1
- [legacy supported](https://chromium.googlesource.com/chromium/src.git/+/refs/tags/103.0.5012.1/third_party/blink/common/mime_util/mime_util.cc) build 103.0.5012.1
:::spoiler legacy supported list
```
"application/ecmascript",
"application/javascript",
"application/x-ecmascript",
"application/x-javascript",
"text/ecmascript",
"text/javascript",
"text/javascript1.0",
"text/javascript1.1",
"text/javascript1.2",
"text/javascript1.3",
"text/javascript1.4",
"text/javascript1.5",
"text/jscript",
"text/livescript",
"text/x-ecmascript",
"text/x-javascript",
```
:::
- `application/webbundle`: TODO
- Values accepted by the script `type` attribute
- The Content-Type values above
- `webbundle`
- `module`
- `importmap`
- `speculationrules`
- [Script data double escape start state](https://html.spec.whatwg.org/multipage/parsing.html#scriptTag)
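- Normally `<script>` is closed at the first `</script>`, and the content in between is executed
- In the tokenizer, when the script data (the content) contains `<!--<script>`, this state treats the next `</script>` as part of the content and only then looks for a `</script>` to close the element, so the content can span past a `</script>`. In addition, [the `<!--` and `<script` in the content must be balanced for the content to execute](https://html.spec.whatwg.org/multipage/parsing.html#scriptEndTag)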
```
<script>
alert('exec1')
<!--<script>
/*
</script>
-->
<script>
*/
alert('exec2')
</script>
```
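- Below is what actually executes. Browsers treat `<!--` and `-->` as single-line comments in JavaScript; the `-->` below is only there for balance and has no commenting effect of its own, since it is already inside `/* */`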
```
alert('exec1')
<!--<script>
/*
</script>
-->
<script>
*/
alert('exec2')
```
- script under svg
- HTML entities cannot be used in the innerText of a normal `script`, but they can inside `svg`, which can be used to bypass a WAF
```
<body>
<svg>
<script>
alert("works!");
</script>
</svg>
<script>
alert("not workQQ");
</script>
</body>
```
- [chromium issue](https://bugs.chromium.org/p/chromium/issues/detail?id=114641)
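
The script data double escape behavior described earlier in this section can be modeled with a three-state scanner. This is a simplified sketch of the tokenizer's script data states, not the full algorithm; `findScriptEnd` and the state names are made up for illustration.

```javascript
const TAG_DELIMITERS = new Set(["\t", "\n", "\f", " ", "/", ">"]);

// Case-insensitive check that `needle` (lowercase ASCII) occurs at index i.
function matchesAt(text, i, needle) {
  return text.slice(i, i + needle.length).toLowerCase() === needle;
}

// True when a tag such as "<script" at index i is followed by a delimiter.
function isTagAt(text, i, tag) {
  return matchesAt(text, i, tag) && TAG_DELIMITERS.has(text[i + tag.length]);
}

// Hypothetical helper: index of the "</script" that really closes the element,
// given the text after the opening <script> tag, or -1 if it never closes.
function findScriptEnd(text) {
  let state = "data"; // "data" | "escaped" | "doubleEscaped"
  for (let i = 0; i < text.length; i++) {
    if (state === "data") {
      if (isTagAt(text, i, "</script")) return i;
      if (matchesAt(text, i, "<!--")) {
        state = "escaped";
        i += 1; // skip "<!" only, so the dashes can still form "-->"
      }
    } else if (matchesAt(text, i, "-->")) {
      state = "data"; // "-->" leaves both escaped states
      i += 2;
    } else if (state === "escaped") {
      if (isTagAt(text, i, "</script")) return i;
      if (isTagAt(text, i, "<script")) state = "doubleEscaped";
    } else if (isTagAt(text, i, "</script")) {
      state = "escaped"; // the first </script> after <!--<script> is swallowed
    }
  }
  return -1;
}

const content =
  "\nalert('exec1')\n<!--<script>\n/*\n</script>\n-->\n<script>\n*/\nalert('exec2')\n</script>";
// The element ends at the last </script>, not the first one.
console.log(findScriptEnd(content) === content.lastIndexOf("</script>")); // true
```

A sanitizer that cuts the script at the first `</script>` would disagree with this scanner on the example above.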
## svg
- svg switches the parsing mode to XML-like rules
- `script` content gets its HTML entities decoded
- HTML can be written inside `<foreignObject>`
```
// from huli
<foreignObject>
<iframe srcdoc="<script>alert(document.domain)</script>"></iframe>
</foreignObject>
```
## label
- Can be bound to [`<button>, <input>, <meter>, <output>, <progress>, <select>, <textarea>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Content_categories#labelable)
- Combine with querySelector behavior to hijack on-event handlers
## input
- [HTML Type Override](https://docs.google.com/presentation/d/1jW0o1YO3FNXlXVkAziM_wSGQqRdLP2kmfoBb6mF1bGY/edit#slide=id.g2f056d28156_1_204)
- `<input type=image src=x onerror=alert(7122)>`
# HTML attribute
## popover
- https://portswigger.net/research/exploiting-xss-in-hidden-inputs-and-meta-tags
# dangling markup attack
> ref: maple3124
- Chrome already blocks this
- https://chromestatus.com/feature/5735596811091968
- Firefox does not block it
# mxss
- [Write-up of DOMPurify 2.0.0 bypass using mutation XSS](https://research.securitum.com/dompurify-bypass-using-mxss/)
- svg changes the content model
`<svg></p><style><a id="</style><img src=1 onerror=alert(1)>">`
# TODO
- https://jorianwoltjer.com/blog/p/ctf/intigriti-xss-challenge/0725