How browsers work

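# How browsers work

###### tags: `browser`

ref: https://web.dev/howbrowserswork/

> As a web developer, learning the internals of browser operations helps you make better decisions and know the justifications behind development best practices. While this is a rather lengthy document, we recommend you spend some time digging in; we guarantee you'll be glad you did.
>
> Paul Irish, Chrome Developer Relations

# [The browser's main functionality](https://web.dev/howbrowserswork/#the-browsers-main-functionality)

A typical browser UI includes:

1. An address bar for entering a URI.
2. Back and forward buttons.
3. Bookmarking options.
4. Refresh and stop buttons for reloading the current document or stopping it from loading.
5. A home button that takes you to your home page.

A quick note on URIs:

* URI (Uniform Resource Identifier): identifies a resource on the network. URNs and URLs are both kinds of URI.
* URN (Uniform Resource Name): identifies a resource by name.
* URL (Uniform Resource Locator): identifies a resource by its location (address).

# [The browser's high level structure](https://web.dev/howbrowserswork/#the-browsers-high-level-structure)

The browser's main components:

1. The user interface: essentially the five items listed above.
2. The browser engine: marshals actions between the UI and the rendering engine.
3. The rendering engine: displays the requested content. For example, if the requested content is HTML, it parses the HTML and CSS and displays the parsed result on the screen.
4. Networking: makes network calls such as HTTP requests, using different implementations on different platforms behind a platform-independent interface.
5. UI backend: draws basic widgets such as combo boxes and windows. It exposes a generic interface that is not platform specific; underneath, it uses the operating system's user interface methods.

   ![](https://i.imgur.com/uvvTdO6.png)
6. JavaScript interpreter: parses and executes JavaScript code.
7. Data storage: the browser has to store various kinds of data, such as cookies, and it supports storage mechanisms such as localStorage, [IndexedDB](https://www.yasssssblog.com/2020/08/19/web-indexeddb/), WebSQL and FileSystem.

![](https://i.imgur.com/5z6xyfP.png)

# [The main flow](https://web.dev/howbrowserswork/#rendering-engines)

![](https://i.imgur.com/VXLqIHd.png)

1. Parse the HTML and convert the elements into DOM nodes in a tree called the content tree.
2. Combine the DOM with the CSSOM (the style information) to build the render tree.
3. Layout: give every node its exact coordinates on the screen.
4. 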
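   Painting: the render tree is traversed and each node is painted onto the screen using the UI backend layer.

**The important point here**: for a better user experience, the rendering engine tries to display content on the screen as soon as possible. It does not wait until all of the HTML is parsed before it starts building and laying out the render tree. Parts of the content are parsed and displayed while the rest is still arriving from the network.

# [Main flow examples](https://web.dev/howbrowserswork/#main-flow-examples)

WebKit main flow:

![](https://i.imgur.com/PIv40mR.png)

Mozilla main flow:

![](https://i.imgur.com/CIzW9B4.png)

The two flows are basically the same and differ mostly in terminology. The one notable difference is that Mozilla has an extra Content Sink after the HTML parse step. Each stage is explained in more detail later in the article.

## [Parsing - general](https://web.dev/howbrowserswork/#parsing-general)

For example, parsing the expression 2 + 3 - 1 could return this tree:

![](https://i.imgur.com/Memop8I.png)

## [Parser - Lexer combination](https://web.dev/howbrowserswork/#parser-lexer-combination)

Parsing can be separated into two sub processes: **lexical analysis** and **syntax analysis**.

* lexical analysis: breaks the input into tokens. In a human language, the tokens would be the words that appear in the dictionary.
* syntax analysis: applies the language's syntax rules.

Parsers usually divide the work between two components:

1. The lexer (also called the tokenizer) breaks the input into valid tokens. The lexer also strips irrelevant characters such as whitespace and line breaks.
2. The parser analyzes the document structure according to the syntax rules and uses it to construct the parse tree.

![](https://i.imgur.com/3KSC14U.png)

The parsing process is iterative. The parser keeps asking the lexer for a new token and tries to match the token with one of the syntax rules.

* If a rule matches, a node corresponding to the token is added to the parse tree and the parser asks for the next token.
* If no rule matches, the parser stores the token internally and keeps asking for tokens until it finds a rule that matches all of the internally stored tokens. If no such rule is found, the parser raises an exception, which means the document is invalid and contains syntax errors.

## [Translation](https://web.dev/howbrowserswork/#translation)

![](https://i.imgur.com/I678ZzP.png)

In many cases the parse tree is not the final product.

> Parsing is often used in translation: transforming the input document to another format.

In translation, parsing is the step that makes it possible to convert the input document into another format. Compilation is an example: a compiler that turns source code into machine code first parses it into a parse tree and then translates the tree into machine code.

parsing into parse tree => translation => machine code

## [Parsing example](https://web.dev/howbrowserswork/#parsing-example)

Here is a simple grammar that demonstrates the parsing process:

1. The syntax building blocks of the language are expressions, terms and operations.
2. The language can include any number of expressions.
3. 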
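   An expression is defined as a term followed by an operation followed by another term.
4. An operation is a plus token or a minus token.
5. A term is an integer token or an expression.

With the grammar in place, here is how the input 2 + 3 - 1 is analyzed:

* First, 2 is a term according to rule 5.
* Next, 2 + 3 matches rule 3: a term followed by an operation followed by another term.
* Finally, 2 + 3 - 1 is also an expression. By rule 5, the expression 2 + 3 counts as a term, so the whole input is again a term followed by an operation followed by another term (rule 3).

It follows that 2 + + matches no rule and is therefore invalid input.

## [Formal definitions for vocabulary and syntax](https://web.dev/howbrowserswork/#formal-definitions-for-vocabulary-and-syntax)

Vocabulary is usually defined formally with [regular expressions](https://www.regular-expressions.info/). The language used for 2 + 3 - 1 would be defined like this:

```regular expression
INTEGER: 0|[1-9][0-9]*
PLUS: +
MINUS: -
```

Syntax is usually defined in a format called [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form):

```regular expression
expression := term operation term
operation := PLUS | MINUS
term := INTEGER | expression
```

## [Types of parsers](https://web.dev/howbrowserswork/#types-of-parsers)

There are two types of parsers:

* Top down parsers examine the high level structure of the syntax and try to find a rule that matches. For the example above, the parser first identifies 2 + 3 as an expression and then identifies 2 + 3 - 1 as an expression. (The identification evolves as the parser keeps matching other rules, but the starting point is always the highest level rule.)
* Bottom up parsers start from the input and gradually transform it into syntax rules, working from the low level rules up until the high level rules are met.

![](https://i.imgur.com/6lZru0b.png)

This type of bottom up parser is called a shift-reduce parser, because the input is shifted to the right and gradually reduced to syntax rules.

## [Generating parsers automatically](https://web.dev/howbrowserswork/#generating-parsers-automatically)

WebKit, an open source web browser engine, uses two well known parser generators:

* [Flex](https://en.wikipedia.org/wiki/Flex_(lexical_analyser_generator)) creates the lexer. Its input is a file containing regular expression definitions of the tokens.
* [Bison](https://www.gnu.org/software/bison/) creates the parser. Its input is the language syntax rules in BNF format.

# HTML Parser

The job of the HTML parser is to parse HTML markup into a parse tree.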
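
To make "markup in, tree out" concrete, here is a minimal C++ sketch. It is an illustration under strong assumptions, not how WebKit or any real browser parses HTML: `Node`, `parseMarkup` and `dump` are hypothetical names, and the sketch only accepts well-formed markup with no attributes, comments or error recovery.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for this sketch: an element (tag set) or a text node (tag empty).
struct Node {
    std::string tag;   // element name; empty for text nodes
    std::string text;  // character data; used only by text nodes
    std::vector<std::unique_ptr<Node>> children;
};

// Parses well-formed, attribute-free markup into a tree under a synthetic "#document" root.
// Assumption: every start tag has a matching end tag; no error recovery is attempted.
std::unique_ptr<Node> parseMarkup(const std::string& input) {
    auto root = std::make_unique<Node>();
    root->tag = "#document";
    std::vector<Node*> open{root.get()};  // stack of currently open elements
    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] == '<') {
            std::size_t close = input.find('>', i);
            if (close == std::string::npos) break;  // unterminated tag: stop parsing
            std::string name = input.substr(i + 1, close - i - 1);
            if (!name.empty() && name[0] == '/') {
                if (open.size() > 1) open.pop_back();  // end tag closes the current element
            } else {
                auto element = std::make_unique<Node>();
                element->tag = name;
                Node* raw = element.get();
                open.back()->children.push_back(std::move(element));
                open.push_back(raw);  // start tag opens a new element
            }
            i = close + 1;
        } else {
            std::size_t next = input.find('<', i);
            if (next == std::string::npos) next = input.size();
            std::string text = input.substr(i, next - i);
            // Whitespace-only runs between tags are skipped in this sketch.
            if (text.find_first_not_of(" \t\r\n") != std::string::npos) {
                auto node = std::make_unique<Node>();
                node->text = text;
                open.back()->children.push_back(std::move(node));
            }
            i = next;
        }
    }
    return root;
}

// Serializes the tree with nested parentheses so its shape is easy to inspect.
std::string dump(const Node& node) {
    if (node.tag.empty()) return "\"" + node.text + "\"";
    std::string out = "(" + node.tag;
    for (const auto& child : node.children) out += " " + dump(*child);
    return out + ")";
}
```

With this sketch, `dump(*parseMarkup("<html><body>Hello world</body></html>"))` returns `(#document (html (body "Hello world")))`. The stack of open elements is what lets a start tag attach to the right parent; a real HTML parser additionally has to cope with missing, misnested and implied tags.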
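## The HTML grammar definition

The vocabulary and syntax of HTML follow the grammar defined by the W3C.

## Not a context free grammar

HTML is not a context free grammar, and it cannot share a parser with XHTML or XML.

HTML is forgiving: it lets you omit certain tags altogether, and sometimes omit just the start tag or the end tag. Compared with XML, whose syntax is strict, HTML can be described as a more flexible syntax.

## [HTML DTD](https://web.dev/howbrowserswork/#generating-parsers-automatically)

HTML definition is in a DTD format (Document Type Definition)

## [DOM](https://web.dev/howbrowserswork/#dom)

The output tree (the "parse tree") is a tree of DOM element and attribute nodes.

![](https://i.imgur.com/5tWf15Q.png)

Like HTML, the DOM is specified by the W3C.

* For DOM: www.w3.org/DOM/DOMTR
* For the HTML specific definitions: www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/idl-definitions.html

## [The parsing algorithm](https://web.dev/howbrowserswork/#the-parsing-algorithm)

HTML cannot be parsed with either of the two parser types described above, because:

1. The language is forgiving.
2. Browsers traditionally tolerate well known cases of invalid HTML.
3. The parsing process is reentrant. In other languages the source does not change during parsing, but in HTML, dynamic code (such as a script that calls `document.write()`) can add extra tokens, so the parsing process actually modifies its own input.

For these reasons browsers build custom parsers specifically for HTML. The algorithm is described in detail here: https://html.spec.whatwg.org/multipage/parsing.html

It consists of two stages:

* Tokenization: this is the lexical analysis, which turns the input into tokens. In HTML the tokens are start tags, end tags, attribute names and attribute values.
* Tree construction: the tokenizer recognizes each token and hands it to the tree constructor, then goes on to recognize the next token, and so on until the whole input has been consumed.

![](https://i.imgur.com/Ri7SxvU.png)

## [The tokenization algorithm](https://web.dev/howbrowserswork/#the-tokenization-algorithm)

The output of this algorithm is an HTML token. The algorithm is very complex, so this is only a brief introduction. The article describes it as a state machine: each state consumes one or more characters of the input and updates the next state according to those characters. As a result, the same character can produce different results depending on the state the tokenizer is currently in.

A simple example:

```htmlembedded=
<html>
  <body>
    Hello world
  </body>
</html>
```

* The initial state is the Data state.
* When the `<` character is encountered, the state changes to the Tag open state.
* Consuming an a-z character creates a start tag token, and the state changes to the Tag name state.
* The tokenizer stays in the Tag name state until the `>` character is consumed.
* Each consumed character is appended to the new token's name, which in the example is `html`. When `>` is reached, the token is emitted and the state changes back to the Data state. The `<body>` tag goes through the same steps, and the tokenizer is again in the Data state.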
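* Consuming the `H` of `Hello world` creates and emits a character token. This continues until the `<` of `</body>` is reached, so one character token is emitted for each character of `Hello world`.
* The `<` switches to the Tag open state. Consuming the `/` creates an end tag token and moves to the Tag name state, where the tokenizer stays until `>` is reached. The token is then emitted and the state returns to the Data state. The `</html>` input is handled the same way.

![](https://i.imgur.com/HMKD1Eg.png)

## [Tree construction algorithm](https://web.dev/howbrowserswork/#tree-construction-algorithm)

When the parser is created, the Document object is created as well. During the tree construction stage, the DOM tree with the Document at its root is modified and elements are added to it. Each node emitted by the tokenizer is processed by the tree constructor. For each token (that is, each HTML tag), the specification defines which DOM element is relevant to it, and that element is created for the token. The new element is added to the DOM tree and also to the stack of open elements. This stack is used to correct nesting mismatches and unclosed tags. The algorithm is also a state machine; its states are called insertion modes.

Continuing with the Hello world example from above:

```htmlembedded=
<html>
  <body>
    Hello world
  </body>
</html>
```

The input to the tree construction stage is the sequence of tokens coming from the tokenization stage. The first mode is the initial mode. Receiving the html token causes a move to the before html mode, and the token is reprocessed in that mode. This creates the HTMLHtmlElement element, which is appended to the root Document object.

The state then changes to the before head mode. The body token is received next, and an HTMLHeadElement is created implicitly and added to the tree even though the document has no head tag. The mode moves to in head and then to after head. The body token is now reprocessed, an HTMLBodyElement is created and inserted, and the mode becomes in body.

The character tokens of the `Hello world` string are received next, and a Text node is created and inserted for them. Receiving the body end tag moves the mode to after body. The html end tag follows and moves the mode to after after body. Finally, receiving the end of file token ends the parsing.

![](https://i.imgur.com/hytrQA1.png)

## [Actions when the parsing is finished](https://web.dev/howbrowserswork/#actions-when-the-parsing-is-finished)

At this stage the browser marks the document as interactive and starts parsing the scripts that are in deferred mode, that is, those that should run after the document is parsed. The document state is then set to complete and a load event is fired.

The full algorithms for tokenization and tree construction are in the HTML5 specification: https://html.spec.whatwg.org/multipage/parsing.html#html-parser

## [Browsers' error tolerance](https://web.dev/howbrowserswork/#browsers-error-tolerance)

We have to take care of at least the following error conditions:

1. The tag being added is explicitly forbidden inside some outer tag. In this case, all tags are closed up to the one that forbids it, and the tag is added afterwards.
2. 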
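   The tag is not allowed to be added directly. The author of the document often forgot a tag in between. This can be the case with the following tags: HTML HEAD BODY TBODY TR TD LI (did I forget any?).
3. A block element is being added inside an inline element. All inline elements are closed, up to the next block element that can contain it.
4. If none of the above helps, elements are closed until the new element can be added, or the tag is simply ignored.

### WebKit's br

Take the line break tag `<br>` as an example. Some sites write `</br>` while others write `<br>`. WebKit treats both of them as `<br>`.

```code=
if (t->isCloseTag(brTag) && m_document->inCompatMode()) {
    reportError(MalformedBRError); // The error is reported internally; the user never sees it.
    t->beginTag = true; // A closing br is forced into the begin tag state, so both forms are treated alike.
}
```

### A stray table

A stray table is a table nested inside another table but not inside a table cell.

```htmlembedded=
<table>
  <table>
    <tr><td>inner table</td></tr>
  </table>
  <tr><td>outer table</td></tr>
</table>
```

WebKit changes the hierarchy to two sibling tables:

```htmlembedded=
<table>
  <tr><td>outer table</td></tr>
</table>
<table>
  <tr><td>inner table</td></tr>
</table>
```

The code that handles this:

```code=
if (m_inStrayTableContent && localName == tableTag)
    popBlock(tableTag);
```

### Nested form elements

To avoid having a form nested inside another form, the inner form is simply ignored.

```code=
// A form element is created only when no form is currently open,
// so a form tag nested inside an open form is ignored.
if (!m_currentFormElement) {
    m_currentFormElement = new HTMLFormElement(formTag, m_document);
}
```

### A too deep tag hierarchy

In short, the nesting is too deep. The article says that at most 20 nested tags of the same type are allowed.

```code=
bool HTMLParser::allowNestedRedundantTag(const AtomicString& tagName)
{
    unsigned i = 0;
    // Walk down the block stack while the entries have the same tag name,
    // counting at most cMaxRedundantTagDepth of them.
    for (HTMLStackElem* curr = m_blockStack;
         i < cMaxRedundantTagDepth && curr && curr->tagName == tagName;
         curr = curr->next, i++) { }
    // Another nested tag is allowed only while the count is below the limit.
    return i != cMaxRedundantTagDepth;
}
```

### Misplaced html or body end tags

The interesting part here is that, according to the article, WebKit never closes the body tag when it sees the end tag, because some pages close it too early, before the actual end of the document. Instead it relies on a call to `end()` to close the body.

```code=
if (t->tagName == htmlTag || t->tagName == bodyTag)
    return;
```

# [CSS parsing](https://web.dev/howbrowserswork/#css-parsing)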