# Session 2 - Mastering Large-Size HTTP Requests in the Modern Web - Cherie Hsieh

###### tags: `GopherDay2024` `Agenda`

{%hackmd /@Golang-Taipei/GopherDay2024-info %}

### [Slido link](https://app.sli.do/event/jFegzjjgqqKYVeLJV5jjyD)

### [Slides link](https://github.com/YuShuanHsieh/slides/blob/main/Cherie_Gopherday_2024.pdf?fbclid=IwZXh0bgNhZW0CMTAAAR29GqNmF_Ku0Zzl2iuoNxZLf7VwPmiYR5UOSHiFmI6M-05nQ7EGs6mlOZY_aem_AQzGZHQYlug2hlloWJmmePo3EJWaI2_ltk1yHem__6nCsQzj5_p7Nlq9FjOBUHH7FZYCaiHV4nz60eL4dUakrdYV)

### Multipart File Upload

- The scenario only considers HTTP/1.1.
- Problems start to appear once an upload reaches 1~2 GB.
- During an upload, all the data is first split into TCP packets and sent to the server, and the server then reassembles it.

```mermaid
graph LR
    subgraph Client
        direction TB
        file[File] --> clientKernel[Client Kernel]
        clientKernel --> clientSocket[TCP Socket]
    end

    clientSocket -- TCP Connection --> serverSocket

    subgraph Server
        direction TB
        serverSocket[TCP Socket] --> serverKernel[Server Kernel]
        serverKernel --> application[Application]
        application --> handler[Handler]
    end

    %% Adding a comment to represent Single HTTP Request
    file -.->|Single HTTP Request| clientKernel
```

Challenges with large files: memory pressure, network latency, and I/O blocking while parsing the file

- Both reading and writing the file consume a lot of memory
- Reassembling the TCP packets blocks and puts a heavy load on the CPU (a CPU-bound task)

Q1. How does Go's built-in http package handle form data?

- Use `r.ParseMultipartForm` to parse the data, and `FormFile` to read the file stored under a given key
- When the server parses the incoming request, it does not read the whole body; only the headers are read first
- Read the data out as early as possible, so that the buffer is not cleared(?) or filled up, which would keep later packets from coming in

Q2. What problems arise when the file is too large? (maxMemory default: 32 MB)

- By default, Go keeps the file in memory; when the file is too large, Go stores the received file on disk instead
- `ParseMultipartForm` takes a `maxMemory` argument

![image](https://hackmd.io/_uploads/rynXsp0X0.png)

Risks

- Too many concurrent requests
- Complex structure in the form data
- Reading everything with `io.ReadAll()` can drive memory usage too high
- Disk writes are slow; if `/tmp` is an in-memory filesystem, writing there also drives memory usage up (or the allocation turns out to be too small)

### Chunked Request

[`Transfer-Encoding: chunked`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding)

- The Content-Length header can be omitted; the large file is split into multiple chunks within a single request body
- Golang HTTP package: the body reader automatically switches to `ChunkedReader`

```mermaid
graph LR
    subgraph Client
        direction TB
        file[File] --> clientKernel[Client Kernel]
        clientKernel --> clientSocket[TCP Socket]
    end

    clientSocket -- TCP Connection --> serverSocket

    subgraph Server
        direction TB
        serverSocket[TCP Socket] --> serverKernel[Server Kernel]
        serverKernel --> application[Application]
        application --> handler[Handler]
    end

    %% Adding comments to represent multiple chunks of one HTTP request (Transfer-Encoding: chunked)
    file -.->|HTTP Request Chunk 1| clientKernel
    file -.->|HTTP Request Chunk 2| clientKernel
    file -.->|HTTP Request Chunk 3| clientKernel
```

### Performance Optimization

- Use `io.Copy` instead of `io.ReadAll` followed by a file write, to avoid reading everything into memory. The default buffer size of `io.Copy` is 32 KB
- Use [io.LimitReader](https://pkg.go.dev/io#LimitedReader) to bound memory usage
- Rewrite `ParseMultipartForm()` so that it can write directly to a specified file path, instead of writing to the temp filesystem first and then copying the file
- If the file is too large, the slice keeps having to grow -> memory can be allocated up front
- During `io.Copy`, growing a slice (append) causes the data to be moved around in memory (reallocation); declare the buffer's capacity at the start
- Security issue: https://nvd.nist.gov/vuln/detail/CVE-2023-45290

## QA

1. Resumable uploads
   The server handler needs extra handling, e.g. status 206
2. How do you define a "huge" file, and how do you benchmark?
   Depending on the situation, decide how much memory to allocate to the physical/virtual machine and how large the buffer size should be
3. How exactly does chunkedReader handle its buffer?
   The principle is the same. chunkedReader mainly parses the syntax specific to a chunked request (e.g. the chunk size) when the handler calls `request.body.read`. Before `request.body.read` is called, the packets sent by the client still sit in the socket buffer. How much buffer is used while reading depends on how you implement it.
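
The streaming approach from the Performance Optimization notes and the chunked transfer from the Chunked Request section can be sketched together as follows. This is a minimal sketch and not the speaker's code: `uploadHandler`, `savePart`, `streamUpload`, `demo`, the `X-Transfer-Encoding` response header, the form field name `file`, and the 4 GiB body cap are all names and values made up for illustration. The handler uses `r.MultipartReader()` rather than a rewritten `ParseMultipartForm()`, which is one possible way to write each file part straight to its destination.

```go
package main

import (
	"bytes"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
)

// uploadHandler streams each file part of a multipart request straight into
// dir, without calling ParseMultipartForm and without a temp-file copy.
func uploadHandler(dir string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, 4<<30) // illustrative 4 GiB cap
		mr, err := r.MultipartReader()
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		for {
			part, err := mr.NextPart()
			if err == io.EOF {
				break // no more parts
			}
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			if part.FileName() == "" {
				continue // skip non-file form fields
			}
			if err := savePart(dir, part); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
		}
		// Echo how the body arrived (e.g. "chunked") in a made-up header.
		w.Header().Set("X-Transfer-Encoding", strings.Join(r.TransferEncoding, ","))
		w.WriteHeader(http.StatusCreated)
	}
}

// savePart copies one part to disk with io.Copy, so the whole file is never
// held in memory at once.
func savePart(dir string, part *multipart.Part) error {
	dst, err := os.Create(filepath.Join(dir, filepath.Base(part.FileName())))
	if err != nil {
		return err
	}
	if _, err := io.Copy(dst, part); err != nil {
		dst.Close()
		return err
	}
	return dst.Close()
}

// streamUpload posts src as a multipart file through an io.Pipe. The body
// length is unknown up front, so net/http sends it chunked over HTTP/1.1.
func streamUpload(url, name string, src io.Reader) (*http.Response, error) {
	pr, pw := io.Pipe()
	mw := multipart.NewWriter(pw)
	go func() {
		part, err := mw.CreateFormFile("file", name)
		if err == nil {
			_, err = io.Copy(part, src)
		}
		if err == nil {
			err = mw.Close() // writes the closing boundary
		}
		pw.CloseWithError(err) // a nil err ends the body with io.EOF
	}()
	return http.Post(url, mw.FormDataContentType(), pr)
}

// demo uploads size zero bytes to a throwaway test server and returns the
// status code, the transfer encoding the server saw, and the saved file size.
func demo(size int) (int, string, int64, error) {
	dir, err := os.MkdirTemp("", "upload-demo")
	if err != nil {
		return 0, "", 0, err
	}
	defer os.RemoveAll(dir)
	srv := httptest.NewServer(uploadHandler(dir))
	defer srv.Close()
	resp, err := streamUpload(srv.URL, "big.bin", bytes.NewReader(make([]byte, size)))
	if err != nil {
		return 0, "", 0, err
	}
	defer resp.Body.Close()
	info, err := os.Stat(filepath.Join(dir, "big.bin"))
	if err != nil {
		return resp.StatusCode, "", 0, err
	}
	return resp.StatusCode, resp.Header.Get("X-Transfer-Encoding"), info.Size(), nil
}
```

Calling `demo(1 << 20)` from a `main` function should report status 201, the encoding `chunked`, and a saved size equal to the number of bytes sent. On both sides the data moves through `io.Copy` in small pieces, so memory use should stay roughly flat as the file grows. The right body cap and buffer sizes still depend on the benchmarking mentioned in the QA.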
