Discussion notes, 2024-04-23 and 26

# Discussion notes, 2024-04-23 and 26

## devarajabc

### 1. What can a database do with a [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter)?

[PostgreSQL: bloom filter index access method](https://www.postgresql.org/docs/current/bloom.html):
> A signature is a lossy representation of the indexed attribute(s), and as such is prone to reporting false positives; that is, it may be reported that an element is in the set, when it is not. So index search results must always be rechecked using the actual attribute values from the heap entry. Larger signatures reduce the odds of a false positive and thus reduce the number of useless heap visits, but of course also make the index larger and hence slower to scan.
>
> This type of index is most useful when a table has many attributes and queries test arbitrary combinations of them. A traditional btree index is faster than a bloom index, but it can require many btree indexes to support all possible queries where one needs only a single bloom index. Note however that bloom indexes only support equality queries, whereas btree indexes can also perform inequality and range searches.

A Bloom filter "predicts" whether a given string is present in a data structure without visiting every element. A lookup needs only a fixed number of hash computations, so its time complexity is $O(1)$ rather than the $O(n)$ of a traditional element-by-element search.

### 2. How can a data structure (such as a red-black tree or a linked list) be stored in memory?

http://opic.rocks/

### 3. What happens on a collision? How can collisions be reduced?

> In addition, data can only be added, never deleted. Suppose two strings x1 and x2 are mapped by some hash function hi to the same result, hi(x1) = hi(x2). If we delete x1 by changing the bit set in the table from 1 back to 0, wouldn't x2 be affected as well?
>
> See [quiz6](https://hackmd.io/@sysprog/linux2024-quiz6) and [quiz 10 -1](https://hackmd.io/@sysprog/linux2024-quiz10)

### 4. Does the approach used in the quiz affect randomness?
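```c
int bloom_test(bloom_t *filter, const void *key, size_t keylen)
{
    /* One 64-bit hash yields both bit indices. */
    uint64_t hbase = hash(key, keylen);
    /* h1: upper 32 bits, reduced modulo the table size. */
    uint32_t h1 = (hbase >> 32) % BLOOMFILTER_SIZE;
    /* h2: the whole 64-bit value, reduced modulo the table size. */
    uint32_t h2 = hbase % BLOOMFILTER_SIZE;
    /* The key is reported as present only if both bits are set. */
    return get(filter, h1) && get(filter, h2);
}
```

https://research.redhat.com/blog/theses/analysis-of-randomness-levels-in-the-kernel-entropy-pool-after-boot/

## weihsinyeh

[Mathematical analysis of the bloom filter](https://hackmd.io/@weihsinyeh/linux2024-homework6#bloom-filter)

Uses of random numbers in TCP:
* [Why do we have to each randomly generate a number and add one in the protocol handshake of TCP?](https://www.quora.com/Why-do-we-have-to-each-randomly-generate-a-number-and-add-one-in-the-protocol-handshake-of-TCP)
* [What happens in a TLS handshake?](https://www.cloudflare.com/zh-tw/learning/ssl/what-happens-in-a-tls-handshake/)

![image](https://hackmd.io/_uploads/Sk2Ww45bA.png =85%x)

TLS does not make TCP itself more secure, because TLS does not protect the TCP header. What actually strengthens TCP is the [TCP security options](https://www.iana.org/assignments/tcp-parameters/tcp-parameters.xhtml): option 19 (MD5 Signature Option, obsoleted) and option 29 (TCP Authentication Option). Traditional TCP is comparatively fragile: it cannot reliably tell whether a received TCP segment was sent by the real source host or was forged. TCP therefore uses random sequence numbers to make forgery harder.

TCP is a stateful protocol. When a TCP connection exists between A and B and C wants to disrupt it, C may try to spoof A's IP address and send a reset to B. Two cases can be distinguished:

1. If C is on the communication path between A and B, it can easily capture the IP packets of both sides and forge new ones. This resembles how some large firewalls use Reset to cut off user connections.
2. If C is not on the path between A and B, it can forge a segment that conforms to the TCP specification from anywhere in the world. The key is that the sequence number and the acknowledgment number in the segment must fall within the receiver's sliding window; only then does the receiver treat the segment as legitimate, which may reset the TCP connection between A and B. To improve security, the TCP handshake uses random sequence numbers (in practice the value also grows linearly with time and wraps to zero at $2^{32}$). This makes them hard for an attacker to predict, and a forged sequence number outside the valid range is discarded by the receiver.

https://github.com/microsoft/mimalloc

## You!

How many threads does httpd (quiz 3) create?

big.LITTLE

Scheduling domain hierarchy

Why must HTTP specify a content-type?

HTTP 1.x is stateless

How can the number of branches in get_type() be reduced?
-> [perfect hash function](https://www.gnu.org/software/gperf/manual/gperf.html)
e.g., https://github.com/naver/webkit/blob/master/Source/WebCore/platform/ColorData.gperf

SIGINT / SIGTERM (terminate process): Ctrl-C sends SIGINT; `kill -15` sends SIGTERM

## HenryChaing

FAQ #1: How do you trace code? How do you debug?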
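A: You need to know the framework.

How should a Linux kernel module be debugged? Lately every debugging round has required a reboot.

**Greg Kroah-Hartman**
linux debugging
> Memory access, kfifo (state)
> Start with hello world and workqueue
> Do not cobble programs together from pieces
> kernel oops!
> [Virtualized environment for testing the Linux kernel](https://hackmd.io/@sysprog/linux-virtme)
> [Building a User-Mode Linux experiment environment](https://hackmd.io/@sysprog/user-mode-linux-env)

I have read [lkmpg](https://sysprog21.github.io/lkmpg/).

:::info
Thanks for the answer!! What I can think of so far:
- `depmod`: before loading a module for the first time, run `depmod` so that the new module can be loaded
- `sudo insmod xxx.ko`: load a kernel module
- `sudo rmmod xxx.ko`: unload a kernel module
- `dmesg`: check the module's load status
- `lsmod`: list the loaded modules
:::

## lumynou5

A file server has to send the `Content-Type` header that matches the file type, but how does it determine the file type?

REST
/get/1 JSON
/set/1 JSON

Deciding by file extension, as the `get_type()` function does, has the following problems:
- A file extension is not mandatory.
- Extensions collide. For example, TypeScript source code and the MPEG2-TS video container format both use `.ts`.

Some formats specify what the first few bytes of a file must be, but not all files work that way, especially plain-text files such as CSS.

https://github.com/jserv/facebooc

AJAX: https://developer.mozilla.org/zh-TW/docs/Web/API/XMLHttpRequest

> ```c
> char *mimeType = "text/plain";
>
> len = bsGetLen(req->uri);
>
> if (!strncmp(req->uri + len - 4, "html", 4))
>     mimeType = "text/html";
> else if (!strncmp(req->uri + len - 4, "json", 4))
>     mimeType = "application/json";
> else if (!strncmp(req->uri + len - 4, "jpeg", 4))
>     mimeType = "image/jpeg";
> else if (!strncmp(req->uri + len - 3, "jpg", 3))
>     mimeType = "image/jpeg";
> else if (!strncmp(req->uri + len - 3, "gif", 3))
>     mimeType = "image/gif";
> else if (!strncmp(req->uri + len - 3, "png", 3))
>     mimeType = "image/png";
> else if (!strncmp(req->uri + len - 3, "css", 3))
>     mimeType = "text/css";
> else if (!strncmp(req->uri + len - 2, "js", 2))
>     mimeType = "application/javascript";
> ```
> (`src/server.c:143`)
> Does facebooc's implementation also rely on the file extension?

## vax-r

* Why is the Linux kernel gradually removing tasklets? CMWQ is more flexible, but it does not guarantee that a task completes on the same CPU; achieving that takes extra mechanisms such as memory barriers, whereas a tasklet does it simply. Why not keep tasklets?
> LWN has an article discussing this: https://lwn.net/Articles/960041/

https://www.slideshare.net/jserv/realtime-linux (threaded IRQ)

The Linux kernel no longer supports nested interrupts, although it used to.

Question: since Linux does not support nested interrupts, how does it handle two hardware interrupts that arrive at the same time?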
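Supplement to lumynou5's notes on extension-based detection: the chain of `strncmp` calls in the facebooc excerpt can also be written as a table lookup. The following is only a minimal sketch under stated assumptions. `mime_from_path()` and `mime_table` are hypothetical names that appear in neither facebooc nor the quiz code, the table simply repeats the extensions from the excerpt, and `text/plain` is kept as the fallback. Unlike the excerpt, the sketch requires a `.` before the extension and ignores letter case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Extension-to-MIME pairs; the entries repeat those in the facebooc excerpt. */
struct mime_entry {
    const char *ext;
    const char *type;
};

static const struct mime_entry mime_table[] = {
    {"html", "text/html"},
    {"json", "application/json"},
    {"jpeg", "image/jpeg"},
    {"jpg", "image/jpeg"},
    {"gif", "image/gif"},
    {"png", "image/png"},
    {"css", "text/css"},
    {"js", "application/javascript"},
};

/* Hypothetical helper: choose a MIME type from the extension of a path. */
const char *mime_from_path(const char *path)
{
    assert(path != NULL);

    /* Only look at the last path component, so "/v1.2/data" has no extension. */
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;

    /* No dot, a leading dot only, or a trailing dot: treat as no extension. */
    const char *dot = strrchr(base, '.');
    if (!dot || dot == base || dot[1] == '\0')
        return "text/plain";

    /* Linear scan of the table with a case-insensitive comparison. */
    for (size_t i = 0; i < sizeof(mime_table) / sizeof(mime_table[0]); i++) {
        if (!strcasecmp(dot + 1, mime_table[i].ext))
            return mime_table[i].type;
    }
    return "text/plain"; /* unknown extension: same default as the excerpt */
}
```

A table like this still inherits the problems listed in lumynou5's section (missing extensions, ambiguous ones such as `.ts`). If the number of comparisons matters, the linear scan could be replaced by a generated perfect hash such as gperf output, as suggested in the "You!" section.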
