---
type: slide
---

## console.log("🤦🏼‍♂️".length)

- 2024.03.05 / bob chang
- [The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!), Chinese translation](https://blog.xinshijiededa.men/unicode/#user-content-fnref-touche)

---

## agenda

1. Introducing Unicode
2. Introducing UTF-8
3. What's "🤦🏼‍♂️".length?
4. "Å" !== "Å" !== "Å"?

![Screenshot 2024-03-05 at 05.29.09](https://hackmd.io/_uploads/B1XNpnXpa.png)

---

### Unicode == character ⟷ code point

* The Latin letter `A` is assigned the number `65`.
* The Arabic Letter Seen `س` is `1587`.
* The Katakana Letter Tu `ツ` is `12484`.
* The Musical Symbol G Clef `𝄞` is `119070`.
* `💩` is `128169` (`0x1F4A9`).

---

- The largest code point is `0x10FFFF`, so there are about 1.1 million (1,114,112) code points in total.
- [Unicode character table](https://symbl.cc/en/unicode/table/)

![Screenshot 2024-03-05 at 03.16.01](https://hackmd.io/_uploads/S1AeAcXaa.png)

---

### What's Private Use?

- These code points are reserved for application developers; Unicode itself will never assign characters to them.
- For example, Unicode has no code point for the Apple logo, so Apple puts it at U+F8FF in the Private Use Area.
- On any other system it renders as a missing glyph `􀣺`, but with the fonts that ship with macOS you see .
- `option` + `shift` + `k` types  (US keyboard layout)
- `control` + `command` + `space` → opens the Character Viewer (emoji & symbols window)

---

### UTF-8 is a variable-length encoding

- Each code point is encoded as a sequence of one to four bytes.

| Char | Code point | UTF-8 | UTF-16 | UTF-32 |
|------|------------|-------|--------|--------|
| `A` | `U+0041` | `41` | `0041` | `00000041` |
| `樂` | `U+6A02` | `E6 A8 82` | `6A02` | `00006A02` |
| `💩` | `U+1F4A9` | `F0 9F 92 A9` | `D83D DCA9` | `0001F4A9` |

---

### What's "🤦🏼‍♂️".length?

![Screenshot 2024-03-05 at 05.34.51](https://hackmd.io/_uploads/rk_K03XTa.png)

- 🤦🏼‍♂️ → U+1F926 U+1F3FC U+200D U+2642 U+FE0F
- [It's Not Wrong that "🤦🏼‍♂️".length == 7](https://hsivonen.fi/string-length/)

---

# "Å" !== "Å" !== "Å"?

![Screenshot 2024-03-05 at 02.34.40](https://hackmd.io/_uploads/rJiHVc766.png)

---

![image](https://hackmd.io/_uploads/r11FEcQTT.png)
![Screenshot 2024-03-05 at 02.40.59](https://hackmd.io/_uploads/HJSpS5Xa6.png)

---

### Before comparing strings, normalize!
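A minimal sketch in JavaScript, the language of the title's `console.log`. It assumes the three "Å" strings compared earlier are the usual three spellings (U+00C5, `A` + U+030A, and U+212B ANGSTROM SIGN), written as escapes so they survive copy-paste. `sameText` is a hypothetical helper name, not a built-in.

```javascript
// Three spellings of "Å" (assumed to match the strings compared on the earlier slides)
const precomposed = "\u00C5";      // LATIN CAPITAL LETTER A WITH RING ABOVE
const decomposed = "\u0041\u030A"; // "A" + COMBINING RING ABOVE
const angstrom = "\u212B";         // ANGSTROM SIGN

// Hypothetical helper: compare after normalizing both sides to NFC
const sameText = (a, b) => a.normalize("NFC") === b.normalize("NFC");

// === compares UTF-16 code units, so the raw strings all differ
console.log(precomposed === decomposed, precomposed === angstrom); // → false false
console.log(precomposed.length, decomposed.length, angstrom.length); // → 1 2 1

// NFC composes A + U+030A into U+00C5 and maps U+212B to U+00C5
console.log(sameText(precomposed, decomposed), sameText(precomposed, angstrom)); // → true true

// NFD goes the other way: every spelling becomes "A" + U+030A
console.log(angstrom.normalize("NFD") === decomposed); // → true
```

NFC is a common choice for comparison and storage; NFKC/NFKD additionally fold compatibility characters (for example `ﬁ` → `fi`), which is a separate decision.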
<!-- https://stackoverflow.com/a/63013732 -->
![Screenshot 2024-03-05 at 02.49.34](https://hackmd.io/_uploads/SytTv57a6.png)

---

### Why does `toLowerCase()` accept a Locale as an argument?

```java
// Runs as-is in jshell; Locale.of needs Java 19+
import java.util.Locale;

var en_US = Locale.of("en", "US");
var tr = Locale.of("tr");

"I".toLowerCase(en_US); // => "i"
"I".toLowerCase(tr);    // => "ı" (dotless i)
"i".toUpperCase(en_US); // => "I"
"i".toUpperCase(tr);    // => "İ" (capital I with dot)
```

---

### The rules change every year

- The Unicode Consortium releases a new version of the standard every year
- That's why we get new emoji every year

![image](https://hackmd.io/_uploads/Hy5tJTmp6.png)

---

# 🧋🧋🧋

- Counting characters: watch out ⚠️ (bytes? UTF-16 code units? code points? graphemes?)
- Comparing strings: watch out ⚠️ (normalize first)

---

- [Welcome to 字嗨 (Zi-Hi)!](https://zi-hi.com)
- [Taiwan Emoji Project](https://www.shinylee.com/project/taiwan-emoji-project)
- [emojikitchen](https://emojikitchen.dev)

![image](https://hackmd.io/_uploads/SJdePaXap.png)
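---

### Appendix: "🤦🏼‍♂️" measured four ways

A sketch of where the different "lengths" of the title emoji come from. It assumes the skin-tone modifier is U+1F3FC (🏼, medium-light), the one the title glyph shows, and a runtime that provides `Intl.Segmenter` (e.g. Node.js 16+). The emoji is spelled with escapes so the code points are explicit.

```javascript
// U+1F926 FACE PALM, U+1F3FC skin tone modifier (type-3), U+200D ZERO WIDTH JOINER,
// U+2642 MALE SIGN, U+FE0F VARIATION SELECTOR-16
const facepalm = "\u{1F926}\u{1F3FC}\u200D\u2642\uFE0F";

// 1. UTF-16 code units: what String.prototype.length counts
const utf16Units = facepalm.length;

// 2. Code points: the string iterator yields code points, not code units
const codePoints = [...facepalm].length;

// 3. UTF-8 bytes: TextEncoder always encodes to UTF-8
const utf8Bytes = new TextEncoder().encode(facepalm).length;

// 4. Extended grapheme clusters: what a reader perceives as one character
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...segmenter.segment(facepalm)].length;

console.log(utf16Units, codePoints, utf8Bytes, graphemes); // → 7 5 17 1
```

Which number is "right" depends on the job: bytes for storage limits, UTF-16 units for JavaScript string indexing, graphemes for what a user would call one character.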
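---

### Appendix: locale-aware casing in JavaScript

JavaScript's `toLowerCase()` takes no Locale argument; a rough counterpart to the Java snippet on the `toLowerCase()` slide is `toLocaleLowerCase()` / `toLocaleUpperCase()`, which accept a BCP 47 language tag. A minimal sketch, assuming a runtime with full ICU data (the default in Node.js 13+ and in mainstream browsers):

```javascript
// The same four conversions as the Java slide
const enLower = "I".toLocaleLowerCase("en-US"); // "i"
const trLower = "I".toLocaleLowerCase("tr");    // "ı" (U+0131 LATIN SMALL LETTER DOTLESS I)
const enUpper = "i".toLocaleUpperCase("en-US"); // "I"
const trUpper = "i".toLocaleUpperCase("tr");    // "İ" (U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE)

// Plain toLowerCase() ignores locale and uses the default Unicode mapping,
// under which "İ" becomes "i" + U+0307 COMBINING DOT ABOVE: two code units
const roundTrip = trUpper.toLowerCase();

console.log(enLower, trLower, enUpper, trUpper, roundTrip.length); // → i ı I İ 2
```

Because plain `toLowerCase()` is locale-independent in JavaScript, it is the safer choice for identifiers and protocol keywords; the locale-aware methods belong to text shown to people.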