The Art of Readable Code

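# The Art of Readable Code

2023/12/29 @Synology MIT new-hire book club

<style>
p, li {
    font-size: 0.8em;
}
</style>

---

## Introduction

![image](https://hackmd.io/_uploads/SyJw2KTLa.png)

----

Why talk about this book

- It is thin (under 200 pages), so you can finish it while waiting for code builds
- It is simpler, easier to follow, and more practical than Clean Code
- The advice is simple, yet for some reason most engineers do not put it into practice (or do not know it)

----

How I picked chapters and key points

- Things I find interesting
- Things I think most people tend to overlook
- Skipping the very basics, such as code blocks, line breaks, indentation, and consistency

---

## The fundamental theorem of readability (Ch 1)

When writing code, minimize the time a reader needs to understand it.

The reader may be a coworker, or yourself a few months from now.

---

## Naming (Ch 2)

----

### Pick names that carry information

----

```python=
def get_page(url):
    ...
```

Get it from where? The network? A cache?

----

```cpp=
class BinaryTree {
    int Size();
    ...
};
```

What is Size? The number of nodes? The depth? The memory footprint?

----

### Names to avoid

- `tmp`
- `ret`, `retval`: carry no information beyond "I am a return value"

Pick a name that describes the actual value or its purpose; do not skip the effort of thinking of a name.

----

When `tmp` is appropriate

```cpp=
if (rhs < lhs) {
    tmp = rhs;
    rhs = lhs;
    lhs = tmp;
}
```

----

When `tmp` is not appropriate (too lazy to think of a name)

```cpp=
string tmp = user.name();
tmp += " " + user.phone_number();
tmp += " " + user.email();
```

----

### Attach extra information to a name

- `start` vs `start_ms`
- `password` vs `plaintext_password`

----

#### Hungarian notation

- pointer: `last` vs `pLast`
- string: `username` vs `strUsername` (does the prefix add any information?)

https://zh.wikipedia.org/zh-tw/%E5%8C%88%E7%89%99%E5%88%A9%E5%91%BD%E5%90%8D%E6%B3%95#%E5%8C%88%E7%89%99%E5%88%A9%E7%B3%BB%E7%BB%9F%E5%91%BD%E5%90%8D%E6%B3%95%E7%9A%84%E7%BC%BA%E7%82%B9

---

## Ch 3~6

These chapters cover how to write comments, how to lay out code, and so on.

- Layout should be comfortable to read
- Do not add comments that do not make the code easier to read
- Prefer a good name over a comment wherever possible

----

Comments that do not make the code easier to read

```cpp=
enum ChannelType {
    // public channel
    kChannelPublic = 0,
    // private channel
    kChannelPrivate = 1,
    ...
}
```

----

Useful comments

```cpp=
// the following numbers are used in database
enum ChannelType {
    kChannelPublic = 0,
    kChannelPrivate = 1,
    kChannelAnonymous = 2, // group chat or direct message
    kChannelSynobot = 3,
    // Hidden channel is used by office integration
    // It does not mean that user hides the channel
    // User hide channel should be seen last_hide_at > last_view_at
    kChannelHidden = 4,
    kChannelChatbot = 5
};
```

---

## Making control flow easier to read (Ch 7)

----

### Order of operands in a condition

```cpp=
if (length >= 10) {
    ...
}
```

vs

```cpp=
if (10 <= length) {
    ...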
}
```

----

```cpp=
while (bytes_received < bytes_expected) {
    ...
}
```

vs

```cpp=
while (bytes_expected > bytes_received) {
    ...
}
```

----

Why is the first form easier to read (for most people)?

----

- Left-hand side: the value being examined (the one that changes)
- Right-hand side: the baseline it is compared against (a constant or a more stable value)

This matches how we speak in everyday language.

----

An example

- The number of bugs fixed this week is at least 10
- 10 is less than or equal to the number of bugs fixed this week

----

#### [Yoda conditions](https://zh.wikipedia.org/zh-tw/%E5%B0%A4%E9%81%94%E6%A2%9D%E4%BB%B6%E5%BC%8F)

![image](https://hackmd.io/_uploads/B1l8eHqp8T.png)

----

### Minimize nesting

How would you reduce the nesting in the code below?

```cpp=
if (user_result == SUCCESS) {
    if (permission_result != SUCCESS) {
        reply.WriteErrors("error reading permissions");
        reply.Done();
        return;
    }
    reply.WriteErrors("");
} else {
    reply.WriteErrors(user_result);
}
reply.Done();
```

----

- Use early returns
- Handle the failure cases first

```cpp=
if (user_result != SUCCESS) {
    reply.WriteErrors(user_result);
    reply.Done();
    return;
}

if (permission_result != SUCCESS) {
    reply.WriteErrors("error reading permissions");
    reply.Done();
    return;
}

reply.WriteErrors("");
reply.Done();
```

---

## Breaking down giant expressions (Ch 8)

Break giant expressions into more digestible pieces.

----

### Explaining variables

```python=
if line.split(":")[0].strip() == "root":
    ...
```

vs

```python=
username = line.split(":")[0].strip()
if username == "root":
    ...
```

The extra variable makes the code longer, but easier to understand.

----

### Summary variables

```cpp=
if (request.user.id == document.owner_id) {
    ...
}
...
if (request.user.id != document.owner_id) {
    ...
}
```

`request.user.id == document.owner_id` is not a big expression, but it involves five variables, so it takes a moment to understand.

----

After the rewrite

```cpp=
const bool user_owns_document = request.user.id == document.owner_id;

if (user_owns_document) {
    ...
}
...
if (!user_owns_document) {
    ...
}
```

1. `if (user_owns_document)` is easier to understand
2. `user_owns_document` is a concept in its own right, and it is referenced repeatedly within the block

----

### De Morgan's law

1. `!(a && b) <=> !a || !b`
2. 
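   `!(a || b) <=> !a && !b`

----

Use De Morgan's law to simplify the following expression:

```cpp=
!(file_exists && !is_protected)
```

----

After the rewrite

```cpp=
!file_exists || is_protected
```

---

## Variables and readability

Three problems caused by sloppy use of variables

- The more variables there are, the harder it is to keep track of them all at once
- The bigger a variable's scope, the longer you have to keep it in mind
- The more often a variable changes, the harder it is to remember its current value

----

### Eliminating variables

----

#### Unnecessary temporary variables

```python=
now = datetime.datetime.now()
root_message.last_view_time = now
```

vs

```python=
root_message.last_view_time = datetime.datetime.now()
```

----

Why is `now` unnecessary?

1. It is not the result of breaking down a complex expression
2. `datetime.datetime.now()` is already clear enough
3. It is used only once, so it does not remove any duplicated code

----

#### Eliminating intermediate results

How would you simplify the following code?

```javascript=
var remove_one = function (array, value_to_remove) {
    var index_to_remove = null;
    for (var i = 0; i < array.length; i++) {
        if (array[i] === value_to_remove) {
            index_to_remove = i;
            break;
        }
    }
    if (index_to_remove !== null) {
        array.splice(index_to_remove, 1);
    }
}
```

----

`index_to_remove` can simply be dropped. The simplified version:

```javascript=
var remove_one = function (array, value_to_remove) {
    for (var i = 0; i < array.length; i++) {
        if (array[i] === value_to_remove) {
            // Remove the match as soon as it is found; no intermediate index is needed.
            array.splice(i, 1);
            return;
        }
    }
}
```

----

### Do you want your coworkers to feel like they are always in an interview?

> Eric Brechner of Microsoft has explained why a good interview question should involve at least three variables, perhaps because juggling three variables at once forces you to think hard! That makes sense for an interview, where the goal is to find out what the candidate is capable of. But do you want your coworkers to feel like they are in an interview when they read your code?

----

### Shrink the scope of your variables

Make each variable visible to as few lines of code as possible

- Do not use global variables
- Move declarations down (put each variable's definition right before its first use)

----

### Prefer write-once variables

- A variable whose value keeps changing makes the code harder to understand
- The more places that modify a variable, the harder it is to remember its current value
- Immutability
- Use `const` wherever you can

---

## Other key ideas (Ch 10~13)

- Ch 10 Extracting unrelated subproblems
- Ch 11 One task at a time
- Ch 12 Turning thoughts into code
- Ch 13 Writing less code

---

### In practice

Does this feel like an interview to you?

```cpp=
bool ChannelControl::RemoveGlobalHideId(set<ChannelID>& channelIdSet) {
    bool blRv=false;
    set<ChannelID> channelIdSetNew;
    vector<ChannelID> hideChannelId;
    iflogreturn(!model_.GetGlobalHide(hideChannelId));
    if(hideChannelId.size() == 0) {
        blRv=true;
        return blRv;
    }
    for (auto it=channelIdSet.begin(); it != channelIdSet.end(); it++) {
        std::vector<ChannelID>::iterator it2;
        bool blFind=false;
        for (it2=hideChannelId.begin(); it2 != hideChannelId.end(); it2++) {
            if(*it == *it2) {
                blFind=true;
                break;
            }
        }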
        if(blFind == false) {
            channelIdSetNew.insert(*it);
        }
    }
    channelIdSet.swap(channelIdSetNew);
    blRv=true;
    return blRv;
}
```

----

That is too long, so here is an excerpt. Let's ~~admire this masterpiece together~~ review this code

1. Take one minute to work out what this snippet is for
2. At the same time, think about what is wrong with it

```cpp=
// set<ChannelID> channelIdSet, channelIdSetNew;
// vector<ChannelID> hideChannelId;
for (auto it=channelIdSet.begin(); it != channelIdSet.end(); it++) {
    std::vector<ChannelID>::iterator it2;
    bool blFind=false;
    for (it2=hideChannelId.begin(); it2 != hideChannelId.end(); it2++) {
        if(*it == *it2) {
            blFind=true;
            break;
        }
    }
    if(blFind == false) {
        channelIdSetNew.insert(*it);
    }
}
channelIdSet.swap(channelIdSetNew);
```

----

What this snippet does (reference answer)

- For each `ChannelID` $x$ in `channelIdSet`, check whether $x$ is in `hideChannelId`.
- If it is not, put $x$ into `channelIdSetNew`.
- Finally, `channelIdSet.swap(channelIdSetNew)`

----

Reducing it a bit

- For each `ChannelID` $x$ in `channelIdSet`, check whether $x$ is in `hideChannelId`.
- ~~If it is not, put $x$ into `channelIdSetNew`.~~
- ~~Finally, `channelIdSet.swap(channelIdSetNew)`~~
- If it is, remove $x$ from `channelIdSet`.

----

Problems with this code:

1. `blFind` and `channelIdSetNew` can be removed outright (eliminating intermediate variables)
2. `blFind` is an unclear name (find what? shouldn't it be "found"?)
3. `it` and `it2` are lazy names, and the scope of `it2` is larger than it needs to be
4. We have had C++11 for a long time; a range-based `for` loop would do

----

```cpp=
for (auto id : channelIdSet) {
    for (auto id_to_remove : hideChannelId) {
        if (id == id_to_remove) {
            channelIdSet.erase(id);
        }
    }
}
```

*Note: erasing like this may lead to unexpected behavior, because removing the current element invalidates the iterator that the range-based loop is using. I did not notice this at first; thanks to a senior colleague for pointing it out.*

Not so fast, there is still one big problem

----

The big problem with this code: the author is probably not familiar with the C++ containers

Look at the reduced logic once more:

- For each `ChannelID` $x$ in `channelIdSet`, check whether $x$ is in `hideChannelId`.
- If it is, remove $x$ from `channelIdSet`.

----

Isn't that just `setdiff(channelIdSet, hideChannelId)`?

`std::set` already provides `erase`

----

```cpp=
for (auto id_to_remove : hideChannelId) {
    channelIdSet.erase(id_to_remove);
}
```

----

We went from

- 7 variables => 3 variables
- 14 lines => 3 lines
- `for` * 2, `if` * 2 => `for` * 1
- Time complexity $O(NM)$ => $O(M \log N)$

---

I hope everyone will take good care of the quality of our code base :)

---

## Q & A

![image](https://hackmd.io/_uploads/Bk5Pc56Lp.png)
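
---

## Appendix: erasing safely

As a follow-up to the `channelIdSet` / `hideChannelId` example, here is a minimal sketch of two ways to remove the hidden IDs without erasing inside a range-based `for` over the same set. It rests on assumptions: `ChannelID` is taken to be an `int` alias because the real type is not shown, and the function names `RemoveHiddenIds` and `RemoveHiddenIdsInPlace` are hypothetical, not taken from the original code base.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

using ChannelID = int;  // assumption: stand-in for the real ChannelID type

// Removes every ID listed in hideChannelId from channelIdSet.
// set::erase(key) does nothing when the key is absent, so no lookup is needed first.
void RemoveHiddenIds(std::set<ChannelID>& channelIdSet,
                     const std::vector<ChannelID>& hideChannelId) {
    for (const ChannelID& id_to_remove : hideChannelId) {
        channelIdSet.erase(id_to_remove);
    }
}

// Alternative that iterates over the set itself and erases in place.
// Since C++11, set::erase(iterator) returns the iterator following the erased
// element, so the loop never advances an invalidated iterator.
void RemoveHiddenIdsInPlace(std::set<ChannelID>& channelIdSet,
                            const std::vector<ChannelID>& hideChannelId) {
    for (auto it = channelIdSet.begin(); it != channelIdSet.end();) {
        const bool is_hidden =
            std::find(hideChannelId.begin(), hideChannelId.end(), *it) !=
            hideChannelId.end();
        if (is_hidden) {
            it = channelIdSet.erase(it);
        } else {
            ++it;
        }
    }
}
```

The first function matches the three-line loop from the slides and runs in $O(M \log N)$. The second keeps the $O(NM)$ shape of the nested loops and is included only to illustrate the iterator-safe erase idiom.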