Regular Expressions

---
title: Regular Expressions
tags: sirla, class, JavaScript
---

{%hackmd BkVfcTxlQ %}

# **_Regular Expressions_**
> Instructor: 黃丰嘉
> Class date: 2019-11-17 (Sun)
>

---
# **References**
> 1. [(MDN) 正規表達式](https://developer.mozilla.org/zh-TW/docs/Web/JavaScript/Guide/Regular_Expressions)
> 2. [(JS) 正則表達式(Regular Expression, regex)](https://pjchender.github.io/2017/09/26/js-%E6%AD%A3%E5%89%87%E8%A1%A8%E9%81%94%E5%BC%8F-regular-expression-regex/)
> 3. [(w3schools) JavaScript RegExp Reference](https://www.w3schools.com/jsref/jsref_obj_regexp.asp)
> 4. [JavaScript (7) – 字串處理與正規表達式 (作者：陳鍾誠)](http://programmermagazine.github.io/201307/htm/article2.html)
> 5. [greedy & nongreedy](https://amobiz.github.io/2014/08/03/regular-expression-javascript-study-notes-1-theory-1/)
> 6. [Javascript Regular Expressions , 表示法](https://www.puritys.me/docs-blog/article-30-Javascript-Regular-Expressions-,-%E8%A1%A8%E7%A4%BA%E6%B3%95.html)
> 7. [量符(Quantifier) greedy vs. nongreedy](http://notepad.yehyeh.net/Content/Program/RegularExpression/8.php)

---
# **Course outline**
[TOC]

---
## (P) **What are Regular Expressions?**
* Regular Expressions
    * A way to describe patterns in string data.

## (P) **Creating a regular expression**
* A regular expression is a value of type `object`.
* Ways to create one
    1. Use the `RegExp()` constructor
    ```javascript=
    let re1 = new RegExp("abc");
    ```
    2. Wrap the pattern in forward slashes `/`
    ```javascript=
    let re2 = /abc/;
    ```
* Both of those regular expression objects represent the same pattern: an `a` character followed by a `b` followed by a `c`.
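* Putting a backslash `\` in front of `?` or `+` makes it an ordinary character: it stands for the character itself and has no special meaning.
```javascript=
let eighteenPlus = /eighteen\+/;
// + is normally a special character, but here it is a literal plus sign
```

## (P) **Testing for matches**
* The `test()` method
    * Called on a pattern and given a string, it returns a Boolean telling whether the string contains a match of the pattern.
```javascript=
console.log(/abc/.test("abcde"));
// → true
console.log(/abc/.test("abxde"));
// → false
```
> A regular expression made up only of `non-special characters` simply represents that sequence of characters.

## (P) **Sets of characters**
* The `indexOf` method
    * Returns the index of the first occurrence of the given value, or `-1` if it is not found. That is enough for a fixed substring such as `abc`; regular expressions are needed for more complex patterns such as "any digit".
```javascript=
console.log(/[0123456789]/.test("in 1992"));
// → true
console.log(/[0-9]/.test("in 1992"));
// → true
// Both match all strings that contain a digit.
```
> Putting a set of characters between `[]` matches any one of the characters inside the brackets.
> In other words, the match succeeds as soon as one of the listed characters is found.
> A `hyphen (-)` inside `[]` indicates a range of characters, ordered by Unicode number.
> Ex: the characters `0` to `9` sit next to each other in Unicode order (codes 48 to 57), so `[0-9]` matches every digit character.

* Many common character groups have their own built-in shortcuts. For example, `\d` means the same as `[0-9]` (any digit).

|Special character|Meaning|
|---|---|
|`\d`|Any digit character|
|`\w`|Any alphanumeric character or underscore ("word character")|
|`\s`|Any whitespace character (space, tab, newline...)|
|`\D`|A character that is not a digit|
|`\W`|A nonalphanumeric character|
|`\S`|A nonwhitespace character|
|`.`|Any character except for newline|

```javascript=
let dateTime = /\d\d-\d\d-\d\d\d\d \d\d:\d\d/;
console.log(dateTime.test("01-30-2003 15:20"));
// → true
console.log(dateTime.test("30-jan-2003 15:20"));
// → false
```
* This `date and time expression` is improved later in these notes.
* `[\d.]` matches any digit or a period character. Inside `square brackets [ ]`, `.` and `+` lose their special meaning.
* `[^aA]` matches any character that is not a or A. A `^` placed right after the opening `square bracket [` inverts the set: it `matches any character not listed in the brackets`.
```javascript=
let notBinary = /[^01]/;
console.log(notBinary.test("1100100010100110"));
// → false
console.log(notBinary.test("1100100010200110"));
// → true (because of the 2 in 1100100010`2`00110)
```

## (P) **Repeating parts of a pattern**
> How do we match a sequence of one or more digits?
* A `+` after an element means the element may repeat `at least once`.
    * `/\d+/` matches one or more digit characters.
* A `*` after an element means the element may repeat `zero or more times`.
* A `?` after an element makes it optional: it may occur `zero times or once`.
```javascript=
console.log(/'\d+'/.test("'123'")); // one or more digits between the quotes
// → true
console.log(/'\d+'/.test("''"));
// → false
console.log(/'\d*'/.test("'123'")); // zero or more digits between the quotes
// → true
console.log(/'\d*'/.test("''"));
// → true

let neighbor = /neighbou?r/; // u may occur zero times or once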
console.log(neighbor.test("neighbour"));
// → true
console.log(neighbor.test("neighbor"));
// → true
```
* To state precisely how many times an element should occur, put `{ }` after it.
    * `{4}` requires the element to occur exactly 4 times.
    * `{2,4}` requires the element to occur at least 2 and at most 4 times.
    * `{5,}` requires the element to occur at least 5 times (open-ended range).

> Next, improve the `date and time expression`!
> Allow the `day`, `month`, and `hour` to have one or two digits.
```javascript=
let dateTime = /\d{1,2}-\d{1,2}-\d{4} \d{1,2}:\d{2}/;
console.log(dateTime.test("1-30-2003 8:45"));
// → true
```

## (P) **Grouping subexpressions**
* ==`Parentheses ()` are used when validating strings==
    * To apply an operator such as `+` or `*` to `more than one element at a time`, you must use `parentheses ()`.
    * `Without parentheses ()`, the operator applies only to `the single element right before it`.
```javascript=
let cartoonCrying = /boo+(hok+)+/i;
console.log(cartoonCrying.test("BoohoooohooHooo"));
// → false
console.log(cartoonCrying.test("Boohokoohokhook"));
// → true
// The first and second `+` apply to the second `o` in `boo` and to the `k` in `hok`.
// The third `+` applies to the whole group `(hok+)`, which must occur at least once.
// The `i` at the end makes the expression case insensitive, so it matches the `B` in the input.
```

## (P) **Matches and groups**
* The `test()` method: the simplest way to match; it only returns true or false.
* The `exec()` (execute) method: returns null when there is no match; otherwise it returns an object with information about the match.
    * The `index` property tells where in the string the successful match begins.
* The `match()` method: a string method that behaves similarly to `exec()`.
```javascript=
let match = /\d+/.exec("one two 100");
console.log(match);
// → ["100"]
console.log(match.index);
// → 8
console.log("one two 100".match(/\d+/));
// → ["100"]
```
* When the regular expression contains subexpressions grouped with `( )`, the text matched by each group also appears in the returned `array`, after the whole match. `( )` means the matched content is captured and stored separately.
```javascript=
let quotedText = /'([^']*)'/;
// '([^']*)' -> the text is wrapped in ''. The group holds zero or more characters that are not `'`.
// whole match: 'hello'; group: hello
console.log(quotedText.exec("she said 'hello'"));
// → ["'hello'", "hello"]

const regexp = /(\w+)\.jpg/;
console.log(regexp.exec('File name: cat.jpg'));
// ["cat.jpg", "cat",
//  index: 11,
//  input: "File name: cat.jpg", groups: undefined]
```
* `?` means the preceding element occurs zero times or once.
* Besides validating strings, `parentheses ()` can be used to ==extract parts of a string==.
```javascript=
console.log(/bad(ly)?/.exec("bad")); // (ly) occurs 0 times or once
// → ["bad", undefined]
/* First match bad,
 * then try the optional (ly)?
 */
console.log(/bad(ly)?/.exec("badl"));
// ["bad", undefined, index: 0, input: "badl", groups: undefined]
console.log(/bad(ly)?/.exec("badlyr"));
// ["badly", "ly", index: 0, input: "badlyr", groups: undefined]
```
* When a group `( )` is matched several times, only the last match ends up in the array, here `"3"`.
```javascript=
console.log(/(\d)+/.exec("123")); // (\d)+
// → ["123", "3"]
```

## (P) **The Date class**
* JavaScript has a standard class for representing dates (or points in time). It is called `Date`.
* `Date` objects provide the methods `getFullYear()`, `getMonth()`, `getDate()`, `getHours()`, `getMinutes()`, and `getSeconds()`.
* Get the current date and time
```javascript=
console.log(new Date());
// → Tue Nov 12 2019 10:28:37 GMT+0800 (Taipei Standard Time)
```
* Create an object for a specific time
```javascript=
console.log(new Date(2019, 11, 12));
// → Thu Dec 12 2019 00:00:00 GMT+0800 (Taipei Standard Time)
console.log(new Date(2009, 11, 12, 12, 59, 59, 999));
// → Sat Dec 12 2009 12:59:59 GMT+0800 (Taipei Standard Time)
```
> Note!
> JavaScript month numbers start at 0 (so December is 11); day numbers start at 1.
> The last four arguments (hours, minutes, seconds, and milliseconds) are optional and default to 0.

* JavaScript timestamps follow Unix time: they count the milliseconds since the start of 1970 in the UTC time zone.
    * The `getTime()` method returns the number of milliseconds from 1970 to the given point in time.
    * If you give the `Date` constructor a single number, it is treated as such a millisecond count.
```javascript=
console.log(new Date(2019, 10, 12).getTime());
// → 1573488000000
console.log(Date.now());
// → 1573527038896
console.log(new Date(1573527038896));
// → Tue Nov 12 2019 10:50:38 GMT+0800 (Taipei Standard Time)
```
* Processing a date
```javascript=
function processDate(string) {
  let [_, month, day, year] =
    /(\d{1,2})-(\d{1,2})-(\d{4})/.exec(string);
  return new Date(year, month - 1, day);
}
console.log(processDate("testing: 1-30-2003 in javascript."));
// → Thu Jan 30 2003 00:00:00 GMT+0800 (Taipei Standard Time)
```
> The `underscore _` binding is ignored; it only skips the first element (the whole match) of the array returned by exec.
>
> Remark:
> (The pattern is not strict enough; see the next section, Word and string boundaries.)
> ```
> console.log(processDate("100-1-30000")); // matches "00-1-3000"
> // → Sun Dec 01 2999 00:00:00 GMT+0800 (Taipei Standard Time)
> ```

## (P) **Word and string boundaries**
* Use `^` to match the start of the string and `$` to match the end.
    * `/^\d+$/` matches a string that consists entirely of one or more digits. `/^!/` matches any string that starts with `!`.
    * Wrong usage: `/x^/` can never match, because nothing can come before the start of the string.
* Use `\b` to match a word boundary. A word boundary is the start or end of the string, or any position that has a word character (as in `\w`) on one side and a nonword character on the other.
```javascript=
let matchedResult = 'This is an apple.'.match(/\bis\b/);
// Only the standalone word "is" is selected.
// The "is" inside Th`is` is not selected, because a word character comes right before it.
// [ 'is', index: 5, input: 'This is an apple.' ]
```

```javascript=
console.log(/cat/.test("concatenate"));
// → true
console.log(/\bcat\b/.test("concatenate"));
// → false
console.log(/\bcat/.test("catenate"));
// → true
```
> Note:
> A boundary marker does not match an actual character.
> It only enforces that the regular expression matches when `a certain condition holds at the place where the marker appears in the pattern`.
>

## (P) **Choice patterns**
* `|` denotes a choice between the pattern to its left and the pattern to its right (several alternatives can be matched at once).
```javascript=
let animalCount = /\b\d+ (pig|cow|chicken)s?\b/;
console.log(animalCount.test("15 pigs"));
// → true
console.log(animalCount.test("15 pigchickens"));
// → false
```

## (P) **The mechanics of matching**
![](https://i.imgur.com/sLFsC8j.png)
```javascript=
let animalCount = /\b\d+ (pig|cow|chicken)s?\b/;
console.log(animalCount.test("the 3 pigs"));
// → true
```
* Conceptually, when you use `exec()` or `test()`, the regular expression engine looks for a match in the string:
    * It first tries to match from the start of the string, then from the second character, and so on, until it finds a match or reaches the end of the string.
    * The expression matches if a path can be found from the left side of the diagram to the right side.

## (P) **Backtracking**
![](https://i.imgur.com/RCFq4Xa.png)
```javascript=
let reg = /\b([01]+b|[\da-f]+h|\d+)\b/;
console.log(reg.test("0 856 fr2 eah 1b"));
// → true
console.log(reg.exec("0m 856 fr2 eah 1b")); // 856, eah, and 1b all fit the pattern
// → ["856", "856", index: 3, input: "0m 856 fr2 eah 1b", groups: undefined]
```
> If there are several possible matches, only the first successful one is returned.
>

```javascript=
let reg = /^.*x/;
console.log(reg.test("."));
// → false
console.log(reg.test(""));
// → false
console.log(reg.test("x"));
// → true
console.log(reg.test(".x"));
// → true
console.log(reg.test(".xa"));
// → true
console.log(reg.test("abcxe"));
// → true
```

```javascript=
let reg = /([01]+)+b/;
console.log(reg.test("01 123"));
// → false
console.log(reg.test("0 reg"));
// → false
console.log(reg.test("01b 123"));
// → true
console.log(reg.test("0b reg"));
// → true
```

## (P) **The replace method**
* The `replace()` method: replaces part of a string with another string.
    * (First argument of replace()) With the `g` flag on the regular expression, `all` matches are replaced instead of only the first one.
```javascript=
console.log("papa".replace("p", "m"));
// → mapa
console.log("Borobudur".replace(/[ou]/, "j"));
// → Bjrobudur
console.log("Borobudur".replace(/[ou]/g, "j"));
// → Bjrjbjdjr
```
* (Second argument of replace()) `$` followed by a number refers to the content matched by the corresponding `parenthesized ( )` group.
```javascript=
console.log(
  "Liskov, Barbara\nMcCarthy, John\nWadler, Philip"
    .replace(/(\w+), (\w+)/g, "$2 $1"));
// → Barbara Liskov
//   John McCarthy
//   Philip Wadler

console.log(
  "Liskov, Barbara\nMcCarthy, John\nWadler, Philip"
    .replace(/(\w+), (\w+)/g, "$&"));
// → Liskov, Barbara
//   McCarthy, John
//   Wadler, Philip

let s = "the cia and fbi";
console.log(s.replace(/\b(fbi|cia)\b/g,
            str => str.toUpperCase()));
// → the CIA and FBI
```
> `$2` refers to the content of the second group `(\w+)`; `$1` refers to the content of the first group `(\w+)`.
> `$&` refers to the whole match, not to the groups.

* [String.prototype.replace()](https://developer.mozilla.org/zh-TW/docs/Web/JavaScript/Reference/Global_Objects/String/replace)
```javascript=
let stock = "1 lemon, 2 cabbages, and 101 eggs";
console.log(stock.replace(/(\d+) (\w+)/g, "$&")); // (amount) (unit)
// → 1 lemon, 2 cabbages, and 101 eggs

// match is the whole match ($&); for the first match amount = "1", unit = "lemon"
function minusOne(match, amount, unit) {
  amount = Number(amount) - 1;
  if (amount == 1) { // only one left, remove the 's'
    unit = unit.slice(0, unit.length - 1);
  } else if (amount == 0) {
    amount = "no";
  }
  return amount + " " + unit;
}
console.log(stock.replace(/(\d+) (\w+)/g, minusOne));
// → no lemon, 1 cabbage, and 100 eggs
```

## (P) **Greed**
1. Greedy
    * The repetition operators `*`, `+`, `?`, and `{}` are greedy: they match as many characters as they can.
```javascript=
var html = `
<table>
 <td>aaa</td>
</table>
<table>
 <td>bbb</td>
</table>
`;
var reg = /<table[.\n\s\S]*<\/table>/g;
var r = html.match(reg);
console.log(r);
// → [ '<table>\n <td>aaa</td>\n</table>\n<table>\n <td>bbb</td>\n</table>' ]
```

```javascript=
function stripComments(code) {
  return code.replace(/\/\/.*|\/\*[^]*\*\//g, "");
}
// \/\/.*        matches a single-line comment
// \/\*[^]*\*\/  matches a multi-line comment

console.log(stripComments("1 + /* 2 */3"));
// → 1 + 3
console.log(stripComments("x = 10;// ten!"));
// → x = 10;
console.log(stripComments("1 /* a */+/* b */ 1"));
// → 1  1
```
> `[^]*` matches as many characters as possible.
> `[^]` matches any character that is not in the empty set of characters, that is, any character at all, including newline.
> `.` matches any single character except newline.

2. Non-Greedy
    * Writing `*?`, `+?`, `??`, or `{}?` makes the operator nongreedy: it matches as few characters as it can.
    * For example, applied to `123abc`, `/\d+/` matches "123", while `/\d+?/` on the same string matches only "1".
```javascript=
var html = `
<table>
 <td>aaa</td>
</table>
<table>
 <td>bbb</td>
</table>
`;
var reg = /<table[.\n\s\S]*?<\/table>/g;
var r = html.match(reg);
console.log(r);
// → [ '<table>\n <td>aaa</td>\n</table>',
//     '<table>\n <td>bbb</td>\n</table>' ]
```

```javascript=
// The two versions get distinct names: two declarations of the same
// function name in one script would both resolve to the later one.
function stripCommentsNonGreedy(code) {
  return code.replace(/\/\*[^]*?\*\//g, "");
}
console.log(stripCommentsNonGreedy("1 /* a */+/* b */ 1"));
// → 1 + 1

function stripCommentsGreedy(code) {
  return code.replace(/\/\*[^]*\*\//g, "");
}
console.log(stripCommentsGreedy("1 /* a */+/* b */ 1"));
// → 1  1
```

## (P) **Dynamically creating RegExp objects**
* Build up a string and use the `RegExp()` constructor on it
> [JavaScript: 如何轉換 Regular Expression 變成 RegExp() 的參數](http://magicjackting.pixnet.net/blog/post/208907845)
```javascript=
let name = "harry"; // use harry as the pattern
let text = "Harry is a suspicious character.";
let regexp = new RegExp("\\b(" + name + ")\\b", "gi"); // matches
console.log(text.replace(regexp, "_$1_"));
// → _Harry_ is a suspicious character.

let name1 = "harary"; // use harary as the pattern
let regexp1 = new RegExp("\\b(" + name1 + ")\\b", "gi"); // does not match
console.log(text.replace(regexp1, "_$1_"));
// → Harry is a suspicious character.
```

```javascript=
let name = "harry"; // use harry as the pattern
let text1 = "Harry is a suspicious character.";
let regexp = new RegExp(`\\b(${name})\\b`, "gi"); // matches
console.log(text1.replace(regexp, "_$1_"));
// → _Harry_ is a suspicious character.
```
> `\\b` needs two backslashes because the pattern is written as an ordinary string, not as a regular expression literal.
> `g` applies to `all` matches. `i` makes the match case insensitive.

* Suppose some nerdy teenager writes their name as `dea+hl[]rd`
    * `\[`: a backslash `\` in front of a special character (such as `[`) makes it a `non-special character`.
```javascript=
let name = "dea+hl[]rd";
let text = "This dea+hl[]rd guy is super annoying.";
// The character class lists every character that needs escaping: \ [ . + * ? ( ) { | ^ $
let escaped = name.replace(/[\\[.+*?(){|^$]/g, "\\$&");
console.log(escaped);
// → dea\+hl\[]rd
let regexp = new RegExp("\\b" + escaped + "\\b", "gi");
console.log(text.replace(regexp, "_$&_"));
// → This _dea+hl[]rd_ guy is super annoying.
```

## (P) **The search method**
* [The `indexOf()` method](https://developer.mozilla.org/zh-TW/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf)
    * Cannot be called with a regular expression. ~~`str.indexOf(/[abc]/ , i);`~~
    * Advantage: it can start searching from a given position.
* [The `search()` method](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/search)
    * Accepts a regular expression.
    * Returns the index of the `first` match found in the string, or `-1` if there is no match.
    * Drawback: it cannot start searching from a given position.
```javascript=
console.log("  word".search(/\S/));
//           012
// → 2
console.log("    ".search(/\S/));
// → -1
```

## (P) **The lastIndex property**
* [The `exec()` method](https://developer.mozilla.org/zh-TW/docs/Web/JavaScript/Reference/Global_Objects/String/lastIndexOf)
    * Drawback: it takes no argument for starting the search at a given position.
* Regular expression objects have the following properties:
    * source: the string the expression was created from.
    * lastIndex: controls where the next match starts (only when the `g` or `y` flag is set).
```javascript=
let str = "a0bc1";
// Indexes: 01234
let rexWithout = /\d/;
let rexWithout_g = /\d/g;
let rexWith_y = /\d/y;

// Without g or y:
rexWithout.lastIndex = 2;
console.log(rexWithout.exec(str));
// ["0"], found at index 1, because without the g or y flag,
// the search is always from index 0
// → [ '0', index: 1, input: 'a0bc1', groups: undefined ]

// With g: the search starts at lastIndex and scans forward
rexWithout_g.lastIndex = 2;
console.log(rexWithout_g.exec(str));
// → [ '1', index: 4, input: 'a0bc1', groups: undefined ]

// With y, unsuccessful:
rexWith_y.lastIndex = 2; // Says to *only* match at index 2.
console.log(rexWith_y.exec(str)); // => null, there's no match at index 2
// → null

// With y, successful:
rexWith_y.lastIndex = 1; // Says to *only* match at index 1.
console.log(rexWith_y.exec(str)); // => ["0"], there was a match at index 1.
// → [ '0', index: 1, input: 'a0bc1', groups: undefined ]

// With y, successful again:
rexWith_y.lastIndex = 4; // Says to *only* match at index 4.
console.log(rexWith_y.exec(str)); // => ["1"], there was a match at index 4.
// → [ '1', index: 4, input: 'a0bc1', groups: undefined ]
```
> With the sticky flag `y`, a match succeeds only if it starts exactly at the given index (`rexWith_y.lastIndex = 2;`); later positions are not tried.
>
> With `g`, the search starts at `lastIndex` and scans forward to the next position where the pattern matches.

```javascript=
let pattern = /y/g; // match every y
pattern.lastIndex = 3; // start matching at index 3
let match = pattern.exec("xyzzy");
console.log(match.index);
// → 4
console.log(pattern.lastIndex); // now points just past the match
// → 5

let global = /abc/g; // `g` global
console.log(global.exec("xyz abc"));
// → [ 'abc', index: 4, input: 'xyz abc', groups: undefined ]
let sticky = /abc/y; // `y` sticky option
console.log(sticky.exec("xyz abc"));
// → null
```
* When one regular expression value is shared by several exec() calls, the automatic updates to lastIndex can cause problems: a call may start at the index left over from the previous call.
```javascript=
let digit = /\d/g;
console.log(digit.exec("here it is: 1"));
// → ["1"]
console.log(digit.exec("and now: 1"));
// → null
```
* The `match()` method + the `g` global option
    * `match()` finds `all matches` of the pattern in the string and returns an array containing the matched strings.
```javascript=
console.log("Banana".match(/an/g));
// → ["an", "an"]
```
### Looping over matches
```javascript=
let input = "A string with 3 numbers in it... 42 and 88.";
let number = /\b\d+\b/g;
let match;
while (match = number.exec(input)) {
  console.log("Found", match[0], "at", match.index);
}
// → Found 3 at 14
//   Found 42 at 33
//   Found 88 at 40

while (match = number.exec(input)) {
  console.log("Found", match, "at", match.index);
}
// → Found ["3", index: 14, input: "A string with 3 numbers in it... 42 and 88.", groups: undefined] at 14
// → Found ["42", index: 33, input: "A string with 3 numbers in it... 42 and 88.", groups: undefined] at 33
// → Found ["88", index: 40, input: "A string with 3 numbers in it... 42 and 88.", groups: undefined] at 40
```

## (P) **Parsing an INI file**
* The rules of the `.ini` format are:
    1. Blank lines and lines starting with `;` are ignored.
    2. Lines wrapped in `[` and `]` start a new section.
    3. Lines containing an alphanumeric identifier followed by `=` add a setting to the current section.
    4. Anything else is invalid.
```
searchengine=https://duckduckgo.com/?q=$1
spitefulness=9.7

; comments are preceded by a semicolon...
; each section concerns an individual enemy
[larry]
fullname=Larry Doe
type=kindergarten bully
website=http://www.geocities.com/CapeCanaveral/11451

[davaeorn]
fullname=Davaeorn
type=evil wizard
outputdir=/home/marijn/enemies/davaeorn
```
* `\r`: carriage return; `\n`: newline.
    * `/\r?\n/` allows lines to be separated by either `\n` or `\r\n`.
* `^`: matches the start; `$`: matches the end. Together they make sure the expression matches the whole line.
```javascript=
function parseINI(string) {
  // Start with an object to hold the top-level fields
  let result = {};
  let section = result;
  string.split(/\r?\n/).forEach(line => {
    let match;
    if (match = line.match(/^(\w+)=(.*)$/)) { // a property
      section[match[1]] = match[2];
    } else if (match = line.match(/^\[(.*)\]$/)) { // a section header
      section = result[match[1]] = {};
    } else if (!/^\s*(;.*)?$/.test(line)) { // neither a section header nor a property
      // Check whether it is a comment or an empty line:
      // \s* matches whitespace, (;.*)? matches an optional comment
      throw new Error("Line '" + line + "' is not valid.");
      // The line fits none of the forms, so an exception is thrown.
    }
  });
  return result;
}

console.log(parseINI(`
name=Vasilis
[address]
city=Tessaloniki`));
// → {name: "Vasilis", address: {city: "Tessaloniki"}}
```

## (P) **International characters**
* JavaScript's regular expressions are rather dumb about characters outside the English alphabet.
    * A JavaScript `word character` is only one of the 26 letters of the Latin alphabet (uppercase or lowercase), a decimal digit, or the underscore.
    * Characters such as `é` and `β` do not match `\w` (word character), but they do match `\W` (nonword character).
    * `\s` (whitespace) matches every character the Unicode standard considers whitespace, including things like the `nonbreaking space` and the `Mongolian vowel separator`.
* By default, JavaScript regular expressions work on single code units, not on actual characters.
```javascript=
console.log(/🍎{3}/.test("🍎🍎🍎")); // 🍎 is made up of 2 code units, so {3} applies only to the second one
// → false
console.log(/<.>/.test("<🌹>"));
// → false
console.log(/<.>/u.test("<🌹>")); // add the `u` flag so the expression treats such characters properly as Unicode
// → true
```
* The `u` flag makes the expression handle such characters as Unicode. `\p` is a Unicode property escape; it gives us the ability to `build expressions from the property data of Unicode characters`.
    * Use `\p{Property=Value}` to match any character that has the given value for that property.
    * The `Property=` part can be left out. With `\p{Name}`, `Name` is assumed to be either a binary property such as `Alphabetic` or a category such as `Number`.
```javascript=
console.log(/\p{Script=Greek}/u.test("α"));
// → true
console.log(/\p{Script=Arabic}/u.test("α"));
// → false
console.log(/\p{Alphabetic}/u.test("α"));
// → true
console.log(/\p{Alphabetic}/u.test("!"));
// → false
```
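
The difference between code units and code points can be checked directly. The following is only a minimal sketch: the helper names (`countCodeUnits`, `countCodePoints`, `threeApples`, `isAllHan`) are hypothetical and not part of the course material, and it assumes an engine that supports the `u` flag and `\p{...}` property escapes.

```javascript
// Illustrative sketch; all names below are hypothetical helpers.

// Number of UTF-16 code units (this is what String.prototype.length reports).
function countCodeUnits(s) {
  return s.length;
}

// Number of Unicode code points (string iteration walks code points).
function countCodePoints(s) {
  return [...s].length;
}

// With the u flag, {3} applies to the whole emoji, not to its second code unit.
const threeApples = /^🍎{3}$/u;

// True when s is non-empty and every character belongs to the Han script.
function isAllHan(s) {
  return /^\p{Script=Han}+$/u.test(s);
}

console.log(countCodeUnits("🍎"));
// → 2
console.log(countCodePoints("🍎🍎🍎"));
// → 3
console.log(threeApples.test("🍎🍎🍎"));
// → true
console.log(isAllHan("正規表達式"));
// → true
console.log(isAllHan("regex"));
// → false
```

Anchoring `isAllHan` with `^` and `$` makes it check the whole string; without the anchors, a single Han character anywhere in the input would be enough for a match.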