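# Regular Expressions: Regex (Reg[ular] ex[pression])

###### tags: `AppWorks School 讀書會`

![](https://i.imgur.com/E9DTnrb.png)

## 💯 Why use Regex

1. A widely shared notation: it is available in almost every language, such as JavaScript, Java, C#, and Python (with some dialect differences), so the time spent learning it pays off well.
2. Concise syntax: when processing string data, a short pattern replaces long chains of if/else checks.
3. Often fast: the engine compiles the pattern into a dedicated matcher, which is frequently more efficient than slicing strings by hand, although a badly written pattern can still be slow.

>Before regular expressions can be used, they have to be compiled. This process allows them to perform matches more efficiently. More about the process can be found in dotnet docs.
- [MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)

## 👩🏽‍💻 Use cases

### Search and replace in VS Code

![](https://i.imgur.com/IxfUJon.png)

### Validating user input

1. Only letters and digits allowed: `/^[A-Za-z0-9]+$/` (or reject the input when `/[^A-Za-z0-9]/` finds a match)
2. Birthday in YYYY-MM-DD format: `/^[1-9]\d{3}-\d{2}-\d{2}$/` (checks the shape only, not whether the month and day are valid)
3. Gmail address check: `/^\w+@gmail\.com$/`

### Matching and replacing strings

Example: expanding a 3-digit shorthand hex color into its 6-digit form

```javascript=
function toFullHex(shortHex) {
  // Capture each of the three hex digits, then repeat every digit twice
  const reg = /^#([0-9a-f])([0-9a-f])([0-9a-f])$/i
  return shortHex.replace(reg, "#$1$1$2$2$3$3")
}
console.log(toFullHex('#ac2')) // #aacc22
```

## 📝 Syntax

Testing sites: https://regex101.com/ https://regexr.com/
Practice site: https://regexone.com/

### Creating a regular expression pattern

1. Literal notation
   Compiled when the expression is evaluated; ==usually performs better==; suited to patterns that ==do not change==.
   ```javascript=
   const regex = /ab+c/
   const regexA = /ab+c/g
   ```
2. Constructor function
   Compiled at runtime, each time the constructor is called; ==usually slower==; suited to patterns that ==change dynamically==.
   ```javascript=
   const regex = new RegExp('ab+c')
   const regexA = new RegExp('ab+c', 'g')
   ```

>The literal notation results in compilation of the regular expression when the expression is evaluated. On the other hand, the constructor of the RegExp object, new RegExp('ab+c'), results in runtime compilation of the regular expression.
>Use a string as the first argument to the RegExp() constructor when you want to build the regular expression from dynamic input.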
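- [MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)

### Characters

| Pattern | Meaning |
| -------- | -------- |
| /aA/ | Matches aA |
| /[aA]/ | Matches a ==or== A |
| /[^aA]/ | Matches any character ==other than== a ==and== A |
| /^aA/ | The string must ==start== with aA |
| /aA$/ | The string must ==end== with aA |
| /and\|android/ | Matches and or android. Order matters: alternatives are tried left to right, so this pattern always stops at and; write /android\|and/ to prefer the longer word |

<details>
<summary>Only letters and digits allowed</summary>
<pre><code>/^[A-Za-z0-9]+$/
</code></pre>
</details>

---

### Character classes

| Pattern | Meaning |
| -------- | -------- |
| . | Matches any character except line terminators, roughly \[^\n\r] |
| \w | Matches letters, digits, and underscore; equivalent to \[a-zA-Z0-9_] |
| \W (uppercase) | Matches anything \w does not; equivalent to \[^a-zA-Z0-9_] |
| \d | Matches digits; equivalent to \[0-9] |
| \D (uppercase) | Matches anything that is not a digit; equivalent to \[^0-9] |
| \s | Matches whitespace, including spaces and line breaks; often used to strip invisible characters; roughly \[ \r\n\t\f\v] plus other Unicode whitespace |
| \S (uppercase) | Matches anything that is not whitespace; roughly \[^ \r\n\t\f\v] |

<details>
<summary>Gmail address check: letters, digits, underscores, and the . symbol, followed by @gmail.com</summary>
<pre><code>/^[\w\.]+@gmail\.com$/
</code></pre>
</details>

---

### Special characters

- To match a special character literally, escape it with a backslash; otherwise it is treated as a metacharacter
- Common special characters: `^$\+*?[].|(){}`

<details>
<summary>Match 1+2=3</summary>
<pre><code>/1\+2=3/
</code></pre>
</details>

---

### Quantifiers

| Pattern | Meaning |
| -------- | -------- |
| \? | Matches 0 or 1 time |
|/aA?/ | Matches a, aA |

| Pattern | Meaning |
| -------- | -------- |
| \+ | Matches 1 or more times |
|/aA+/| Matches aA, aAAAAAAA |

| Pattern | Meaning |
| -------- | -------- |
| \* | Matches 0 or more times |
|/aA*/| Matches a, aA, aAAA, aAAAAAAA |

| Pattern | Meaning |
| -------- | -------- |
| \{n,m} | Matches n to m times; if m is omitted, as in {n,}, there is no upper limit |
|/aA{3,}/| Matches aAAA, aAAAAAAAA |

<details>
<summary>Birthday in YYYY-MM-DD format</summary>
<pre><code>/^[1-9]\d{3}-\d{2}-\d{2}$/
</code></pre>
</details>

#### Greedy mode

- Quantifiers are greedy by default: they match as many repetitions as possible. Adding `?` after a quantifier (for example `.*?`) makes it lazy, so it matches as few as possible
- Example: what does each of the following return when matched against aacbacbc?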
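As a rough sketch of the greedy and lazy behaviour described above, here is the same idea on a different string. The `html` sample string and the variable names are made up for illustration and are not part of the exercise that follows.

```javascript
// Hypothetical sample input: two adjacent <b> elements.
const html = '<b>one</b><b>two</b>';

// Greedy: .* first runs to the end of the string, then backtracks
// until the LAST </b> lets the whole pattern succeed.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? grows one character at a time and stops at the
// FIRST </b> that lets the whole pattern succeed.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // <b>one</b><b>two</b>
console.log(lazy);   // <b>one</b>
```

The only difference between the two patterns is the `?` after `*`, which is usually what you want when extracting one tag or one quoted value at a time.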
<details>
<summary>'aacbacbc'.match(/a.*b/)</summary>
<pre><code>aacbacb
</code></pre>
</details>

<details>
<summary>'aacbacbc'.match(/a.*?b/)</summary>
<pre><code>aacb
</code></pre>
</details>

![](https://i.imgur.com/CaHwg5v.png)

---

### Flags

| Flag | Meaning |
| -------- | -------- |
| (none) | Without flags, matching stops after the first match |
|g | global: find every match, not just the first |
|m | multiline: ^ and $ match at the start and end of each line instead of only the whole string |
|i | case-insensitive: ignore letter case |

- Example: find the content of each div, including the div tags themselves
```javascript=
const str = '<div><span>User:</span><span>ruei</span></div><div><span>Password:</span><span>123456</span></div>'
const reg = /<div>.*?<\/div>/g;
const res = str.match(reg);
// Logs each <div>...</div> block separately (2 matches)
res.forEach((item) => {
  console.log(item)
})
```

---

### Capturing groups

- Purpose: wrap the part you want to extract in parentheses, then retrieve it with the syntax below; commonly used when scraping data
- Syntax: `()`
- Ways to read a group:
  - Inside the pattern, use `\1` (a backreference)
    - Matching the same text on both sides
      ```javascript=
      const regex = /(abc|def)=\1/g
      const str = 'abc=abc, def=def, abc=def, def=abc'
      console.log(str.match(regex)) // ['abc=abc', 'def=def']
      ```
    - Date format: 2022/01/04 or 2022-01-04
      ```javascript=
      // \1 forces the second separator to be the same as the first
      const regex = /\d{4}(\/|-)\d{1,2}\1\d{1,2}/gm
      ```
  - Outside the pattern (replacement strings in code, or editor search and replace), use `$1`
    ![](https://i.imgur.com/IxfUJon.png)

### Lookaround assertions

A lookaround checks whether the text before or after (left or right of) the match contains specified characters. The asserted text is not consumed and is not captured as a group.

- Lookahead: `?=` must be followed by (to the right); `?!` must not be followed by (to the right)
  - Positive lookahead a(?=b): a must be immediately followed by b
    - Given names of people surnamed Wang: `(\w*) (?=Wang)`
  - Negative lookahead a(?!b): a must not be followed by b
    - Given names of people not surnamed Wang: `(\w*) (?!Wang)`
- Lookbehind: `?<=` must be preceded by (to the left); `?<!` must not be preceded by (to the left)
  - Positive lookbehind (?<=b)a: a must be immediately preceded by b
  - Negative lookbehind (?<!b)a: a must not be preceded by b

#### Lookaround example

1. At least one digit
2. At least one lowercase letter
3. At least one uppercase letter
4. String length between 6 and 30 characters

<details>
<summary>A regular expression that meets the requirements above</summary>
<pre><code>const regex = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,30}$/
</code></pre>
</details>

### Other common patterns

<details>
<summary>Matching URLs: https:// or http://</summary>
<pre><code>const regex = /https?:\/\/(www\.)?(\w+)(\.\w+)/gm
</code></pre>
</details>

<details>
<summary>Taiwan mobile number: starts with 09 or +8869 (the + is optional), followed by 8 digits</summary>
<pre><code>const regex = /^(?:0|\+?886)9\d{8}$/
</code></pre>
</details>

## 💻 Regex in JavaScript

#### RegExp Methods:

- `RegExp.prototype.test(param: string):boolean`
Checks whether any part of the string matches; returns true or false. With the `g` flag, `test()` advances `lastIndex` after each match, so repeated calls on the same regex object can return different results.

```javascript=
const str = "hello world!";
const result = /^hello/.test(str);
console.log(result); // true
```

>*[Pitfalls when combining the global flag with RegExp.prototype.test()](https://uu9924079.medium.com/%E7%82%BA%E4%BB%80%E9%BA%BC-regexp-test-%E8%BC%B8%E5%87%BA%E7%9A%84%E7%B5%90%E6%9E%9C%E6%9C%83%E4%B8%8D%E5%90%8C-%E4%BA%86%E8%A7%A3-js-regex-%E4%B8%AD%E7%9A%84-test-%E6%96%B9%E6%B3%95-5d7ea2d3e261)*

- `RegExp.prototype.exec(param: string):string[] | null`
Returns the matched part of the string as an array, or null when there is no match.

```javascript=
const myRe = /ab*/g;
const str = "abbcdefabh";
let myArray;
// With the g flag, each exec() call resumes from myRe.lastIndex
while ((myArray = myRe.exec(str)) !== null) {
  let msg = `Found ${myArray[0]}. `;
  msg += `Next match starts at ${myRe.lastIndex}`;
  console.log(msg);
}
// Found abb. Next match starts at 3
// Found ab.
// Next match starts at 9
```

#### String Methods:

- `String.prototype.replace(pattern: string | RegExp, replacement: string | function):string`
Finds the matching parts of the string and replaces them; returns the new string.

```javascript=
const dates = ["1992-02-01", "2021-04-04", "2033-01-01"];

// Replace - with /
const newDates = dates.map((date) => {
  // return date.replaceAll('-', '/')
  return date.replace(/-/g, '/')
})
console.log(newDates); // ["1992/02/01", "2021/04/04", "2033/01/01"]

// Replace the two - with 年 (year) and 月 (month), then append 日 (day)
const localizedDates = dates.map((date) => {
  return date.replace(
    /(\d+)-(\d+)-(\d+)/g,
    function (match, p1, p2, p3) {
      return p1 + "年" + p2 + "月" + p3 + "日";
    }
  );
});
console.log(localizedDates); // [ '1992年02月01日', '2021年04月04日', '2033年01月01日' ]
```

```javascript=
// Convert camel case or Pascal case to lowercase hyphenated form
function styleHyphenFormat(propertyName) {
  function toHyphenLower(match, offset, string) {
    // No leading hyphen when the capital letter is the first character
    return (offset > 0 ? "-" : "") + match.toLowerCase();
  }
  return propertyName.replace(/[A-Z]/g, toHyphenLower);
}
console.log(styleHyphenFormat('borderTop')); // border-top
console.log(styleHyphenFormat('BackgroundImage')); // background-image
console.log(styleHyphenFormat('paddingBottom')); // padding-bottom
```

- `String.prototype.search(param: string | RegExp):number`
Finds the first matching part of the string; returns its index if found, otherwise -1.

```javascript=
const str = "hey JudE";
const re = /[A-Z]/;
const reDot = /[.]/;
console.log(str.search(re)); // returns 4, which is the index of the first capital letter "J"
console.log(str.search(reDot)); // returns -1, cannot find '.'
// dot punctuation
```

- `String.prototype.match(param: string | RegExp):string[] | null`
Returns the matches found in the string as an array, or null when nothing matches.

```javascript=
const str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const regexp = /[A-E]/gi;
const matches = str.match(regexp);
console.log(matches); // ['A', 'B', 'C', 'D', 'E', 'a', 'b', 'c', 'd', 'e']
```

- `String.prototype.split(param: string | RegExp):string[]`
Splits the string into an array at each match.

```javascript=
const names = "Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ";
console.log(names);
// Split on a semicolon (or the end of the string) plus any surrounding whitespace
const re = /\s*(?:;|$)\s*/;
const nameList = names.split(re);
console.log(nameList); // ['Harry Trump', 'Fred Barney', 'Helen Rigby', 'Bill Abel', 'Chris Hand', '']
```

## 🤯 Still can't get the hang of it?

![](https://i.imgur.com/OvMJkiV.png)

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wow, you can use <a href="https://twitter.com/hashtag/ChatPGT?src=hash&amp;ref_src=twsrc%5Etfw">#ChatPGT</a> to write Regex for you 🤯 <a href="https://t.co/5rKOFWc16i">pic.twitter.com/5rKOFWc16i</a></p>&mdash; Pierre Romera (@pirhoo) <a href="https://twitter.com/pirhoo/status/1600084694451757056?ref_src=twsrc%5Etfw">December 6, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

---

#### References

[Will Huang (保哥): Regular Expressions Made Simple](https://www.youtube.com/watch?v=Ex6cCWDwNJU)
[\[JS\] Regular Expression (regex)](https://pjchender.dev/javascript/js-regex/)
[Greedy mode](https://dailc.github.io/2017/07/06/regularExpressionGreedyAndLazy.html)
[Validating password complexity with regular expressions](https://blog.miniasp.com/post/2008/05/09/Using-Regular-Expression-to-validate-password)
[RegExp.prototype.test()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test)
[RegExp.prototype.exec()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec)
[String.prototype.replace()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace)
[String.prototype.search()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/search)
[String.prototype.match()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match)
[String.prototype.split()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split)