JerryWang168
# 【韩顺平讲Java】Chapter 27 - Regular Expressions

***

> **Even the smallest sail can travel far**
> Based on [【零基础 快速学Java】韩顺平 零基础30天学会Java](https://www.bilibili.com/video/BV1fh411y7R8/?vd_source=c5074574112ef27dae243d70aa2175b8)
>###### tags: `韓順平講Java`

***

# Quick Start

![](https://i.imgur.com/UJAKGwY.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * A first taste of regular expressions and how much they simplify text processing
 */
public class Regexp_ {
    public static void main(String[] args) {
        // Suppose a crawler fetched the following text from a Baidu page
        String content = "1995年,互聯網的蓬勃發展給了Oak機會。" +
                "業界為了使死板、單調的靜態網頁能夠“靈活”起來," +
                "急需一種軟體技術來開發一種程式,這種程式可以通過網路" +
                "傳播並且能夠跨平臺運行。於是,世界各大IT企業為此紛紛投" +
                "入了大量的人力、物力和財力。這個時候,Sun公司想起了那個被" +
                "擱置起來很久的Oak,並且重新審視了那個用軟體編寫的試驗平臺," +
                "由於它是按照按照嵌入式系統硬體平臺體繫結構進行編寫的,所以非常小" +
                ",特色適用於網路上的傳輸系統,而Oak也是一種精簡的語言,程式非常小" +
                ",適合在網路上傳輸。Sun公司首先推出了可以嵌入網頁並且可以隨同網頁" +
                "在網路上傳輸的Applet(Applet是一種將小程式嵌入到網頁中進行執行的" +
                "技術),並將Oak更名為Java(在申請註冊商標時,發現Oak已經被人使用了" +
                ",再想了一系列名字之後,最終,使用了提議者在喝一杯Java咖啡時無意提到的" +
                "Java詞語)。5月23日,Sun公司在Sun world會議上正式發佈Java和HotJava" +
                "瀏覽器。IBM、Apple、DEC、Adobe、HP、Oracle、Netscape和微軟等各大公司" +
                "都紛紛停止了自己的相關開發項目,競相購買了Java使用許可證,併為自己的產品開發" +
                "了相應的Java平臺。";

        // Tasks:
        // - extract every English word in the text
        // - extract every number in the text
        // - extract every English word and number
        // - extract the Baidu trending titles
        // (1) Traditional approach: scan character by character; verbose and slow
        // (2) Regular expressions

        // 1. Create a Pattern object, i.e. the compiled regular expression
        // Pattern pattern = Pattern.compile("[a-zA-Z]+");
        // Pattern pattern = Pattern.compile("[0-9]+");
        // Pattern pattern = Pattern.compile("([0-9]+)|([a-zA-Z]+)");
        // Pattern pattern = Pattern.compile("<a target=\"_blank\"(\\S*)\"");
        // IPv4-like strings (the original had a stray trailing \\. here).
        // The sample text above contains none, so enable one of the patterns above to see output.
        Pattern pattern = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+");

        // 2. Create a Matcher object:
        //    the matcher applies the pattern to the content text
        Matcher matcher = pattern.matcher(content);

        // 3. Start the matching loop:
        // find() returns true when another match is found, otherwise false
        while (matcher.find()) {
            // The matched text is available through matcher.group(0)
            System.out.println("Found: " + matcher.group(0));
        }
    }
}
```

![](https://i.imgur.com/g1DQtKn.jpg)
![](https://i.imgur.com/1sXp7mh.png)
![](https://i.imgur.com/HWNOFjy.png)

## Under the Hood

![](https://i.imgur.com/WJKJKhp.jpg)
![](https://i.imgur.com/HR7pVjq.png)
![](https://i.imgur.com/JRgKdd6.png)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Analysis of how Java implements regular-expression matching (important)
 */
public class RegTheory {
    public static void main(String[] args) {
        String content = "1998年12月8日第二代Java平台的企业版J2EE发布。1999年6月" +
                ",Sun公司发布了第二代Java平台(简称为Java2)的3个版本:J2ME,应用于移动" +
                "、无线及有限资源的环境;J2SE(Java 2 Standard Edition,Java 2平台的标准版)" +
                ",应用于桌面环境;J2EE,应用3443于基于Java的应用服务器。Java 2平台的发布,是Java发展" +
                "过程中最重要的一个里程碑,标志着Java的应用开始普及9889 ";

        // Goal: match every run of four digits
        // Notes:
        // 1. \\d matches any single digit
        String regStr = "(\\d\\d)(\\d\\d)";
        // 2. Create the Pattern object, i.e. the compiled regular expression
        Pattern pattern = Pattern.compile(regStr);
        // 3. Create the matcher, which applies the pattern to the content string
        Matcher matcher = pattern.matcher(content);
        // 4. Start matching

        /*
         * What matcher.find() does (ignoring groups):
         * 1. Locates the next substring that satisfies the pattern (for example 1998).
         * 2. Records the start index of that substring in the Matcher field int[] groups:
         *    groups[0] = 0, and stores (end index of the substring + 1) in groups[1] = 4.
         * 3. Also sets oldLast to (end index + 1), i.e. 4, so the next find() starts matching at index 4.
         *
         * How matcher.group(0) works. Source:
         *   public String group(int group) {
         *       if (first < 0)
         *           throw new IllegalStateException("No match found");
         *       if (group < 0 || group > groupCount())
         *           throw new IndexOutOfBoundsException("No group " + group);
         *       if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
         *           return null;
         *       return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
         *   }
         * 1. Using the recorded groups[0] = 0 and groups[1] = 4, it returns the substring of
         *    content in [0, 4): index 0 is included, index 4 is not.
         * Each further call to find() proceeds the same way.
         */

        /*
         * What matcher.find() does when the pattern has groups: "(\\d\\d)(\\d\\d)"
         * What is a group? Each pair of parentheses in the pattern is a group:
         * the first () is group 1, the second () is group 2, and so on.
         * 1. Locate the next substring that satisfies the pattern (for example (19)(98)).
         * 2. After a match is found,
         *    the indices are recorded in the Matcher field int[] groups:
         *    2.1 groups[0] = 0, and (end index of the match + 1) goes into groups[1] = 4
         *    2.2 the span of group 1: groups[2] = 0, groups[3] = 2
         *    2.3 the span of group 2: groups[4] = 2, groups[5] = 4
         *    2.4 and so on for any further groups
         * 3. oldLast is also set to (end index + 1), i.e. 4, so the next find() starts matching at index 4.
         *
         * matcher.group(n) goes through the same group(int) source as group(0): it returns the
         * substring of content in [groups[n*2], groups[n*2+1]). For group(0) that is [0, 4):
         * index 0 is included, index 4 is not.
         * Each further call to find() proceeds the same way.
         */
        while (matcher.find()) {
            // Summary
            // 1. Parentheses in a regular expression define groups
            // 2. The matched text is retrieved as follows:
            // 3. group(0) is the whole matched substring
            // 4. group(1) is the part of the match captured by the first group
            // 5. group(2) is the part of the match captured by the second group
            // 6. ... but the group number must not exceed the number of groups
            System.out.println("Found: " + matcher.group(0));
            System.out.println("Group 1: " + matcher.group(1));
            System.out.println("Group 2: " + matcher.group(2));
        }
    }
}
```

# Basic Regular Expression Syntax

![](https://i.imgur.com/AXPBgwV.png)

# Metacharacters

![](https://i.imgur.com/w9ajUF5.jpg)
![](https://i.imgur.com/LqIdfr6.png)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Demonstrates escape characters
 */
public class RegExp02 {
    public static void main(String[] args) {
        String content = "abc$(a.bc(123(";
        // To match a literal ( use \\(
        // To match a literal . use \\.
        String regStr = "\\(";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
```

## Character Matchers

![](https://i.imgur.com/YG8H0hT.png)
![](https://i.imgur.com/BakWYox.jpg)
![](https://i.imgur.com/4JCPVJ2.jpg)
![](https://i.imgur.com/ZNm2If1.png)
![](https://i.imgur.com/GwnfODz.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Demonstrates character matchers
 */
public class RegExp03 {
    public static void main(String[] args) {
        String content = "a11c 8ab.c_ABC\t@";
        // String regStr = "[0-9]";   // any single character in 0-9
        // String regStr = "[a-z]";   // any single character in a-z
        // String regStr = "[A-Z]";   // any single character in A-Z
        // String regStr = "(?i)abc"; // the string abc, case-insensitive
        // String regStr = "abc";     // the string abc, case-sensitive by default
        // String regStr = "[^a-z]";  // any single character not in a-z; affected by Pattern.CASE_INSENSITIVE
        // String regStr = "[^0-9]";  // any single character not in 0-9
        // String regStr = "[abcd]";  // any single character among a, b, c, d
        // String regStr = "\\D";     // any single character that is not a digit 0-9
        // String regStr = "\\w";     // an upper- or lower-case English letter, a digit, or an underscore
        // String regStr = "\\W";     // equivalent to [^a-zA-Z0-9_]
        // \\s matches any whitespace character (space, tab, and so on)
        // String regStr = "\\s";
        // \\S matches any non-whitespace character, the opposite of \\s
        // String regStr = "\\S";
        // . matches any character except \n; to match a literal . use \\.
        // String regStr = "\\.";
        String regStr = ".";

        // Pattern pattern = Pattern.compile(regStr);
        // Note:
        // passing Pattern.CASE_INSENSITIVE when creating the Pattern makes matching case-insensitive
        Pattern pattern = Pattern.compile(regStr/*, Pattern.CASE_INSENSITIVE*/);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found " + matcher.group(0));
        }
    }
}
```

## Alternation

![](https://i.imgur.com/VmH0NIO.png)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * The alternation operator |
 */
public class RegExp04 {
    public static void main(String[] args) {
        String content = "hanshunping 韓 寒冷";
        String regStr = "han|韓|寒";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
    }
}
```

## Quantifiers

![](https://i.imgur.com/yFtc1hr.jpg)
![](https://i.imgur.com/Gsa1n6N.png)
![](https://i.imgur.com/1uLo8fh.png)
![](https://i.imgur.com/q4S9yuz.png)
![](https://i.imgur.com/U7SvnA1.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Demonstrates quantifiers
 */
public class RegExp05 {
    public static void main(String[] args) {
        String content = "a21111111aaaaaahello";
        // a{3}, 1{4}, (\\d){2}
        // String regStr = "a{3}";    // matches aaa
        // String regStr = "1{4}";    // matches 1111
        // String regStr = "\\d{2}";  // matches any two digits

        // a{3,4}, 1{4,5}, \\d{2,5}
        // Detail: Java matching is greedy by default, i.e. it matches as much as possible
        // String regStr = "a{3,4}";   // matches aaaa or aaa
        // String regStr = "1{4,5}";   // matches 11111 or 1111
        // String regStr = "\\d{2,5}"; // matches any run of two to five digits

        // 1+
        // String regStr = "1+";   // matches one or more 1s
        // String regStr = "\\d+"; // matches one or more digits

        // 1*
        // String regStr = "1*";   // matches zero or more 1s

        // ? means zero or one, and is still greedy
        String regStr = "a1?"; // matches a1 or a
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
    }
}
```

## Anchors

![](https://i.imgur.com/V7YIf78.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Demonstrates anchors
 */
public class RegExp06 {
    public static void main(String[] args) {
        // String content = "123-abc";
        // Starts with at least one digit, followed by any number of lower-case letters
        // String regStr = "^[0-9]+[a-z]*";
        // Starts with at least one digit, then a hyphen, and must end with at least one lower-case letter
        // String regStr = "^[0-9]+\\-[a-z]+$";

        String content = "hanshunping sphan nnhan";
        // han at a word boundary: here the boundary is the end of the input
        // or the position right before a space
        // String regStr = "han\\b";

        // The opposite of \\b: han that is not at a word boundary
        String regStr = "han\\B";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
    }
}
```

## Grouping, Combination and Backreference Symbols

![](https://i.imgur.com/2395Mqt.png)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Groups
 */
public class RegExp07 {
    public static void main(String[] args) {
        String content = "hanshunping s7789 nn1189han";
        // Unnamed groups first
        // Notes:
        // 1. matcher.group(0) returns the whole match
        // 2. matcher.group(1) returns the first group of the match
        // 3. higher group numbers work the same way:
        //    matcher.group(2) returns the second group of the match
        // String regStr = "(\\d\\d)(\\d\\d)"; // matches a run of four digits

        // Named groups: each group can be given a name
        String regStr = "(?<g1>\\d\\d)(?<g2>\\d\\d)"; // matches a run of four digits
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
            System.out.println("Group 1: " + matcher.group(1));
            System.out.println("Group 1 [by name]: " + matcher.group("g1"));
            System.out.println("Group 2: " + matcher.group(2));
            System.out.println("Group 2 [by name]: " + matcher.group("g2"));
        }
    }
}
```

![](https://i.imgur.com/jI4Lna0.jpg)
![](https://i.imgur.com/Lg21DB7.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Demonstrates non-capturing groups; the syntax looks a little odd
 */
public class RegExp08 {
    public static void main(String[] args) {
        String content = "hello韓順平教育 jack韓順平老師 韓順平同學hello";

        // Find the substrings 韓順平教育, 韓順平老師 and 韓順平同學
        // String regStr = "韓順平教育|韓順平老師|韓順平同學";
        // The pattern above is equivalent to this non-capturing group.
        // Note: matcher.group(1) is not available
        String regStr = "韓順平(?:教育|老師|同學)";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
        System.out.println("==========");

        // Find 韓順平, but only where it is part of 韓順平教育 or 韓順平老師
        // (?=...) is a positive lookahead; it captures nothing either, so matcher.group(1) is not available
        regStr = "韓順平(?=教育|老師)";
        pattern = Pattern.compile(regStr);
        matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
        System.out.println("==========");

        // Find 韓順平, but only where it is NOT part of 韓順平教育 or 韓順平老師
        // (?!...) is a negative lookahead; it captures nothing either, so matcher.group(1) is not available
        regStr = "韓順平(?!教育|老師)";
        pattern = Pattern.compile(regStr);
        matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
    }
}
```

## Special Characters (Non-greedy Matching)

![](https://i.imgur.com/B0RcEi7.png)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Non-greedy matching
 */
public class RegExp09 {
    public static void main(String[] args) {
        String content = "hello111111 ok";
        // String regStr = "\\d+"; // greedy by default
        String regStr = "\\d+?";   // non-greedy
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
    }
}
```

# Worked Examples

![](https://i.imgur.com/rj1iQCV.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Practical uses of regular expressions
 */
public class RegExp10 {
    public static void main(String[] args) {
        String content = "韓順平教育";
        // Chinese characters
        String regStr = "^[\u0391-\uffe5]+$";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(content);
        if (matcher.find()) {
            System.out.println("Satisfied");
        } else {
            System.out.println("Not Satisfied");
        }
        System.out.println("=================");

        // Postal code: a six-digit number starting with 1-9, e.g. 123890
        // ($ added so that longer digit strings are rejected)
        content = "123890";
        regStr = "^[1-9]\\d{5}$";
        compile = Pattern.compile(regStr);
        matcher = compile.matcher(content);
        if (matcher.find()) {
            System.out.println("Satisfied");
        } else {
            System.out.println("Not Satisfied");
        }
        System.out.println("=================");

        // QQ number: 5 to 10 digits starting with 1-9, e.g. 12389, 1345687, 187698765
        content = "123890";
        regStr = "^[1-9]\\d{4,9}$";
        compile = Pattern.compile(regStr);
        matcher = compile.matcher(content);
        if (matcher.find()) {
            System.out.println("Satisfied");
        } else {
            System.out.println("Not Satisfied");
        }
        System.out.println("=================");

        // Mobile number: 11 digits starting with 13, 14, 15 or 18, e.g. 13588889999
        // ([3458] rather than [3|4|5|8]: inside a character class, | is a literal character)
        content = "13588889999";
        regStr = "^1[3458]\\d{9}$";
        compile = Pattern.compile(regStr);
        matcher = compile.matcher(content);
        if (matcher.find()) {
            System.out.println("Satisfied");
        } else {
            System.out.println("Not Satisfied");
        }
    }
}
```

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Demonstrates validating a URL with a regular expression
 */
public class RegExp11 {
    public static void main(String[] args) {
        String content = "https://www.bilibili.com/video/BV1fh411y7R8/?from=search&seid=1831060912083761326";
        /*
         * Approach
         * 1. Match the start of the URL: https:// or http://
         * 2. Match www.bilibili.com with ([\\w-]+\\.)+[\\w-]+
         * 3. Match the rest: /video/BV1fh411y7R8/?from=search&seid=1831060912083761326
         */
        // Note: inside [], the characters . ? * stand for themselves
        String regStr = "^((http|https)://)?([\\w-]+\\.)+[\\w-]+(\\/[\\w-?=&/%.#]*)?$";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(content);
        if (matcher.find()) {
            System.out.println("Satisfied");
        } else {
            System.out.println("Not Satisfied");
        }
    }
}
```

## Removing Stuttered Repeats

![](https://i.imgur.com/0F70Bgx.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 */
public class RegExp13 {
    public static void main(String[] args) {
        String content = "我....我要....學學學學....編程java!";
        // 1. Remove every .
        Pattern pattern = Pattern.compile("\\.");
        Matcher matcher = pattern.matcher(content);
        content = matcher.replaceAll("");
        System.out.println("content=" + content);

        // 2. Remove the repeated characters in 我我要學學學學編程java!
        // Approach:
        // (1) Find runs of a repeated character with (.)\\1+
        pattern = Pattern.compile("(.)\\1+");
        matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found=" + matcher.group(0));
        }
        // (2) Replace each run with the backreference $1
        content = matcher.replaceAll("$1");
        System.out.println("content=" + content);

        // 3. The same clean-up as a single chained call:
        // remove the repeated characters with one statement
        content = "我我要學學學學編程java!";
        content = Pattern.compile("(.)\\1+").matcher(content).replaceAll("$1");
        System.out.println(content);
    }
}
```

# Three Commonly Used Classes

![](https://i.imgur.com/c8UlOKT.jpg)

## Pattern

![](https://i.imgur.com/nMUwHug.png)

```java=
package com.hspedu.regexp;

import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Demonstrates Pattern.matches, which matches the whole input;
 * useful for validating that an input string satisfies a rule
 */
public class PatternMethod {
    public static void main(String[] args) {
        String content = "hello abc hello, 韓順平教育";
        // String regStr = "hello"; // false
        String regStr = "hello.*";  // true

        boolean matches = Pattern.matches(regStr, content);
        System.out.println("Full match = " + matches);
    }
}
```

## Matcher

![](https://i.imgur.com/3UoJIkc.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Commonly used methods of the Matcher class
 */
public class MatcherMethod {
    public static void main(String[] args) {
        String content = "hello edu jack tom hello smith hello";
        String regStr = "hello";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("==========");
            System.out.println(matcher.start());
            System.out.println(matcher.end());
            System.out.println("Found: " + content.substring(matcher.start(), matcher.end()));
        }

        // Whole-input match, commonly used to check that a string satisfies a rule
        System.out.println("Full match = " + matcher.matches()); // false
    }
}
```

![](https://i.imgur.com/rOf9B4M.png)

```java=
@Test
public void testSub() {
    String content = "hspedukkkhello,yyyzz123uafdahspedulkdfaxvfi;pvjv";
    String regStr = "hspedu";
    Pattern pattern = Pattern.compile(regStr);
    Matcher matcher = pattern.matcher(content);
    // Replace every hspedu in content with 韓順平教育
    // Note: the returned string is the replaced one; the original content is unchanged
    String newContent = matcher.replaceAll("韓順平教育");
    System.out.println(content);
    System.out.println(newContent);
}
```

## PatternSyntaxException

- PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular-expression pattern

# Grouping, Capturing and Backreferences

![](https://i.imgur.com/ePdRvTG.png)
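As a small complement to this section, here is a minimal sketch of a backreference written against named groups instead of group numbers. It assumes Java 7 or later, where `(?<name>...)` defines a named group and `\k<name>` refers back to it. The class name `NamedBackrefSketch` and the method `findMirrored` are hypothetical names chosen for illustration and are not part of the course material.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
public class NamedBackrefSketch {
    // (?<a>\d)(?<b>\d) capture two digits;
    // \k<b>\k<a> then require the same two digits again in reverse order.
    private static final Pattern MIRROR =
            Pattern.compile("(?<a>\\d)(?<b>\\d)\\k<b>\\k<a>");

    // Returns every four-digit run of the form ABBA found in the text.
    public static List<String> findMirrored(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MIRROR.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group(0));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findMirrored("ab1221cd 3443 5678")); // prints [1221, 3443]
    }
}
```

The numbered form of the same pattern is `(\\d)(\\d)\\2\\1`; named references mainly help readability once a pattern has more than a couple of groups.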
![](https://i.imgur.com/Subw8q6.jpg)
![](https://i.imgur.com/lmol8sT.jpg)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 * Backreferences
 */
public class RegExp12 {
    public static void main(String[] args) {
        String content = "h1234el9876lo33333 j12321-333999111a1551ck14 tom11 jack22 yyy12345 xxx";
        // Two identical digits in a row
        // String regStr = "(\\d)\\1";
        // Five identical digits in a row
        // String regStr = "(\\d)\\1{4}";
        // Four digits where the first equals the fourth and the second equals the third
        // String regStr = "(\\d)(\\d)\\2\\1";

        /*
         * Search the string for product numbers of the form 12321-333999111:
         * a five-digit number, then a hyphen, then a nine-digit number
         * in which each consecutive run of three digits is the same digit
         */
        String regStr = "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
    }
}
```

# Using Regular Expressions in the String Class

![](https://i.imgur.com/yImepxT.jpg)
![](https://i.imgur.com/MW4Bq4F.png)
![](https://i.imgur.com/LPMT9QB.png)

```java=
package com.hspedu.regexp;

/**
 * @author Jerry Wang
 * @version 1.0
 */
public class StringReg {
    public static void main(String[] args) {
        String content = "2000年5月,JDK1.3、JDK1.4和J2SE1.3相继发" +
                "Java创始人之一:詹姆斯·高斯林布,几周后其获得了Apple公司" +
                "Mac OS X的工业标准的支持。2001年9月24日,J2EE1.3发布。2002" +
                "年2月26日,J2SE1.4发布。自此Java的计算能力有了大幅提升";

        // Use a regular expression to replace JDK1.3 and JDK1.4 with JDK
        content = content.replaceAll("JDK1\\.3|JDK1\\.4", "JDK");
        System.out.println(content);

        // =============================================================
        // Validate a mobile number that must start with 138 or 139
        content = "13888889999";
        if (content.matches("1(38|39)\\d{8}")) {
            System.out.println("Validation passed");
        } else {
            System.out.println("Validation failed");
        }

        // =============================================
        // Split on #, -, ~ or a run of digits
        content = "hello#abc-jack12smith~北京";
        String[] split = content.split("#|-|~|\\d+");
        for (String s : split) {
            System.out.println(s);
        }
    }
}
```

# Chapter Exercises

## 1

![](https://i.imgur.com/4RdOPC2.png)

```java
// [a-zA-Z] rather than [a-zA-z]: the range A-z also covers [ \ ] ^ _ and `
String regStr = "^[\\w-]+@([a-zA-Z]+\\.)+[a-zA-Z]+$";

/*
 * Source of String.matches:
 *   public boolean matches(String regex) {
 *       return Pattern.matches(regex, this);
 *   }
 *
 * Pattern:
 *   public static boolean matches(String regex, CharSequence input) {
 *       Pattern p = Pattern.compile(regex);
 *       Matcher m = p.matcher(input);
 *       return m.matches();
 *   }
 *
 * Matcher.matches: attempts to match the "entire region" against the pattern.
 *   public boolean matches() {
 *       return match(from, ENDANCHOR);
 *   }
 */
```

## 2

![](https://i.imgur.com/r0Lp4sL.png)

```java=
package com.hspedu.regexp;

/**
 * @author Jerry Wang
 * @version 1.0
 */
public class Homework02 {
    public static void main(String[] args) {
        String content = "01.89";
        // Optional sign, an integer part without leading zeros, and an optional fraction
        String regStr = "^[-+]?([1-9]\\d*|0)(\\.\\d+)?$";

        if (content.matches(regStr)) {
            System.out.println("Satisfied");
        } else {
            System.out.println("Not Satisfied");
        }
    }
}
```

## 3

![](https://i.imgur.com/3Cyknt5.png)

```java=
package com.hspedu.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Jerry Wang
 * @version 1.0
 */
public class Homework03 {
    public static void main(String[] args) {
        String content = "http://www.sohu.com:8080/abc/yyy////xxx/index.htm";
        // [a-zA-Z] rather than [a-zA-z], which would also accept [ \ ] ^ _ and `
        String regStr = "^([a-zA-Z]+)://([a-zA-Z.]+):(\\d+)[\\w-/]*/([\\w.]+)$";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        if (matcher.matches()) {
            // After a successful whole-input match, group(x) returns the content of each group
            System.out.println("Full match = " + matcher.group(0));
            System.out.println("Protocol = " + matcher.group(1));
            System.out.println("Domain = " + matcher.group(2));
            System.out.println("Port = " + matcher.group(3));
            System.out.println("File = " + matcher.group(4));
        } else {
            System.out.println("No match");
        }
    }
}
```

# Regular Expression Cheat Sheet

[Link](https://blog.csdn.net/zyc88888/article/details/98479629)

## 1. Expressions for Validating Numbers

1. Digits: `^[0-9]*$`
2. A number with n digits: `^\d{n}$`
3. A number with at least n digits: `^\d{n,}$`
4. A number with m to n digits: `^\d{m,n}$`
5. Zero, or a number that does not start with zero: `^(0|[1-9][0-9]*)$`
6. A number that does not start with zero, with at most two decimal places: `^([1-9][0-9]*)+(\.[0-9]{1,2})?$`
7. A positive or negative number with 1-2 decimal places: `^(\-)?\d+(\.\d{1,2})?$`
8. Positive numbers, negative numbers and decimals: `^(\-|\+)?\d+(\.\d+)?$`
9. A positive real number with two decimal places: `^[0-9]+(\.[0-9]{2})?$`
10. A positive real number with 1 to 3 decimal places: `^[0-9]+(\.[0-9]{1,3})?$`
11. A non-zero positive integer: `^[1-9]\d*$` or `^([1-9][0-9]*){1,3}$` or `^\+?[1-9][0-9]*$`
12. A non-zero negative integer: `^\-[1-9][0-9]*$` or `^-[1-9]\d*$`
13. A non-negative integer: `^\d+$` or `^([1-9]\d*|0)$`
14. A non-positive integer: `^(-[1-9]\d*|0)$` or `^((-\d+)|(0+))$`
15. A non-negative floating-point number: `^\d+(\.\d+)?$` or `^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$`
16. A non-positive floating-point number: `^((-\d+(\.\d+)?)|(0+(\.0+)?))$` or `^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$`
17. A positive floating-point number: `^[1-9]\d*\.\d*|0\.\d*[1-9]\d*$` or `^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$`
18. A negative floating-point number: `^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$` or `^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$`
19. A floating-point number: `^(-?\d+)(\.\d+)?$` or `^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$`

## 2. Expressions for Validating Characters

1. Chinese characters: `^[\u4e00-\u9fa5]{0,}$`
2. English letters and digits: `^[A-Za-z0-9]+$` or `^[A-Za-z0-9]{4,40}$`
3. Any characters, length 3-20: `^.{3,20}$`
4. A string of the 26 English letters: `^[A-Za-z]+$`
5. A string of the 26 upper-case English letters: `^[A-Z]+$`
6. A string of the 26 lower-case English letters: `^[a-z]+$`
7. A string of digits and the 26 English letters: `^[A-Za-z0-9]+$`
8. A string of digits, the 26 English letters, or underscores: `^\w+$` or `^\w{3,20}$`
9. Chinese, English and digits, including underscores: `^[\u4E00-\u9FA5A-Za-z0-9_]+$`
10. Chinese, English and digits, excluding underscores and other symbols: `^[\u4E00-\u9FA5A-Za-z0-9]+$` or `^[\u4E00-\u9FA5A-Za-z0-9]{2,20}$`
11. Input containing none of the characters `%&',;=?$"`: `[^%&',;=?$\x22]+`
12. Input that must not contain `~`: `[^~\x22]+`

## Other

- `.*` matches any sequence of characters other than `\n`
- `/[\u4E00-\u9FA5]/` Chinese characters
- `/[\uFF00-\uFFFF]/` full-width symbols
- `/[\u0000-\u00FF]/` half-width symbols

## 3. Special-purpose Expressions

1. Email address: `^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$`
2. Domain name: `[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?`
3. Internet URL: `[a-zA-Z]+://[^\s]*` or `^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$`
4. Mobile number: `^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\d{8}$`
5. Phone number ("XXX-XXXXXXX", "XXXX-XXXXXXXX", "XXX-XXXXXXXX", "XXXXXXX" and "XXXXXXXX"): `^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$`
6. Domestic phone number (0511-4405222, 021-87888822): `\d{3}-\d{8}|\d{4}-\d{7}`
7. ID card number (15 or 18 digits): `^(\d{15}|\d{18})$`
8. Short ID card number (digits, optionally ending in the letter x): `^([0-9]){7,18}(x|X)?$` or `^\d{8,18}|[0-9x]{8,18}|[0-9X]{8,18}?$`
9. Valid account name (starts with a letter, 5-16 bytes, letters, digits and underscores allowed): `^[a-zA-Z][a-zA-Z0-9_]{4,15}$`
10. Password (starts with a letter, length 6~18, only letters, digits and underscores): `^[a-zA-Z]\w{5,17}$`
11. Strong password (must combine upper-case letters, lower-case letters and digits, no special characters, length 8-10): `^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$`
12. Date format: `^\d{4}-\d{1,2}-\d{1,2}`
13. The 12 months of a year (01~09 and 1~12): `^(0?[1-9]|1[0-2])$`
14. The 31 days of a month (01~09 and 1~31): `^((0?[1-9])|((1|2)[0-9])|30|31)$`
15. Money input format:
16. 1. Four representations of money are acceptable: "10000.00" and "10,000.00", and, without the cents, "10000" and "10,000": `^[1-9][0-9]*$`
17. 2. That matches any number that does not start with 0, but it also rejects a lone "0", so use this form instead: `^(0|[1-9][0-9]*)$`
18. 3. That is a 0 or a number not starting with 0. A leading minus sign can also be allowed: `^(0|-?[1-9][0-9]*)$`
19. 4. That is a 0 or a possibly negative number not starting with 0. Let the user start with 0 after all, and drop the minus sign, since money cannot be negative. Next comes the optional decimal part: `^[0-9]+(\.[0-9]+)?$`
20. 5. Note that at least one digit must follow the decimal point, so "10." fails while "10" and "10.2" pass: `^[0-9]+(\.[0-9]{2})?$`
21. 6. That requires exactly two digits after the decimal point; if this seems too strict, use: `^[0-9]+(\.[0-9]{1,2})?$`
22. 7. That lets the user write a single decimal digit. Now for the commas in the number: `^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$`
23. 8. That is 1 to 3 digits followed by any number of comma-plus-3-digit groups. To make the commas optional rather than required: `^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$`
24. Note: this is the final result. "+" can be replaced with "*" if an empty string should also be accepted (odd, but why?). Finally, remember to remove that backslash when using the pattern in a function; most mistakes happen there.
25. XML file: `^([a-zA-Z]+-?)+[a-zA-Z0-9]+\.[xX][mM][lL]$`
26. Chinese character: `[\u4e00-\u9fa5]`
27. Double-byte characters: `[^\x00-\xff]` (includes Chinese characters; can be used to compute string length, counting a double-byte character as 2 and an ASCII character as 1)
28. Blank line: `\n\s*\r` (can be used to delete blank lines)
29. HTML tag: `<(\S*?)[^>]*>.*?</\1>|<.*? />` (the versions circulating online are poor; this one also only partly works and still cannot handle complex nested tags)
30. Leading and trailing whitespace: `^\s*|\s*$` or `(^\s*)|(\s*$)` (can be used to trim whitespace at the start and end of a line, including spaces, tabs and form feeds; a very useful expression)
31. Tencent QQ number: `[1-9][0-9]{4,}` (QQ numbers start at 10000)
32. Chinese postal code: `[1-9]\d{5}(?!\d)` (Chinese postal codes have 6 digits)
33. IP address: `\d+\.\d+\.\d+\.\d+` (useful when extracting IP addresses)
34. IP address: `((?:(?:25[0-5]|2[0-4]\\d|[01]?\\d?\\d)\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d?\\d))` (from this entry on, backslashes appear doubled, as they would inside a Java string literal)
35. IPv4 address: `\\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b` (useful when extracting IP addresses)
36. IPv6 address validation: `(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))`
37. Subnet mask: `((?:(?:25[0-5]|2[0-4]\\d|[01]?\\d?\\d)\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d?\\d))`
38. Date validation: `^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$` (validates dates in "yyyy-mm-dd" format and accounts for leap years)
39. Extract comments: `<!--(.*?)-->`
40. Find CSS properties: `^\\s*[a-zA-Z\\-]+\\s*[:]{1}\\s[a-zA-Z0-9\\s.#]+[;]{1}`
41. Extract page hyperlinks: `(<a\\s*(?!.*\\brel=)[^>]*)(href="https?:\\/\\/)((?!(?:(?:www\\.)?'.implode('|(?:www\\.)?', $follow_list).'))[^"]+)"((?!.*\\brel=)[^>]*)(?:[^>]*)>`
42. Extract page images: `\\< *[img][^\\\\>]*[src] *= *[\\"\\']{0,1}([^\\"\\'\\ >]*)`
43. Extract page colour codes: `^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$`
44. File extension check: `^([a-zA-Z]\\:|\\\\)\\\\([^\\\\]+\\\\)*[^\\/:*?"<>|]+\\.txt(l)?$`
45. Detect the IE version: `^.*MSIE [5-8](?:\\.[0-9]+)?(?!.*Trident\\/[5-9]\\.0).*$`
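
Patterns collected from the web are best spot-checked before they are relied on. The following is a minimal sketch that compiles three entries from the list above (the colour code, the "yyyy-mm-dd" date validation, and the Chinese postal code) and runs them against a few inputs. The class name `CheatSheetCheck` and its method names are hypothetical and chosen for illustration; only the pattern strings come from the list.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; the patterns are copied from the cheat sheet.
public class CheatSheetCheck {
    private static final Pattern HEX_COLOR =
            Pattern.compile("^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$");

    private static final Pattern DATE = Pattern.compile(
            "^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])"
            + "|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)"
            + "|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])"
            + "|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$");

    private static final Pattern POSTAL_CODE = Pattern.compile("[1-9]\\d{5}(?!\\d)");

    // matches() requires the whole input to match, which is what validation needs.
    public static boolean isHexColor(String s) {
        return HEX_COLOR.matcher(s).matches();
    }

    public static boolean isValidDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static boolean isPostalCode(String s) {
        return POSTAL_CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHexColor("#1a2B3c"));     // true
        System.out.println(isValidDate("2024-02-29")); // true: 2024 is a leap year
        System.out.println(isValidDate("2023-02-29")); // false: 2023 is not a leap year
        System.out.println(isPostalCode("123890"));    // true
        System.out.println(isPostalCode("023890"));    // false: starts with 0
    }
}
```

Using `matches()` rather than `find()` matters here: several entries in the list carry no `^` or `$` anchors, and `find()` would accept any input that merely contains a matching substring.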
