Regular Expressions: Principles and Applications

Regular Expressions

Speaker: 朱孝國 (Peter Ju)

URL: http://peterju.notlong.com/webslide/re/

Affiliation: Computer Center, 勤益科技大學

Agenda

Section 1  :  Introduction to Regular Expressions

Section 2  :  RE pattern symbols

Section 3  :  RE implementations

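Introduction to Regular Expressions

What are regular expressions?

An RE is usually enclosed in forward slashes or quotes

Uses of RE

History of RE

Caveats when using RE

Is learning RE worth the effort?

How to learn RE well

RE pattern symbols

You have already used RE

Of course, the examples above are not true REs; they are merely globbing. The point here is that both share the idea of pattern matching, but RE is far more powerful.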

Example 1: matching a given string

/car/ : matches car

Carl spilt his carton of orange juice on the carpet of his new car.

If he had taken more care when opening the carton he wouldn't have had
this annoying and disappointing accident.

Some car shampoo would, Carl hoped, make the carpet look as good as new.
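As a minimal sketch of Example 1, the search can be reproduced with Python's standard `re` module (Python is an assumption of this sketch; the slide itself is tool-neutral). The variable names `text`, `offsets`, and `words_hit` are illustrative only.

```python
import re

text = (
    "Carl spilt his carton of orange juice on the carpet of his new car.\n"
    "If he had taken more care when opening the carton he wouldn't have had\n"
    "this annoying and disappointing accident.\n"
    "Some car shampoo would, Carl hoped, make the carpet look as good as new.\n"
)

# Start offset of every literal, case-sensitive occurrence of "car".
offsets = [m.start() for m in re.finditer(r"car", text)]

# Words that contain a match. "Carl" is skipped because "C" is uppercase.
words_hit = [w for w in re.findall(r"\w+", text) if re.search(r"car", w)]
print(words_hit)
```

The pattern matches anywhere inside a word, which is why `carton`, `carpet`, and `care` are hits alongside `car` itself.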

Exercise

/y/

  1. y
  2. Y
  3. yes
  4. yy
  5. hey
  6. heyes
  7. x
  8. good

/yes/

  1. yes
  2. Yes
  3. YES
  4. yes!
  5. yeS
  6. yesterday
  7. y e s
  8. yes? no! yes ....

Example 2: matching a string at word boundaries

/\<car\>/ : matches car only when it stands at word boundaries, that is, as a whole word

Carl spilt his carton of orange juice on the carpet of his new car.

If he had taken more care when opening the carton he wouldn't have had
this annoying and disappointing accident.

Some car shampoo would, Carl hoped, make the carpet look as good as new.
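A short sketch of the same idea in Python, under the assumption that the reader uses the standard `re` module: `\<` and `\>` are the word-boundary anchors of tools such as vi and GNU grep, while Python spells both as `\b`. The names below are illustrative.

```python
import re

text = (
    "Carl spilt his carton of orange juice on the carpet of his new car. "
    "Some car shampoo would, Carl hoped, make the carpet look as good as new."
)

# Plain substring search: also hits "carton" and "carpet".
substring_hits = re.findall(r"car", text)

# Whole-word search: \b marks a word boundary on each side of "car".
whole_word_offsets = [m.start() for m in re.finditer(r"\bcar\b", text)]

print(len(substring_hits), len(whole_word_offsets))
```

Only the two standalone occurrences of `car` survive once the boundaries are added.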

Example 3: string at the end of a line

/port$/ : matches port at the end ($) of a line

4.3.12. Jaz and ZIP Drives

* Jaz-Drive-HOWTO, Jaz-drive HOWTO
Updated: Jan 2000. Covers the configuration and use of the 1Gb and
2Gb Iomega Jaz drives under Linux.
* ZIP-Install, Installing Linux on ZIP disk using ppa ZIP Drive
Mini-Howto
Updated: Jan 1998. This document is only useful for those with the
printer port version of a ZIP drive who wish to have either a
portable or backup Linux system on a ZIP disk.
_________________________________________________________________

4.3.14. Modems

* Linmodem-HOWTO, Linmodem-Mini-HOWTO
Updated: Feb 2001. Describes Linmodem (winmodem hardware) support
under Linux.
* Modem-HOWTO, Modem HOWTO
Updated: Jun 2005. Help with selecting, connecting, configuring,
trouble-shooting, and understanding modems for a PC.
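The end-of-line anchor can be tried line by line in Python; this is a sketch that assumes each line is held as a separate string without its trailing newline. The four lines are taken from the sample text above.

```python
import re

lines = [
    "printer port version of a ZIP drive who wish to have either a",
    "portable or backup Linux system on a ZIP disk.",
    "Updated: Feb 2001. Describes Linmodem (winmodem hardware) support",
    "under Linux.",
]

# "$" anchors the match to the end of the string, so "port" must be
# the last four characters of the line.
ends_with_port = [ln for ln in lines if re.search(r"port$", ln)]
print(ends_with_port)
```

The line ending in `support` is selected, while `printer port version` is not, because its `port` sits in the middle of the line.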

Exercise

/yes$/

  1. yes
  2. yesyes
  3. yes no
  4. no yes
  5. yesterday
  6. yes, it is.
  7. my eyes

Example 4: string at the beginning of a line

/^\s*port/ : matches port at the beginning (^) of a line, optionally preceded by whitespace

4.3.12. Jaz and ZIP Drives

* Jaz-Drive-HOWTO, Jaz-drive HOWTO
Updated: Jan 2000. Covers the configuration and use of the 1Gb and
2Gb Iomega Jaz drives under Linux.
* ZIP-Install, Installing Linux on ZIP disk using ppa ZIP Drive
Mini-Howto
Updated: Jan 1998. This document is only useful for those with the
printer port version of a ZIP drive who wish to have either a
portable or backup Linux system on a ZIP disk.
_________________________________________________________________

4.3.14. Modems

* Linmodem-HOWTO, Linmodem-Mini-HOWTO
Updated: Feb 2001. Describes Linmodem (winmodem hardware) support
under Linux.
* Modem-HOWTO, Modem HOWTO
Updated: Jun 2005. Help with selecting, connecting, configuring,
trouble-shooting, and understanding modems for a PC.
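A comparable sketch for the start-of-line anchor, again assuming Python's `re` module and one string per line. The first two lines come from the sample text; the third, indented line is hypothetical and is there only to exercise the `\s*` part.

```python
import re

lines = [
    "printer port version of a ZIP drive who wish to have either a",
    "portable or backup Linux system on a ZIP disk.",
    "   port 8080 is open",  # hypothetical line with leading spaces
]

# "^" anchors at the start of the string; "\s*" allows optional whitespace
# before "port".
starts_with_port = [ln for ln in lines if re.search(r"^\s*port", ln)]
print(starts_with_port)
```

The first line is rejected because `printer` precedes `port`; the other two are accepted.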

Exercise

/^yes/

  1. yes
  2. Yes
  3. yes no
  4. no yes
  5. yey yes
  6. myes
  7. eyes

RE notation explained

RE special characters, part 1

RE special characters, part 2

Exercise 1

/[15]0*/

  1. 1
  2. 6
  3. 16
  4. 50
  5. 600
  6. 650
  7. 1000
  8. 5000

/10{2,4}1/

  1. 101
  2. 1001
  3. 10001
  4. 100001
  5. 1000001

Exercise 2

/[k-q].[^4-9]/

  1. 5am6a4
  2. p43
  3. amma
  4. aqk4
  5. bom1

/[^a-k]+[0-9]?./

  1. 4
  2. 54
  3. m0
  4. s4a
  5. pmq1

Answers 1

/[15]0*/

  1. 1
  2. 6
  3. 16
  4. 50
  5. 600
  6. 650
  7. 1000
  8. 5000

/10{2,4}1/

  1. 101
  2. 1001
  3. 10001
  4. 100001
  5. 1000001

Answers 2

/[k-q].[^4-9]/

  1. 5am6a4
  2. p43
  3. amma
  4. aqk4
  5. bom1

/[^a-k]+[0-9]?./

  1. 4
  2. 54
  3. m0
  4. s4a
  5. pmq1
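One way to check the two exercises is to let a regex engine report what it matches. The sketch below assumes Python's `re` module and an unanchored search (the pattern may match anywhere in the string, as in the earlier `/car/` example); the helper name `matched_parts` is made up for this illustration.

```python
import re

exercises = {
    r"[15]0*": ["1", "6", "16", "50", "600", "650", "1000", "5000"],
    r"10{2,4}1": ["101", "1001", "10001", "100001", "1000001"],
    r"[k-q].[^4-9]": ["5am6a4", "p43", "amma", "aqk4", "bom1"],
    r"[^a-k]+[0-9]?.": ["4", "54", "m0", "s4a", "pmq1"],
}


def matched_parts(pattern, candidates):
    """Map each candidate to the first substring the pattern matches, or None."""
    result = {}
    for s in candidates:
        m = re.search(pattern, s)
        result[s] = m.group(0) if m else None
    return result


answers = {p: matched_parts(p, c) for p, c in exercises.items()}
for pattern, result in answers.items():
    print(f"/{pattern}/ -> {result}")
```

Under these assumptions the strings that do not match are `6` and `600` for the first pattern, `101` and `1000001` for the second, `aqk4` for the third, and `4` for the fourth. The dictionary also shows which part of each string was matched, for example `m6a` inside `5am6a4`.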

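RE implementations

Which tools can use RE?

Tools on Unix-like operating systems

1. Replace every occurrence of "Regular Expression" or "Regular expression" in a file with "Regexp"

   Open the file in vi and run the following in command mode: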

:1,$ s/Regular [Ee]xpression/Regexp/g

2. Use Perl to convert US-style dates (month/day/year) in a data file to Australian-style dates (day/month/year)

perl -pe 's#\b(\d\d)/(\d\d)/(\d\d\d\d)\b#$2/$1/$3#' < datafile

3. Print the lines of a file that begin with "From" or "from"

awk '/^[Ff]rom/' datafile

4. Remove the empty lines from a file

sed -e '/^$/d' datafile
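The four tasks above can also be sketched with Python's `re` module, for readers who want to experiment without vi, Perl, awk, or sed. This is an illustration under stated assumptions, not a replacement for those tools: all function names are hypothetical, and the line-based helpers expect lines without trailing newlines.

```python
import re


def replace_phrase(text):
    # Task 1: same substitution as the vi command, applied to all occurrences.
    return re.sub(r"Regular [Ee]xpression", "Regexp", text)


def us_to_au_date(line):
    # Task 2: swap month and day. count=1 mirrors a substitution without
    # a global flag, which rewrites only the first date on each line.
    return re.sub(r"\b(\d\d)/(\d\d)/(\d\d\d\d)\b", r"\2/\1/\3", line, count=1)


def lines_starting_with_from(lines):
    # Task 3: re.match only matches at the start of the string.
    return [ln for ln in lines if re.match(r"[Ff]rom", ln)]


def drop_empty_lines(lines):
    # Task 4: "^$" matches only truly empty lines, so whitespace-only
    # lines are kept.
    return [ln for ln in lines if not re.search(r"^$", ln)]


print(replace_phrase("A Regular expression and a Regular Expression."))
print(us_to_au_date("Shipped 12/31/2005"))
print(lines_starting_with_from(["From: a@b", "from here", "Message from X"]))
print(drop_empty_lines(["a", "", "  ", "b", ""]))
```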

OpenOffice.org

[Figure: regular expressions in OpenOffice.org]

Word

[Figure: regular expressions in Word]

Dreamweaver

[Figure: regular expressions in Dreamweaver 8]

Crimson Editor

[Figure: regular expressions in Crimson Editor]

VBScript

******************************************
* Description: a VFP function that calls the VBScript regular expression object to match the user's input
* Flow: wraps each value the user entered in quotes and returns the result, e.g. 1,3,5 => "1","3","5"
******************************************

Function RE
Parameter user_input
local oReg,oMtchColl,qry_str
qry_str=''
** Create the Regular Expression (RE) object
oReg = createobject("VBScript.RegExp")
** Set the pattern to match
oReg.Pattern = "\w+"
** Set the properties of the RE object
oReg.IgnoreCase = .T. && ignore case
oReg.Global = .T. && True = find every match, False = stop at the first match (default is False)
oMtchColl = oReg.Execute( user_input ) && match user_input against the pattern; returns the collection of matches
For each Match in oMtchColl
   * ? match.FirstIndex, match.Value, match.Length
   qry_str=qry_str+'"'+match.Value+'"'+","
Endfor
** Drop the trailing comma
qry_str=left(qry_str,len(qry_str)-1)
RETURN qry_str
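For comparison, the same flow (find every `\w+` token, wrap each one in double quotes, join with commas) can be sketched in Python. The function name `quote_tokens` is hypothetical, and one difference is assumed away: Python's `\w` is Unicode-aware by default, so non-ASCII input may tokenize differently than with the VBScript engine.

```python
import re


def quote_tokens(user_input):
    # Collect every run of word characters, like Global = .T. with "\w+".
    tokens = re.findall(r"\w+", user_input)
    # Joining avoids the trailing comma that the loop version has to strip.
    return ",".join(f'"{t}"' for t in tokens)


print(quote_tokens("1,3,5"))
```

An input with no word characters simply yields an empty string.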

Java

[Figure: regular expressions in Java]

Source code download (GPL licensed; thanks to 施彥任 for contributing the source code)

C#

using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        // Define a regular expression for currency values.
        Regex rx = new Regex(@"^-?\d+(\.\d{2})?$");
        // Define some test strings.
        string[] tests = { "-42", "19.99", "0.001", "100 USD" };
        // Check each test string against the regular expression.
        foreach (string test in tests)
        {
            if (rx.IsMatch(test))
            {
                Console.WriteLine("{0} is a currency value.", test);
            }
            else
            {
                Console.WriteLine("{0} is not a currency value.", test);
            }
        }
    }
}
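To see which of the four test strings the currency pattern accepts, the check can be sketched in Python. This is an approximation rather than a port: `fullmatch` stands in for the explicit `^` and `$` anchors, and the names `CURRENCY` and `results` are illustrative.

```python
import re

# Optional minus sign, one or more digits, then optionally a dot and
# exactly two digits.
CURRENCY = re.compile(r"-?\d+(\.\d{2})?")

tests = ["-42", "19.99", "0.001", "100 USD"]
results = {t: CURRENCY.fullmatch(t) is not None for t in tests}

for value, ok in results.items():
    print(f"{value} is {'' if ok else 'not '}a currency value.")
```

`-42` and `19.99` are accepted; `0.001` fails because three digits follow the dot, and `100 USD` fails because of the trailing text.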

 

Regular expression testing tools

Rex V: an online regular expression tester (PHP PCRE / PHP POSIX / JavaScript)

WaterProof's free regular expression tester, Regular Expression Editor (PHP PCRE)

施彥任's open-source regular expression tester (Java String / java.util.regex.*)

Conclusion

References