Tcl Scripting Language Tutorial

Updated: 2024-04-28 18:31:01


Tcl Tutorial

Contents

Tcl Syntax
■ Scripts, commands, and words
■ Substitution
■ Comments

Variables
■ Simple variables
■ Arrays
■ Related commands

Expressions
■ Operands
■ Operators and precedence
■ Math functions

List
■ list
■ concat
■ lindex
■ llength
■ linsert
■ lreplace
■ lrange
■ lappend
■ lsearch
■ lsort
■ split
■ join

Control Flow
■ if
■ Loop commands: while, for, foreach
■ eval
■ source

Procedures
■ Procedure definitions and return values
■ Local and global variables
■ Default and variable-length arguments
■ References: upvar

String Manipulation
■ format
■ scan
■ regexp
■ regsub
■ string

File Access
■ File names
■ Basic file input and output commands
■ Random file access
■ Current working directory
■ File operations and file information

Errors and Exceptions
■ Errors
■ Raising errors from Tcl scripts
■ Catching errors with catch
■ Other exceptions

Tcl in Depth
■ Querying the elements of an array
■ info

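Tcl Syntax > Scripts, Commands, and Words

A Tcl script consists of one or more commands. Commands must be separated by newlines or semicolons, so both of the following scripts are valid:

set a 1
set b 2

or

set a 1;set b 2

Every Tcl command consists of one or more words. The first word is the command name and the remaining words are its arguments. Words must be separated by spaces or tabs.

The Tcl interpreter evaluates a command in two steps: parsing and execution. During parsing, the interpreter applies its rules to split the command into separate words and performs any required substitutions. During execution, it treats the first word as the command name and checks whether that command is defined. If it is, the interpreter invokes the C/C++ procedure that implements the command and passes all of the words to that procedure as arguments.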

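Tcl Syntax > Substitution

Note: in the examples in all of the following sections, '%' is the Tcl command prompt. After you type a command and press Enter, Tcl prints the result of the command on the next line. Text after '//' is my own annotation and is not part of the example.

When the Tcl interpreter parses a command, it treats every argument as a string. For example:

%set x 10 //define variable x and set its value to 10
10
%set y x+100 //y is the string x+100, not the 110 we might expect
x+100

In the second command, x is treated as part of the string x+100. To use the value of x (10), we have to tell the interpreter that we want the value of the variable x here, not the character 'x'. Tcl's substitution mechanism does exactly this.

Tcl provides three kinds of substitution: variable substitution, command substitution, and backslash substitution. Each of them replaces all or part of a word with another value. Substitution can occur in any word of a command, including the command name, and substitutions can be nested.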

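■ Variable substitution

Variable substitution is marked by a $ sign and inserts the value of the variable into the word. For example:

%set y $x+100 //y is 10+100; x has been replaced by its value 10
10+100

The value of y is still not the 110 we want but 10+100, because the interpreter treats 10+100 as a string rather than as an expression. To obtain 110 we also need command substitution, so that Tcl treats 10+100 as an expression and evaluates it.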

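■ Command substitution

A command substitution is a Tcl command with its arguments enclosed in []. It replaces all or part of a word with the result of that command. For example:

%set y [expr $x+100]
110

The value of y is now 110. When the interpreter encounters '[', it treats the following word, expr, as a command name, invokes the C/C++ procedure that implements expr, and passes it 'expr' together with '10+100', the result of the variable substitution.

If we remove the [] from the example above, Tcl reports an error, because normally the interpreter treats only the first word of a command line as a command; all other words are plain strings that serve as the command's arguments.

Note that the text between [] must be a valid Tcl script of any length. Its value is the return value of the last command in the script. For example:

%set y [expr $x+100;set b 300] //y is 300, because set b 300 returns 300
300

Command substitution means that commands can be nested: the result of one command can be used as an argument of another.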
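The difference between variable substitution alone and variable substitution combined with command substitution can be sketched as follows. This is only an illustrative sketch: the variable names text, sum, and digits are made up here, and bracing the expr argument is a common habit rather than something the text above requires.

```tcl
set x 10

# Variable substitution only: the result is the string "10+100".
set text $x+100

# Command substitution: expr evaluates the expression and returns 110.
set sum [expr {$x + 100}]

# Substitutions nest: the inner expr runs first, and string length
# then receives its result ("100") as an argument.
set digits [string length [expr {$x * $x}]]

puts "$text $sum $digits"
```

Running the script should print the unevaluated string, the evaluated sum, and the length of "100" on one line.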

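■ Backslash substitution

Backslash substitution in Tcl resembles the use of the backslash in C. It is mainly used to insert characters that the interpreter would otherwise treat specially, such as newlines, spaces, [, and $, into a word. For example:

%set msg multiple\ space //msg is "multiple space"
multiple space

Without the '\', Tcl reports an error: the interpreter treats the space between the last two words as a separator, finds that set has more than two arguments, and fails. With the '\', the space is no longer a separator and 'multiple space' is a single word. Another example:

%set msg money\ \$3333\ \nArray\ a\[2]
money $3333 
Array a[2]

Here $ no longer starts a variable substitution, and \n produces the line break in the result.

Tcl supports the following backslash sequences:

Backslash sequence       Replaced by
\a                       Audible alert (0x7)
\b                       Backspace (0x8)
\f                       Form feed (0xc)
\n                       Newline (0xa)
\r                       Carriage return (0xd)
\t                       Tab (0x9)
\v                       Vertical tab (0xb)
\ddd                     Octal value given by ddd (one, two, or three d's)
\xhh                     Hex value given by hh (one or two h's in Tcl 8.6; earlier versions accept any number of h's and use only the last two)
\(newline)whitespace     A single space character

For example:

%set a \x48 //the \xhh form
H //hexadecimal 48 is 72, the code for H
%set a \110 //the \ddd form
H //octal 110 is 72, the code for H

A backslash at the end of a line (the \(newline)whitespace form) continues the command on the next line:

%set a [expr \
2+3]
5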

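■ Double quotes and braces

Besides the backslash, Tcl offers two more ways to make the interpreter treat separators and substitution characters as ordinary characters: double quotes and braces ({}).

Inside double quotes, word separators are not treated specially, but backslash sequences such as \n and the two substitution markers $ and [] are processed as usual. For example:

%set x 100
100
%set y "$x ddd"
100 ddd

Inside braces, all special characters become ordinary characters and lose their special meaning; the interpreter performs no substitution on them.

%set y {/n$x [expr 10+100]}
/n$x [expr 10+100]

Tcl Syntax > Comments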

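The comment character in Tcl is '#'. The '#' and everything up to the end of the line are treated as a comment and ignored by the interpreter. Note, however, that '#' starts a comment only where the interpreter expects the first character of a command. For example:

%#This is a comment
%set a 100 # Not a comment
wrong # args: should be "set varName ?newValue?"
%set b 101 ; # this is a comment
101

In the second line the '#' is not a comment character because it appears in the middle of a command. The interpreter treats it and the words after it as arguments of set, which causes the error. The '#' in the fourth line is a comment because the previous command has been terminated by a semicolon, so the interpreter expects a new command to begin. A '#' at that position turns the rest of the line into a comment.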
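The comment rule can be checked in a script without aborting it by wrapping the faulty command in catch. This is a minimal sketch; the variable names failed and message are chosen here for illustration only.

```tcl
# A comment on a line of its own.
set a 100 ;# the semicolon ends the command, so this is a comment

# Without the semicolon, "#" and the words after it become extra
# arguments to set, which then fails. catch traps the error and
# stores the error message in the variable named message.
set failed [catch {set b 101 # not a comment} message]

puts "failed=$failed"
puts $message
```

catch returns a nonzero code when the script it runs raises an error, so failed is 1 here and message holds the "wrong # args" text.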

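Variables > Simple Variables

A simple Tcl variable has two parts: a name and a value. Both can be arbitrary strings. For example, a variable named "1323 7&*: hdgg" is legal in Tcl. To make substitution easier to use, however, it is best to name variables according to the rules for identifiers in C/C++. When the Tcl interpreter parses a variable substitution, it only…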

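List > concat

Syntax: concat list ?list ...?

This command joins several lists into one list. The elements of each argument list become elements of the new list. (To make each argument a single element of the result, use the list command instead.)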
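The contrast between concat and list is easiest to see by counting the elements of each result. A short sketch, with variable names invented for the illustration:

```tcl
# concat splices the elements of its arguments together.
set merged [concat {1 2} {3 {4 5}}]   ;# 1 2 3 {4 5}

# list keeps each argument as one element of the result.
set nested [list {1 2} {3 {4 5}}]     ;# {1 2} {3 {4 5}}

puts "[llength $merged] [llength $nested]"   ;# prints "4 2"
```

Note that concat removes only one level of list structure: the inner {4 5} remains a single element.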

List > lindex

Syntax: lindex list index

Returns element index (0-based) of list. Example:

% lindex {1 2 {3 4}} 2
3 4

List > llength

Syntax: llength list

Returns the number of elements in list. Example:

% llength {1 2 {3 4}}
3

List > linsert

Syntax: linsert list index value ?value ...?

Returns a new list formed by inserting all of the value arguments before element index (0-based) of list. Example:

% linsert {1 2 {3 4}} 1 7 8 {9 10}
1 7 8 {9 10} 2 {3 4}

List > lreplace

Syntax: lreplace list first last ?value value ...?

Returns a new list formed by replacing elements first through last (both 0-based) of list with all of the value arguments. If no value arguments are given, elements first through last are deleted. Examples:

% lreplace {1 7 8 {9 10} 2 {3 4}} 3 3
1 7 8 2 {3 4}
% lreplace {1 7 8 2 {3 4}} 4 4 4 5 6
1 7 8 2 4 5 6
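Both linsert and lreplace return a new list and leave their input untouched, which is easy to overlook when the list is held in a variable. A minimal sketch (the variable names are illustrative):

```tcl
set items {a b c d}

set inserted [linsert $items 1 X]      ;# a X b c d
set removed  [lreplace $items 1 2]     ;# a d
set replaced [lreplace $items 0 0 Z]   ;# Z b c d

# The original variable still holds the original list.
puts $items                            ;# a b c d
```

To change the variable itself, assign the result back, for example set items [linsert $items 1 X].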


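List > lrange

Syntax: lrange list first last

Returns a list made up of elements first through last (both 0-based) of list. If last is end, the result runs from element first to the end of the list. Example:

% lrange {1 7 8 2 4 5 6} 3 end
2 4 5 6

List > lappend

Syntax: lappend varname value ?value ...?

Appends each value as a separate element to the list stored in the variable varname and returns the new value of the variable. If varname does not exist, it is created. Example:

% lappend a 1 2 3
1 2 3
% set a
1 2 3

List > lsearch

Syntax: lsearch ?-exact? ?-glob? ?-regexp? list pattern

Returns the index of the first element of list that matches pattern, or -1 if no element matches. -exact, -glob, and -regexp select one of three matching techniques. -exact means exact string comparison. -glob matches in the same way as the string match command, which is described later with the string command. -regexp means regular expression matching, which is described later with the regexp command. The default is -glob. Example:

% set a { how are you }
 how are you 
% lsearch $a y*
2
% lsearch $a y?
-1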
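The three matching modes can give different answers for the same pattern. The following sketch applies each mode to one list; the variable names are made up for the example.

```tcl
set words {how are you}

# -glob is the default: y* matches any element starting with y.
set byGlob [lsearch $words y*]             ;# 2

# -exact compares strings literally; no element is the string "y*".
set byExact [lsearch -exact $words y*]     ;# -1

# -regexp treats the pattern as a regular expression.
set byRegexp [lsearch -regexp $words {^a}] ;# 1

puts "$byGlob $byExact $byRegexp"
```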

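List > lsort

Syntax: lsort ?options? list

Returns list in sorted order. The options can be any of the following:

-ascii              Compare elements as strings in ASCII order. This is the default.
-dictionary         Compare in dictionary order. This differs from -ascii in two ways:
                    (1) case is ignored, except as a tie-breaker;
                    (2) numbers embedded in the elements are compared as integers.
                    So bigBoy sorts between bigbang and bigboy, and x10y sorts between x9y and x11y.
-integer            Convert the elements to integers and compare them as integers.
-real               Convert the elements to floating-point numbers and compare them as floating-point numbers.
-increasing         Sort in increasing order. This is the default.
-decreasing         Sort in decreasing order.
-command command    Use command to compare elements. Tcl calls it with two elements at a time; it must return an integer less than, equal to, or greater than zero, and Tcl orders the list accordingly.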
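A few of these options side by side, as a sketch. The comparison procedure compareLength is a hypothetical helper written for this example; it is not part of Tcl.

```tcl
# Hypothetical comparison command: orders elements by string length.
# It returns a negative, zero, or positive integer, as -command expects.
proc compareLength {a b} {
    expr {[string length $a] - [string length $b]}
}

set ascii   [lsort {banana Apple cherry}]              ;# Apple banana cherry
set numeric [lsort -integer {10 9 2}]                  ;# 2 9 10
set dict    [lsort -dictionary {x10y x9y x11y}]        ;# x9y x10y x11y
set down    [lsort -integer -decreasing {10 9 2}]      ;# 10 9 2
set byLen   [lsort -command compareLength {ccc a bb}]  ;# a bb ccc

puts $ascii
```

In the default -ascii order, uppercase letters sort before lowercase ones, which is why Apple comes first.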

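List > split

Syntax: split string ?splitChars?

Splits string into words at each occurrence of the separator characters in splitChars and returns a list of those words. If splitChars is the empty string {}, string is split into individual characters. If splitChars is omitted, whitespace is used as the separator. Examples:

% split "how are you"
how are you
% split "how.are.you" .
how are you
% split "how are you" {}
h o w { } a r e { } y o u

List > join

Syntax: join list ?joinString?

join is the inverse of split. It combines all elements of list into one string, separated by joinString. The default joinString is a space. Examples:

% join {h o w { } a r e { } y o u} {}
how are you
% join {how are you} .
how.are.you

Control Flow > if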

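Control flow in Tcl is similar to C and includes the commands if, while, for, foreach, switch, break, and continue.

Syntax: if test1 body1 ?elseif test2 body2 elseif ...? ?else bodyn?

Tcl first evaluates test1 as an expression. If the value is nonzero, it executes body1 as a script and returns the result. Otherwise it evaluates test2 as an expression; if that value is nonzero, it executes body2 as a script and returns the result, and so on. For example:

if {$x<0} {
    .....
} elseif {$x==1} {
    .....
} elseif {$x==2} {
    .....
} else {
    .....
}

Note that each '{' that opens a body must be on the same line as the preceding word. Otherwise the interpreter considers the if command finished at the newline and treats the next line as a new command, which leads to wrong results. The same applies to the loop commands below. Also make sure that there is a space between if and {, and likewise around elseif and else. Otherwise the interpreter treats 'if{' as a single word and looks for a command of that name, which causes an error.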
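A complete if/elseif/else chain with the brace and spacing layout described above might look like this. The procedure name describeSign is invented for the sketch.

```tcl
# Hypothetical helper that classifies a number by its sign.
proc describeSign {x} {
    if {$x < 0} {
        return negative
    } elseif {$x == 0} {
        return zero
    } else {
        return positive
    }
}

puts [describeSign -5]
puts [describeSign 0]
puts [describeSign 12]
```

Every opening brace sits at the end of a line and every keyword is surrounded by spaces, so the whole chain is parsed as a single if command.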

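Control Flow > Loop Commands

The loop commands are while, for, and foreach.

■ while

Syntax: while test body

The test argument is an expression and body is a script. As long as the expression is nonzero, the script is executed. The loop stops when the expression becomes 0; while then terminates and returns an empty string. For example, assuming that variable a holds a list, the following script copies the elements of a into b in reverse order:

set b ""
set i [expr [llength $a] -1]
while {$i>=0} {
    lappend b [lindex $a $i]
    incr i -1
}

■ for

Syntax: for init test reinit body

The init argument is an initialization script. The second argument, test, is an expression that decides when the loop ends. The third argument, reinit, is a script that is run after each iteration, and the fourth argument, body, is the script that forms the loop body. The following example does the same as the previous one:

set b ""
for {set i [expr [llength $a] -1]} {$i>=0} {incr i -1} {
    lappend b [lindex $a $i]
}

■ foreach

This command has two forms.

1. foreach varName list body

The first argument, varName, is a variable; the second argument, list, is a list (an ordered collection); the third argument, body, is the loop body. The body is executed once for each element of the list. The following example does the same as the previous one:

set b ""
foreach i $a {
    set b [linsert $b 0 $i]
}

2. foreach varlist1 list1 ?varlist2 list2 ...? body

This form includes the first one. The first argument, varlist1, is a list of loop variables and the second argument is a list, list1; the variables in varlist1 take successive values from list1. The body argument is the loop body. ?varlist2 list2 ...? means that several pairs of variable list and value list may be given. For example:

set x {}
foreach {i j} {a b c d e f} {
    lappend x $j $i
}

This loop runs three times and the value of x is "b a d c f e".

set x {}
foreach i {a b c} j {d e f g} {
    lappend x $i $j
}

This loop runs four times and the value of x is "a d b e c f {} g".

set x {}
foreach i {a b c} {j k} {d e f g} {
    lappend x $i $j $k
}

This loop runs three times and the value of x is "a d e b f g c {} {}".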

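■ break and continue

Inside a loop body, the break and continue commands interrupt the loop. break ends the whole loop and jumps out of it; continue only ends the current iteration.

■ switch

As with the switch statement in C, anything the Tcl switch command does can also be written with if, only more verbosely. The syntax of switch is:

switch ?options? string { pattern body ?pattern body ...? }

The first argument, options, is optional and selects the matching technique. Tcl supports three: -exact, -glob, and -regexp. The default is -exact, which means exact string comparison. -glob matches in the same way as the string match command and -regexp uses regular expression matching; both are described later in the string manipulation section. The second argument, string, is the value to be tested. The third argument is a braced list of one or more pattern-body pairs. Example:

switch $x {
    a -
    b {incr t1}
    c {incr t2}
    default {incr t3}
}

The '-' after a means "use the same script as the next pattern". default matches any value. As soon as switch finds a matching pattern, it executes the corresponding script and returns its value as the result of the switch command.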
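The effect of the matching option can be seen by running the same pattern through switch with and without -glob. This is a sketch only: classify and classifyExact are hypothetical procedure names, and the -- marks the end of the options so that a string beginning with '-' is not mistaken for one.

```tcl
# With -glob, a* matches any string that starts with "a".
proc classify {s} {
    switch -glob -- $s {
        a* -
        b* { return "starts with a or b" }
        default { return "something else" }
    }
}

# Without an option, matching is exact: a* matches only the
# two-character string "a*".
proc classifyExact {s} {
    switch -- $s {
        a* { return "the literal string a*" }
        default { return "something else" }
    }
}

puts [classify apple]
puts [classifyExact apple]
puts [classifyExact a*]
```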

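Control Flow > eval

The eval command builds and executes Tcl scripts. Its syntax is:

eval arg ?arg ...?

It accepts one or more arguments, joins them with spaces into a single script, and then evaluates that script. The script is braced in the example below so that the semicolon reaches eval instead of ending the eval command itself:

%eval {set a 2 ;set b 4}
4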
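A common use of eval is to run a command that has been assembled at run time. The sketch below builds the command with list so that an argument containing a space stays one word; the variable names are illustrative.

```tcl
# Build a command as a list so that each element stays a single word,
# even if it contains spaces.
set cmd [list set greeting "hello world"]
eval $cmd
puts $greeting

# eval joins its arguments with spaces, so these two calls run the
# same script: set count 1
eval set count 1
eval {set count 1}
puts $count
```

Building the command with list rather than plain string concatenation avoids accidental word splitting when eval re-parses the script.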

Control Flow > source

The source command reads a file and evaluates its contents as a script. For example:

source e:/tcl&c/hello.tcl

Note that paths should be written as on UNIX, with '/' rather than '\'.

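Procedures > Procedure Definitions and Return Values

Tcl supports the definition and invocation of procedures. A procedure can be regarded as a command implemented by a Tcl script, and it behaves much like Tcl's built-in commands. You can define your own procedures at any time with the proc command. Procedures in Tcl are similar to functions in C.

Procedures are created with the proc command. For example:

% proc add {x y } {expr $x+$y}

The first argument of proc is the name of the procedure to define. The second argument is the parameter list, with parameters separated by spaces. The third argument is a Tcl script that forms the procedure body. proc creates a new command that can be called like a built-in command:

% add 1 2
3

When defining a procedure, you can use the return command anywhere to return the value you want. return ends the procedure immediately and uses its argument as the result of the procedure. For example:

% proc abs {x} {
    if {$x >= 0} { return $x }
    return [expr -$x]
}

If no return command is executed, the return value of a procedure is the return value of the last command executed in its body.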
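The two ways a procedure produces its result can be put next to each other. In this sketch, square and clamp are hypothetical procedures written only to illustrate the point.

```tcl
# Implicit result: the value of the last command executed (expr).
proc square {x} {
    expr {$x * $x}
}

# Explicit return: leaves the procedure as soon as a case applies.
proc clamp {x low high} {
    if {$x < $low}  { return $low }
    if {$x > $high} { return $high }
    return $x
}

puts "[square 4] [clamp 15 0 10] [clamp 5 0 10]"   ;# 16 10 5
```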

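Procedures > Local and Global Variables

Variables defined inside a procedure can be accessed only within that procedure and are deleted automatically when the procedure returns, so they are called local variables. Variables defined outside all procedures are called global variables. In Tcl a local variable and a global variable may have the same name, and their scopes do not overlap: the scope of a local variable is the inside of its procedure, while the scope of a global variable excludes the inside of every procedure. This is very different from C.

To refer to a global variable inside a procedure, use the global command. For example:

% set a 4
4
% proc sample { x } {
    global a
    incr a
    return [expr $a+$x]
}
% sample 3
8
%set a
5

The global variable a is accessed inside the procedure, and the change made to a there is visible globally. Without the statement global a, the procedure no longer touches the global a. In Tcl 8.4 and earlier, incr then fails because the procedure does not know a variable named a; since Tcl 8.5, incr instead creates a new local variable a, counting from 0.

Procedures > Default and Variable-Length Arguments

Tcl also provides three special forms of parameter list: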

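First, you can define a procedure without parameters, for example:

proc add {} { expr 2+3}

Second, you can define a procedure with default parameter values. Defaults can be given for some or all parameters; if the caller does not supply a value for such a parameter, the procedure automatically uses the default. As with default arguments in C/C++, parameters with defaults must come at the end of the parameter list: every parameter after the first one that has a default must also have a default. For example:

proc add {val1 {val2 2} {val3 3}} {
    expr $val1+$val2+$val3
}

Then:

add 1 //result is 6
add 2 20 //result is 25
add 4 5 6 //result is 15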
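Default values are not limited to numbers. The following sketch uses a hypothetical greet procedure whose second and third parameters have string defaults.

```tcl
# Hypothetical procedure: greeting and punct have default values,
# so callers may pass one, two, or three arguments.
proc greet {name {greeting Hello} {punct !}} {
    return "$greeting, $name$punct"
}

puts [greet Ann]          ;# Hello, Ann!
puts [greet Ann Hi]       ;# Hi, Ann!
puts [greet Ann Hi ?]     ;# Hi, Ann?
```

Arguments are assigned from left to right, so it is not possible to supply punct while leaving greeting at its default.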

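Third, Tcl procedure definitions support a variable number of arguments. If the last parameter of a procedure is named args, the procedure can be called with a variable number of arguments. The parameters before args are handled like ordinary parameters, and any additional arguments have to be processed specially in the procedure body: the local variable args is set to a list whose elements are all the additional arguments. If there are no additional arguments, args is set to an empty string. Here is an example:

proc add { val1 args } {
    set sum $val1
    foreach i $args {
        incr sum $i
    }
    return $sum
}

Then:

add 2 //result is 2
add 2 3 4 5 6 //result is 20

Procedures > References: upvar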

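Syntax: upvar ?level? otherVar myVar ?otherVar myVar ...?

The upvar command lets a procedure access global variables or local variables of other procedures. The argument otherVar is the name of the variable we want to access by reference, and myVar is the name of a local variable in the current procedure. Once upvar has bound otherVar and myVar, reading or writing the local variable myVar inside the procedure is equivalent to reading or writing the variable named by otherVar in the caller of this procedure. Here is an example:

% proc temp { arg } {
    upvar $arg b
    set b [expr $b+2]
}
% proc myexp { var } {
    set a 4
    temp a
    return [expr $var+$a]
}

Then:

% myexp 7
13

In this example upvar binds $arg (in fact the variable a in procedure myexp) to the variable b in procedure temp, so reading or writing b is equivalent to reading or writing a.

The level argument in the upvar syntax specifies where the variable otherVar lives in the call stack, relative to the procedure that calls upvar. For example:

upvar 2 other x

This command makes the variable other in the caller of the caller of the current procedure accessible as x in the current procedure. By default level is 1, which means that a variable in the caller of the current procedure (a in myexp in the example above) can be accessed through a local variable of the current procedure (b in temp). To access a global variable you can write:

upvar #0 other x

Then the global variable other can be accessed as x in the current procedure, no matter where the current procedure is in the call stack.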
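Passing variable names and binding them with upvar is the usual way to write procedures that modify their caller's variables. The sketch below shows both the default level and #0; swap, bumpCounter, and counter are names invented for the illustration.

```tcl
# Exchanges the values of two variables in the caller's scope.
# The caller passes variable names, not values.
proc swap {aName bName} {
    upvar 1 $aName a $bName b
    set tmp $a
    set a $b
    set b $tmp
    return
}

# Increments the global variable counter through the local alias c,
# regardless of how deep in the call stack this procedure runs.
proc bumpCounter {} {
    upvar #0 counter c
    incr c
}

set p 1
set q 2
swap p q

set counter 10
bumpCounter

puts "$p $q $counter"
```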


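String Manipulation > format

Because Tcl treats all input as strings, it provides fairly powerful string manipulation facilities. The commands related to string manipulation are string, format, regexp, regsub, and scan.

Syntax: format formatstring ?value value ...?

The format command is similar to the sprintf function in ANSI C and to the Format member function of the CString class in MFC. It builds a new string by inserting the value arguments into formatstring according to the format it specifies, and returns that string. For example:

%set name john
john
%set age 20
20
%set msg [format "%s is %d years old" $name $age]
john is 20 years old

String Manipulation > scan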

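Syntax: scan string format varName ?varName ...?

The scan command can be regarded as the inverse of format; it works like the sscanf function in ANSI C. It parses string according to format and stores the results in the variables varName. Apart from spaces and tabs, the characters in string must match the literal characters in format, and each '%' conversion must match as well. For example:

% scan "some 26 34" "some %d %d" a b
2
% set a
26
% set b
34
% scan "12.34.56.78" "%d.%d.%d.%d" c d e f
4
% puts [format "the value of c is %d,d is %d,e is %d ,f is %d" $c $d $e $f]
the value of c is 12,d is 34,e is 56 ,f is 78

The return value of scan is the number of variables that were matched. As the example shows, if a variable varName does not exist, Tcl creates it automatically.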
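A few more conversions, as a sketch of how format and scan mirror each other. The variable names are illustrative, and only conversion specifiers that both commands share with C's printf and scanf are used.

```tcl
# format: left-justified string, fixed-point number, zero-padded integer.
set line [format "%-6s|%5.2f|%03d" ab 3.14159 7]
puts $line                                   ;# ab    | 3.14|007

# scan: pull two integers out of a clock-style string.
set count [scan "10:30" "%d:%d" hours minutes]
puts "$count $hours $minutes"                ;# 2 10 30

# scan with %x reads a hexadecimal integer.
scan "ff" "%x" value
puts $value                                  ;# 255
```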

String Manipulation > regexp

Syntax: regexp ?switches? ?--? exp string ?matchVar? ?subMatchVar subMatchVar ...?

The regexp command tests whether the regular expression exp matches all or part of string. It returns 1 if it matches and 0 otherwise.

Some characters have a special meaning in regular expressions. The following table lists and explains them.

Character          Meaning
.                  Matches any single character
^                  Anchors the match at the beginning of the string
$                  Anchors the match at the end of the string
\x                 Matches the character x; this suppresses any special meaning of x
[chars]            Matches any one character from the set chars. If the first character of chars is ^, matches any character not in chars. Ranges such as a-z are supported
(regexp)           Matches regexp as a single item
*                  Matches the item before * zero or more times
+                  Matches the item before + one or more times
?                  Matches the item before ? zero times or once
regexp1|regexp2    Matches either regexp1 or regexp2

The following example is taken from "Tcl and the Tk Toolkit":

^((0x)?[0-9a-fA-F]+|[0-9]+)$

This regular expression matches any hexadecimal or decimal integer.

The two alternatives (0x)?[0-9a-fA-F]+ and [0-9]+ are separated by |, which means that either of them may match. The former matches hexadecimal numbers and the latter decimal numbers.

^ means that the match must start at the beginning, so the expression does not match strings such as jk12 that do not start with 0x or a digit.

$ means that the match must extend to the end, so the expression does not match strings such as 12jk that do not end with a digit or one of the letters a-f or A-F.

Taking (0x)?[0-9a-fA-F]+ as an example: (0x) groups 0x into one item; ? means that the preceding item (0x) may occur zero times or once; [0-9a-fA-F] stands for any single digit from 0 to 9 or any single letter from a to f or A to F; + means that such a digit or letter may occur one or more times.

% regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} ab
1
% regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 0xabcd
1
% regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 12345
1
% regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 123j
0

If regexp is followed by the arguments matchVar and subMatchVar, all of them are treated as variable names, and variables that do not exist are created. regexp assigns the substring that matches the whole regular expression to the first variable, the substring that matches the leftmost parenthesized subexpression to the second variable, and so on. For example:

% regexp { ([0-9]+) *([a-z]+)} " there is 100  apples" total num word
1
% puts " $total ,$num,$word"
 100  apples ,100,apples

regexp accepts a number of switches that control the match:

-nocase
    Ignore case when matching.

-indices
    Changes the values stored in the variables: each variable receives the start and end indices of the corresponding matching substring within the whole string. For example:
    % regexp -indices { ([0-9]+) *([a-z]+)} " there is 100  apples" total num word
    1
    % puts " $total ,$num,$word"
     9 20 ,10 12,15 20
    The substring " 100  apples" occupies indices 9-20, "100" occupies 10-12, and "apples" occupies 15-20.

-about
    Instead of matching, returns information about the regular expression itself as a list. The first element is the number of subexpressions; the second element is a list of property names that describe the regular expression.

-expanded
    Enables the expanded syntax, in which white space and comments are ignored. Equivalent to the embedded option (?x).

-line
    Enables newline-sensitive matching. Normally ^ and $ match only at the beginning and end of the string, not at line boundaries inside it; with this switch they also match at those inner line boundaries. Equivalent to specifying both -linestop and -lineanchor, or to the embedded option (?n).

-linestop
    Changes the behavior of [^ bracket expressions and . so that they stop at newlines. Equivalent to the embedded option (?p).

-lineanchor
    Changes the behavior of ^ and $ so that they match at the beginning and end of lines inside the string. Equivalent to the embedded option (?w).

-all
    Matches the regular expression as many times as possible in the string and returns the number of matches found.

-inline
    Causes the command to return, as a list, the data that would otherwise be placed in match variables. When using -inline, match variables may not be specified. If used with -all, the list will be concatenated at each iteration, such that a flat list is always returned. For each match iteration, the command will append the overall match data, plus one element for each subexpression in the regular expression. Examples are:
    regexp -inline -- {\w(\w)} " inlined "
    => in n
    regexp -all -inline -- {\w(\w)} " inlined "
    => in n li i ne e

-start index
    Starts matching at character offset index in the string. With this switch, ^ no longer matches at the beginning of the line, while \A still matches the start of the string at index. If -indices is specified, the indices are counted from the absolute beginning of the input string.

--
    Marks the end of the switches. The argument that follows is treated as the regular expression even if it starts with '-'.

Tcl Regular Expression Syntax in Detail

◆DESCRIPTION

A regular expression describes strings of characters. It's a pattern that matches certain strings and doesn't match others.

◆DIFFERENT FLAVORS OF REs

Regular expressions, as defined by POSIX, come in two flavors: extended REs and basic REs. EREs are roughly those of the traditional egrep, while BREs are roughly those of the traditional ed. This implementation adds a third flavor, advanced REs, basically EREs with some significant extensions.

This manual page primarily describes AREs. BREs mostly exist for backward compatibility in some old programs; they will be discussed at the end. POSIX EREs are almost an exact subset of AREs. Features of AREs that are not present in EREs will be indicated.

◆REGULAR EXPRESSION SYNTAX

Tcl regular expressions are implemented using the package written by Henry Spencer, based on the 1003.2 spec and some (not quite all) of the Perl5 extensions (thanks, Henry!). Much of the description of regular expressions below is copied verbatim from his manual entry.

An ARE is one or more branches, separated by `|', matching anything that matches any of the branches.

A branch is zero or more constraints or quantified atoms, concatenated. It matches a match for the first, followed by a match for the second, etc; an empty branch matches the empty string.

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. The quantifiers, and what a so-quantified atom matches, are:

Quantifier    Meaning
*             a sequence of 0 or more matches of the atom
+             a sequence of 1 or more matches of the atom
?             a sequence of 0 or 1 matches of the atom
{m}           a sequence of exactly m matches of the atom
{m,}          a sequence of m or more matches of the atom
{m,n}         a sequence of m through n (inclusive) matches of the atom; m may not exceed n
*? +? ?? {m}? {m,}? {m,n}?
              non-greedy quantifiers, which match the same possibilities, but prefer the smallest number rather than the largest number of matches (see MATCHING)

The forms using { and } are known as bounds. The numbers m and n are unsigned decimal integers with permissible values from 0 to 255 inclusive.

An atom is one of:

Atom       Meaning
(re)       (where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re)     as previous, but does no reporting
()         matches an empty string, noted for possible reporting
(?:)       matches an empty string, without reporting
[chars]    a bracket expression, matching any one of the chars (see BRACKET EXPRESSIONS for more detail)
.          matches any single character
\k         (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g. \\ matches a backslash character
\c         where c is alphanumeric (possibly followed by other characters), an escape (AREs only), see ESCAPES below
{          when followed by a character other than a digit, matches the left-brace character `{'; when followed by a digit, it is the beginning of a bound (see above)
x          where x is a single character with no other significance, matches that character

A constraint matches an empty string when specific conditions are met. A constraint may not be followed by a quantifier. The simple constraints are as follows; some more constraints are described later, under ESCAPES.

Constraint    Meaning
^             matches at the beginning of a line
$             matches at the end of a line
(?=re)        positive lookahead (AREs only), matches at any point where a substring matching re begins
(?!re)        negative lookahead (AREs only), matches at any point where no substring matching re begins

The lookahead constraints may not contain back references (see later), and all parentheses within them are considered non-capturing.

An RE may not end with `\'.

◆BRACKET EXPRESSIONS

A bracket expression is a list of characters enclosed in `[]'. It normally matches any single character from the list (but see below). If the list begins with `^', it matches any single character (but see below) not from the rest of the list.

If two characters in the list are separated by `-', this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. [0-9] in ASCII matches any decimal digit. Two ranges may not share an endpoint, so e.g. a-c-e is illegal. Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.

To include a literal ] or - in the list, the simplest method is to enclose it in [. and .] to make it a collating element (see below). Alternatively, make it the first character (following a possible `^'), or (AREs only) precede it with `\'. Alternatively, for `-', make it the last character, or the second endpoint of a range. To use a literal - as the first endpoint of a range, make it a collating element or (AREs only) precede it with `\'. With the exception of these, some combinations using [ (see next paragraphs), and escapes, all other special characters lose their special significance within a bracket expression.

Within a bracket expression, a collating element (a character, a multi-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in [. and .] stands for the sequence of characters of that collating element. The sequence is a single element of the bracket expression's list. A bracket expression in a locale that has multi-character collating elements can thus match more than one character. So (insidiously), a bracket expression that starts with ^ can match multi-character collating elements even if none of them appear in the bracket expression! (Note: Tcl currently has no multi-character collating elements. This information is only for illustration.)

For example, assume the collating sequence includes a ch multi-character collating element. Then the RE [[.ch.]]*c (zero or more ch's followed by c) matches the first five characters of `chchcc'. Also, the RE [^c]b matches all of `chb' (because [^c] matches the multi-character ch).

Within a bracket expression, a collating element enclosed in [= and =] is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were `[.' and `.]'.) For example, if o and ô are the members of an equivalence class, then `[[=o=]]', `[[=ô=]]', and `[oô]' are all synonymous. An equivalence class may not be an endpoint of a range. (Note: Tcl currently implements only the Unicode locale. It doesn't define any equivalence classes. The examples above are just illustrations.)

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters (not all collating elements!) belonging to that class. Standard character classes are:

Class     Meaning
alpha     A letter.
upper     An upper-case letter.
lower     A lower-case letter.
digit     A decimal digit.
xdigit    A hexadecimal digit.
alnum     An alphanumeric (letter or digit).
print     An alphanumeric (same as alnum).
blank     A space or tab character.
space     A character producing white space in displayed text.
punct     A punctuation character.
graph     A character with a visible representation.
cntrl     A control character.

A locale may provide others. (Note that the current Tcl implementation has only one locale: the Unicode locale.) A character class may not be used as an endpoint of a range.

There are two special cases of bracket expressions: the bracket expressions [[:<:]] and [[:>:]] are constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum character or an underscore (_).
These special bracket expressions are deprecated; users of AREs should use constraint escapes instead (see below).

◆ESCAPES

Escapes (AREs only), which begin with a \ followed by an alphanumeric character, come in several varieties: character entry, class shorthands, constraint escapes, and back references. A \ followed by an alphanumeric character but not constituting a valid escape is illegal in AREs. In EREs, there are no escapes: outside a bracket expression, a \ followed by an alphanumeric character merely stands for that character as an ordinary character, and inside a bracket expression, \ is an ordinary character. (The latter is the one actual incompatibility between EREs and AREs.)

Character-entry escapes (AREs only) exist to make it easier to specify non-printing and otherwise inconvenient characters in REs:

Escape        Meaning
\a            alert (bell) character, as in C
\b            backspace, as in C
\B            synonym for \ to help reduce backslash doubling in some applications where there are multiple levels of backslash processing
\cX           (where X is any character) the character whose low-order 5 bits are the same as those of X, and whose other bits are all zero
\e            the character whose collating-sequence name is `ESC', or failing that, the character with octal value 033
\f            formfeed, as in C
\n            newline, as in C
\r            carriage return, as in C
\t            horizontal tab, as in C
\uwxyz        (where wxyz is exactly four hexadecimal digits) the Unicode character U+wxyz in the local byte ordering
\Ustuvwxyz    (where stuvwxyz is exactly eight hexadecimal digits) reserved for a somewhat-hypothetical Unicode extension to 32 bits
\v            vertical tab, as in C
\xhhh         (where hhh is any sequence of hexadecimal digits) the character whose hexadecimal value is 0xhhh (a single character no matter how many hexadecimal digits are used)
\0            the character whose value is 0
\xy           (where xy is exactly two octal digits, and is not a back reference (see below)) the character whose octal value is 0xy
\xyz          (where xyz is exactly three octal digits, and is not a back reference (see below)) the character whose octal value is 0xyz

Hexadecimal digits are `0'-`9', `a'-`f', and `A'-`F'. Octal digits are `0'-`7'.

The character-entry escapes are always taken as ordinary characters. For example, \135 is ] in ASCII, but \135 does not terminate a bracket expression. Beware, however, that some applications (e.g., C compilers) interpret such sequences themselves before the regular-expression package gets to see them, which may require doubling (quadrupling, etc.) the `\'.

Class-shorthand escapes (AREs only) provide shorthands for certain commonly-used character classes:

Shorthand    Full expression
\d           [[:digit:]]
\s           [[:space:]]
\w           [[:alnum:]_] (note underscore)
\D           [^[:digit:]]
\S           [^[:space:]]
\W           [^[:alnum:]_] (note underscore)

Within bracket expressions, `\d', `\s', and `\w' lose their outer brackets, and `\D', `\S', and `\W' are illegal. (So, for example, [a-c\d] is equivalent to [a-c[:digit:]]. Also, [a-c\D], which is equivalent to [a-c^[:digit:]], is illegal.)

A constraint escape (AREs only) is a constraint, matching the empty string if specific conditions are met, written as an escape:

Escape    Meaning
\A        matches only at the beginning of the string (see MATCHING, below, for how this differs from `^')
\m        matches only at the beginning of a word
\M        matches only at the end of a word
\y        matches only at the beginning or end of a word
\Y        matches only at a point that is not the beginning or end of a word
\Z        matches only at the end of the string (see MATCHING, below, for how this differs from `$')
\m        (where m is a nonzero digit) a back reference, see below
\mnn      (where m is a nonzero digit, and nn is some more digits, and the decimal value mnn is not greater than the number of closing capturing parentheses seen so far) a back reference, see below

A word is defined as in the specification of [[:<:]] and [[:>:]] above. Constraint escapes are illegal within bracket expressions.

A back reference (AREs only) matches the same string matched by the parenthesized subexpression specified by the number, so that (e.g.) ([bc])\1 matches bb or cc but not `bc'. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions.

There is an inherent historical ambiguity between octal character-entry escapes and back references, which is resolved by heuristics, as hinted at above. A leading zero always indicates an octal escape. A single non-zero digit, not followed by another digit, is always taken as a back reference. A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression (i.e. the number is in the legal range for a back reference), and otherwise is taken as octal.

◆METASYNTAX

In addition to the main syntax described above, there are some special forms and miscellaneous syntactic facilities available.

Normally the flavor of RE being used is specified by application-dependent means. However, this can be overridden by a director. If an RE of any flavor begins with `***:', the rest of the RE is an ARE. If an RE of any flavor begins with `***=', the rest of the RE is taken to be a literal string, with all characters considered ordinary characters.

An ARE may begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These supplement, and can override, any options specified by the application. The available option letters are:

Option    Meaning
b         rest of RE is a BRE
c         case-sensitive matching (usual default)
e         rest of RE is an ERE
i         case-insensitive matching (see MATCHING, below)
m         historical synonym for n
n         newline-sensitive matching (see MATCHING, below)
p         partial newline-sensitive matching (see MATCHING, below)
q         rest of RE is a literal string, all ordinary characters
s         non-newline-sensitive matching (usual default)
t         tight syntax (usual default; see below)
w         inverse partial newline-sensitive matching (see MATCHING, below)
x         expanded syntax (see below)

Embedded options take effect at the ) terminating the sequence. They are available only at the start of an ARE, and may not be used later within it.

In addition to the usual (tight) RE syntax, in which all characters are significant, there is an expanded syntax, available in all flavors of RE with the -expanded switch, or in AREs with the embedded x option. In the expanded syntax, white-space characters are ignored and all characters between a # and the following newline (or the end of the RE) are ignored, permitting paragraphing and commenting a complex RE. There are three exceptions to that basic rule:

  a white-space character or `#' preceded by `\' is retained

  white space or `#' within a bracket expression is retained

  white space and comments are illegal within multi-character symbols like the ARE `(?:' or the BRE `\('

Expanded-syntax white-space characters are blank, tab, newline, and any character that belongs to the space character class.
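The expanded syntax can be tried out with the -expanded switch of regexp. The sketch below matches a date in year-month-day form; the pattern and the variable names (datePattern, whole, year, month, day) are made up for the illustration.

```tcl
# White space and "# ..." comments inside the pattern are ignored
# because the pattern is used with -expanded.
set datePattern {
    ^ (\d{4})    # year: four digits
    - (\d{2})    # month: two digits
    - (\d{2}) $  # day: two digits
}

set ok [regexp -expanded -- $datePattern 2024-04-28 whole year month day]
puts "$ok $year $month $day"
```

The pattern is written in braces so that Tcl performs no backslash or variable substitution on it before regexp sees it.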

Finally, in an ARE, outside bracket expressions, the sequence `(?#ttt)' (where ttt is any text not containing a `)') is a comment, completely ignored. Again, this is not allowed between the characters of multi-character symbols like `(?:'. Such comments are more a historical artifact than a useful facility, and their use is deprecated; use the expanded syntax instead.

None of these metasyntax extensions is available if the application (or an initial ***= director) has specified that the user's input be treated as a literal string rather than as an RE.

◆MATCHING(匹配)

In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, its choice is determined by its preference: either the longest substring, or the shortest.

Most atoms, and all constraints, have no preference. A parenthesized RE has the same preference (possibly none) as the RE. A quantified atom with quantifier {m} or {m}? has the same preference (possibly none) as the atom itself. A quantified atom with other normal quantifiers (including {m,n} with m equal to n) prefers longest match. A quantified atom with other non-greedy quantifiers (including {m,n}? with m equal to n) prefers shortest match. A branch has the same preference as the first quantified atom in it which has a preference. An RE consisting of two or more branches connected by the | operator prefers longest match.

Subject to the constraints imposed by the rules for matching the whole RE, subexpressions also match the longest or shortest possible substrings, based on their preferences, with subexpressions starting earlier in the RE taking priority over ones starting later. Note that outer subexpressions thus take priority over their component subexpressions.

Note that the quantifiers {1,1} and {1,1}? can be used to force longest and shortest preference, respectively, on a subexpression or a whole RE.
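The preference rules and a back reference can be observed with a few short calls. This is a sketch with invented variable names; it uses only the -inline and -all switches described earlier.

```tcl
# A greedy quantifier prefers the longest match ...
set greedy [regexp -inline {a.*b} xaxbxb]     ;# axbxb

# ... while its non-greedy form prefers the shortest one.
set lazy [regexp -inline {a.*?b} xaxbxb]      ;# axb

# -all -inline returns every match as a flat list.
set numbers [regexp -all -inline {\d+} "a1 b22 c333"]   ;# 1 22 333

# The back reference \1 must repeat what ([bc]) matched.
set doubled [regexp {([bc])\1} abba]   ;# 1, because of "bb"
set mixed   [regexp {([bc])\1} abcd]   ;# 0, "bc" is not a repetition

puts "$greedy $lazy"
puts $numbers
puts "$doubled $mixed"
```

In both of the first two calls the match starts at the earliest possible position, the first a; only the choice of end point differs.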
