The most concise introductory tutorial on Perl regular expressions, in 2 pages



A Quick Guide To PERL Regular Expressions

This is a Quick reference Guide for PERL regular expressions (also known as regexps or regexes).

These tools are used to describe text as “motifs” or “patterns” for matching, quoting, substituting or transliterating. Each programming language (Perl, C, Java, Python...) defines its own regular expressions, and the syntax may differ from small details to extensive changes. In this guide we concentrate on the Perl regexp syntax and assume that the reader has some preliminary knowledge of Perl programming.

Perl uses a Traditional Nondeterministic Finite Automata (NFA) match engine. This means that it compares each element of the motif to the input string, keeping track of the positions. The engine chooses the first leftmost match after greedy (i.e., longest possible match) quantifiers have matched.

References

For more information on Perl regexps and other syntaxes you can refer to O’Reilly’s book “Mastering Regular Expressions”.

Examples:

The following sentence will be used in all our examples:

The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345

Motif finding: match operator m//

EXPR =~ m/MOTIF/cgimosx
EXPR =~ /MOTIF/cgimosx
EXPR !~ m/MOTIF/cgimosx
EXPR !~ /MOTIF/cgimosx

Example: match any SwissProt ID for a rat protein

if ($ex =~ m/\w{2,5}_RAT/) { print "Rat entry\n"; }

will match

The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345

and as a result print Rat entry.
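A self-contained version of the example above could look like the following sketch. The capturing parentheses and the variables $matched and $no_mouse are additions for illustration only; they are not part of the original guide.

```perl
use strict;
use warnings;

# The example sentence used throughout this guide.
my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# Capture the matched text with parentheses so it can be inspected.
my $matched = '';
if ($ex =~ m/(\w{2,5}_RAT)/) {
    $matched = $1;
    print "Rat entry: $matched\n";
}

# The negated binding operator !~ is true when the motif is absent.
my $no_mouse = ($ex !~ /_MOUSE/) ? 1 : 0;
print "No mouse entry\n" if $no_mouse;
```

The greedy \w{2,5} first takes "UBP5_" and then gives back one character so that "_RAT" can match.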

Options

cg  continue after a failure in /g
g   global matches (matches all occurrences)
i   case insensitive
m   multiline, allow “^” and “$” to match with (\n)
o   compile MOTIF only once
s   single line, “.” matches new-line (\n)
x   ignore whitespace and allow comments “#” in MOTIF
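The effect of the most common options can be seen in a short sketch. The two-line string $text and all variable names are invented for this illustration.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# /g in list context returns every match; /i ignores case.
my @the = $ex =~ /the/gi;
my $count_the = scalar @the;          # "The" and "the"

# /x lets the motif be laid out with whitespace and "#" comments.
my ($db, $id) = $ex =~ /
    (\w{2})      # two-letter database code
    :            # separator
    (\w+_\w+)    # entry name such as UBP5_RAT
/x;

# /m makes "^" and "$" match at embedded newlines as well.
my $text = "first line\nsecond line\n";
my @starts = $text =~ /^(\w+)/gm;

# /s lets "." match a newline.
my $spans_lines = ($text =~ /first.+second/s) ? 1 : 0;

print "$count_the $db $id @starts $spans_lines\n";
```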

Search & Replace: substitution operator s///

EXPR =~ s/MOTIF/REPLACE/egimosx

Example: correct typo for the word rabbit

$ex =~ s/rabit/rabbit/g;

Here is the content of $ex:

The ID sp:UBP5_RAT is similar to the rabbit AC tr:Q12345

Example: find and tag any TrEMBL AC

$ex =~ s/tr:/trembl_ac=/g;

Here is the content of $ex:

The ID sp:UBP5_RAT is similar to the rabit AC trembl_ac=Q12345
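As a minimal sketch of the substitution operator: s/// changes the variable in place and returns the number of replacements made, and the e option evaluates the REPLACE part as Perl code. The copies $fixed and $bumped, and the idea of incrementing the digits, are made up for the example.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# s/// modifies the variable in place and returns the number of replacements.
my $fixed = $ex;
my $n = ($fixed =~ s/rabit/rabbit/g);

# With /e the REPLACE part is evaluated as Perl code:
# here each run of digits is replaced by its value plus one.
my $bumped = $ex;
$bumped =~ s/(\d+)/$1 + 1/ge;

print "$n\n$fixed\n$bumped\n";
```

Working on a copy keeps the original sentence in $ex available for the later examples.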

Options

e   evaluate REPLACE as an expression
g   global matches (matches all occurrences)
i   case insensitive
m   multiline, allow “^” and “$” to match with (\n)
o   compile MOTIF only once
s   single line, “.” matches new-line (\n)
x   ignore whitespace and allow comments “#” in MOTIF

Quoting: quote and compile operator qr//

EXPR =~ qr/MOTIF/imosx

Example: reuse of a precompiled regexp

$myregexp = qr/\w{2,5}_\w{2,5}/;
if ($ex =~ m/$myregexp/) { print "SwissProtID\n"; }

will match:

The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345

and as a result will print SwissProtID.
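A precompiled regexp can also be used as a building block inside a larger motif or kept in a list. The sketch below assumes the variable names $entry, $db, $full and @hits, none of which come from the original guide.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# Precompile two building blocks once.
my $entry = qr/\w{2,5}_\w{2,5}/;
my $db    = qr/(sp|tr)/i;         # options given to qr// travel with the object

# Compiled regexps can be interpolated into a larger motif.
my ($full) = $ex =~ /($db:$entry)/;

# They can also be stored in a list and applied in turn.
my @hits = grep { $ex =~ $_ } (qr/_RAT/, qr/_MOUSE/, qr/Q\d{5}/);

print "$full\n", scalar(@hits), "\n";
```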

Options

i   case insensitive
m   multiline, allow “^” and “$” to match with (\n)
o   compile MOTIF only once
s   single line, “.” matches new-line (\n)
x   ignore whitespace and allow comments “#” in MOTIF

Character classes

[...]   Match any one character of a class
[^...]  Match any one character not in the class
.       Match any character except newline ([^\n]) unless single-line mode (/s) is on
\d      Any digit. Equivalent to [0-9] or [[:digit:]]
\D      Any non-digit.
\s      Any whitespace. [ \t\n\r\f] (plus vertical tab in Perl 5.18 and later) or [[:space:]]
\S      Any non-whitespace.
\w      Any word character. [a-zA-Z0-9_] or [[:alnum:]_]
\W      Any non-word character. Warning: \w != \S

POSIX character classes

[[:class:]] class can be any of:
alnum alpha ascii blank cntrl digit graph lower print punct space upper xdigit
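The classes above can be tried on the example sentence. This is only a sketch, and the array names are chosen for the illustration. Note how ":" is a non-whitespace character without being a word character, which is why \w and \S are not interchangeable.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

my @digits   = $ex =~ /\d/g;                 # every single digit
my @words    = $ex =~ /\w+/g;                # runs of word characters
my @non_word = $ex =~ /[^\w\s]/g;            # neither word nor whitespace
my @upper    = $ex =~ /[[:upper:]]{2,}/g;    # POSIX class inside a bracket

print scalar(@digits), " digits; non-word: @non_word; upper: @upper\n";
```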

Special characters

\a    alert (bell)
\b    backspace (only inside a character class)
\e    escape
\f    form feed
\n    newline
\r    carriage return
\t    horizontal tabulation
\nnn  octal nnn
\xnn  hexadecimal nn
\cX   control character X

Repetitions

?      Zero or one occurrence of the previous item.
*      Zero or more occurrences of the previous item.
+      One or more occurrences of the previous item.
{n,m}  Match at least n times but no more than m times the previous item.
{n,}   Match n or more times
{n}    Match exactly n times
{}?    Non-greedy match (i.e., match the shortest string); the ? suffix also applies to ?, * and +
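The difference between greedy and non-greedy quantifiers is easiest to see side by side. The following sketch uses the example sentence; the variable names are illustrative.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# Greedy: ".+" runs to the last ":" it can reach.
my ($greedy) = $ex =~ /(s.+:)/;

# Non-greedy: ".+?" stops at the first ":" that lets the motif succeed.
my ($lazy) = $ex =~ /(s.+?:)/;

# Counted repetitions: two-letter words, and words of five or more characters.
my @two  = $ex =~ /\b\w{2}\b/g;
my @long = $ex =~ /\w{5,}/g;

print "greedy: $greedy\nlazy: $lazy\ntwo: @two\nlong: @long\n";
```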

Anchors

^ or \A  Match beginning of the string/line
$ or \Z  Match end of the string/line
\z       End of string in any match mode
\b       Match word boundary
\B       Match non-word boundary
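A short sketch of the anchors follows. The string $line with a trailing newline is an assumption made to show how $, \Z and \z differ at the end of a string.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

my $starts_with_the  = ($ex =~ /^The/) ? 1 : 0;   # anchored at the start
my $ends_with_digits = ($ex =~ /\d+$/) ? 1 : 0;   # anchored at the end

# "AT" occurs inside "RAT", but never as a word of its own.
my $at_anywhere = ($ex =~ /AT/)     ? 1 : 0;
my $at_as_word  = ($ex =~ /\bAT\b/) ? 1 : 0;
my $at_inside   = ($ex =~ /\BAT/)   ? 1 : 0;

# "$" and "\Z" tolerate one trailing newline, "\z" does not.
my $line    = "Q12345\n";
my $dollar  = ($line =~ /5$/)  ? 1 : 0;
my $big_z   = ($line =~ /5\Z/) ? 1 : 0;
my $small_z = ($line =~ /5\z/) ? 1 : 0;

print "$starts_with_the $ends_with_digits $at_anywhere $at_as_word $at_inside\n";
print "$dollar $big_z $small_z\n";
```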

Capture & Grouping

(...)  Group several characters together for later use or capture as a single unit
|      Match either subexpression (equivalent to “OR”)

Example: match any database code in the list

$ex =~ m/(sp:|tr:|rs:)/g;

will match:

The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345
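In list context, a global match with capturing groups returns the captured substrings, which makes it easy to collect them. The sketch below assumes the names @codes and %entry for illustration.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# One group: the list holds every database code that was found.
my @codes = $ex =~ m/(sp:|tr:|rs:)/g;

# Two groups: the flat list (code, name, code, name) fills a hash directly.
my %entry = $ex =~ /(sp|tr|rs):(\w+)/g;

print "codes: @codes\n";
print "$_ => $entry{$_}\n" for sort keys %entry;
```

The alternative rs: is simply never used, since it does not occur in the sentence.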


\n   Back reference (n is a group number, e.g. \1). Match the same as the captured group number n that was previously matched in the same MOTIF.
$n   Substring of captured group n

Example: match several instances with back reference

$ex =~ m/(the).+\1/i;

will match:

The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345
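To see exactly which part of the sentence the back reference example covers, the whole motif can be wrapped in an extra group. This is a sketch; the outer parentheses shift the original group to number 2, and the strings "rabbit" and "sp:UBP5_RAT" are used only as small test inputs.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# Group 1 is the whole match, group 2 is the text that \2 must repeat.
# Under /i the back reference also matches case-insensitively.
my ($span, $first) = $ex =~ /((the).+\2)/i;

# A back reference finds a doubled character.
my ($doubled) = "rabbit" =~ /((\w)\2)/;

# In the REPLACE part of s///, captured groups are available as $1, $2, ...
(my $swapped = "sp:UBP5_RAT") =~ s/(\w+):(\w+)/$2 ($1)/;

print "$span\n$first\n$doubled\n$swapped\n";
```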

Example: rename any tr:AC to trembl_AC= using a capture

$ex =~ s/tr:([[:alnum:]]{6})/trembl_AC=$1/gi;

will give:

The ID sp:UBP5_RAT is similar to the rabit AC trembl_AC=Q12345

Text-span modifiers

\Q  Quote following metacharacters until \E or end of motif (allow the use of scalars in regexp)
\u  Force next character to uppercase
\l  Force next character to lowercase
\U  Force all following characters to uppercase
\L  Force all following characters to lowercase
\E  End a span started with \Q, \U or \L

Extended Regexp

(?#...)   Substring “...” is a comment
(?=...)   Positive lookahead. Match if exists next match (e.g., allow overlapping matches in global mode)
(?!...)   Negative lookahead. Match if no next match
(?<=...)  Positive lookbehind. Fixed length only.
(?<!...)  Negative lookbehind. Fixed length only.
(?imsx)   Modify matching options
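The text-span modifiers and the lookaround constructs can be sketched together as follows. The inputs "abc", "a.c" and "AAAA" and all variable names are assumptions chosen to keep the results easy to check.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# \Q...\E: use a scalar literally even if it holds metacharacters.
my $needle = "a.c";
my $as_motif   = ("abc" =~ /$needle/)     ? 1 : 0;   # "." acts as a wildcard
my $as_literal = ("abc" =~ /\Q$needle\E/) ? 1 : 0;   # "." must be a real dot

# \u and \U...\E change the case of interpolated text in REPLACE.
(my $capital = "rabit") =~ s/(\w+)/\u$1/;
(my $shout = $ex) =~ s/(rabit)/\U$1\E/;

# Lookahead consumes nothing, so /g can report overlapping matches.
my @plain   = "AAAA" =~ /AA/g;
my @overlap = "AAAA" =~ /(?=(AA))/g;

# Lookbehind: take the word that follows "tr:" without matching "tr:" itself.
my ($ac) = $ex =~ /(?<=tr:)(\w+)/;

# Negative lookahead: "UBP5_" not followed by "MOUSE".
my $not_mouse = ($ex =~ /UBP5_(?!MOUSE)/) ? 1 : 0;

print "$as_motif $as_literal $capital ", scalar(@plain), " ", scalar(@overlap), " $ac $not_mouse\n";
```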

Transliteration: translate operator tr///

EXPR =~ tr/SEARCHLIST/REPLACELIST/cds

Transliteration is not - and does not use - a regular expression, but it is frequently associated with the regexp in PERL. Thus we decided to include it in this guide.

Example: reverse and complement a DNA sequence

$DNA = "AAATATTTCATCGTACAT";
$revcom = reverse $DNA;
$revcom =~ tr/ACGTacgt/TGCAtgca/;

The transliteration will produce the following:

print($DNA);     AAATATTTCATCGTACAT
print($revcom);  ATGTACGATGAAATATTT

Options

c   complement SEARCHLIST
d   delete non-replaced characters
s   single replace of duplicated characters (squeeze runs of the same replaced character into one)
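The tr/// options can be illustrated on short sequences. In this sketch the c option is taken to complement the search list, as documented for Perl's tr operator, and the input strings are invented for the example. tr/// also returns the number of characters it matched, which gives a simple way to count.

```perl
use strict;
use warnings;

my $dna = "AAATATTTCATCGTACAT";

# With an empty REPLACELIST the string is left unchanged and
# the return value is the number of matching characters.
my $gc = ($dna =~ tr/GC//);

# d: delete the characters found in SEARCHLIST that have no replacement.
(my $no_gaps = "AC-GT--A") =~ tr/-//d;

# s: squeeze each run of the same replaced character into one.
(my $squeezed = "AAACCCGT") =~ tr/A-Z//s;

# c: act on every character that is NOT in SEARCHLIST.
(my $masked = "AC?GT!N") =~ tr/ACGT/N/c;

print "$gc $no_gaps $squeezed $masked\n";
```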

Unicode matches

Perl 5.8 supports Unicode 3.2. However it would be too long to describe all the properties in detail here. For more information see “Mastering Regular Expressions”.

\p{PROP}  Matches a Unicode property
\P{PROP}  Matches anything but a Unicode property
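As a small sketch of property matching, the general categories L (letter), Lu (uppercase letter) and N (number) can be used as PROP. The test string, which starts with a Greek alpha written as an escape, is an assumption for this example, and only counts are printed so that no output encoding needs to be set up.

```perl
use strict;
use warnings;

# "\x{3B1}" is the Greek small letter alpha.
my $text = "\x{3B1}-helix 2005";

my @letters     = $text =~ /\p{L}/g;    # any letter, including alpha
my @uppercase   = $text =~ /\p{Lu}/g;   # uppercase letters only
my @numbers     = $text =~ /\p{N}/g;    # numeric characters
my @non_letters = $text =~ /\P{L}+/g;   # runs of anything but letters

print scalar(@letters), " ", scalar(@uppercase), " ",
      scalar(@numbers), " ", scalar(@non_letters), "\n";
```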

This document was written and designed by Laurent Falquet and Vassilios Ioannidis from the Swiss EMBnet node and is distributed by the P&PR Publications Committee of EMBnet. EMBnet - European Molecular Biology Network - is a bioinformatics support network of bioinformatics support centers situated primarily in Europe. Most countries have a national node which can provide training courses and other forms of help for users of bioinformatics software.

You can find information about your national node from the EMBnet site:

/

A Quick Guide To PERL Regular Expressions

First edition © 2005

Source: https://www.bwwdw.com/article/spiq.html
