The Most Concise Introductory Tutorial on Perl Regular Expressions, in 2 Pages
A Quick Guide To PERL Regular Expressions
This is a Quick reference Guide for PERL regular expressions (also known as regexps or regexes).
These tools are used to describe text as “motifs” or “patterns” for matching, quoting, substituting or transliterating. Each programming language (Perl, C, Java, Python...) defines its own regular expressions, and the syntax can differ from one language to another, from small details to extensive changes. In this guide we concentrate on the Perl regexp syntax and assume that the reader has some preliminary knowledge of Perl programming.
Perl uses a Traditional Nondeterministic Finite Automata (NFA) match engine. This means that it will compare each element of the motif to the input string, keeping track of the positions. The engine chooses the first leftmost match after greedy (i.e., longest possible match) quantifiers have matched.
References
For more information on Perl regexps and other syntaxes you can refer to O’Reilly’s book “Mastering Regular Expressions”.
Examples:
The following sentence will be used in all our examples:
The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345
Motif finding: match operator m//
EXPR =~ m/MOTIF/cgimosx
EXPR =~ /MOTIF/cgimosx
EXPR !~ m/MOTIF/cgimosx
EXPR !~ /MOTIF/cgimosx
Example: match any SwissProt ID for a rat protein
if ($ex =~ m/\w{2,5}_RAT/) { print "Rat entry\n"; }
will match
The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345
and as a result print Rat entry.
Options
cg  continue after a failure in /g
g   global matches (matches all occurrences)
i   case insensitive
m   multiline, allow “^” and “$” to match with (\n)
o   compile MOTIF only once
s   single line, dot “.” matches new-line (\n)
x   ignore whitespace and allow comments “#” in MOTIF
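The options above can be tried on the guide's example sentence. The following is a minimal sketch, not part of the original card; the variable names are illustrative assumptions.

```perl
use strict;
use warnings;

# Example sentence used throughout this guide.
my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# /i: case-insensitive, so "the" also matches the leading "The".
my $starts_with_the = ($ex =~ m/^the/i) ? 1 : 0;

# /g in list context returns every match of the pattern.
my @db_codes = ($ex =~ m/\b(?:sp|tr):/g);

# /x: whitespace and "#" comments inside the pattern are ignored.
my ($rat_id) = $ex =~ m/
    ( \w{2,5} _RAT )   # entry name ending in _RAT
/x;

print "starts with 'the': $starts_with_the\n";
print "database codes: @db_codes\n";
print "rat ID: $rat_id\n";
```

In list context a match without /g returns the captured groups, which is why `$rat_id` is assigned inside parentheses.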
Search & Replace: substitution operator s///
EXPR =~ s/MOTIF/REPLACE/egimosx
Example: correct typo for the word rabbit
$ex =~ s/rabit/rabbit/g;
Here is the content of $ex:
The ID sp:UBP5_RAT is similar to the rabbit AC tr:Q12345
Example: find and tag any TrEMBL AC
$ex =~ s/tr:/trembl_ac=/g;
Here is the content of $ex:
The ID sp:UBP5_RAT is similar to the rabit AC trembl_ac=Q12345
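Two details of s/// are easy to miss: the operator returns the number of substitutions it made, and the e option treats REPLACE as Perl code. The sketch below illustrates both under the assumption that the example sentence is held in `$ex`; the other variable names are hypothetical.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# s/// returns the number of substitutions made.
my $fixed = $ex;
my $count = ($fixed =~ s/rabit/rabbit/g);

# /e evaluates REPLACE as an expression: lowercase the matched entry name.
my $lowered = $ex;
$lowered =~ s/(\w{2,5}_RAT)/lc($1)/e;

print "$count substitution(s): $fixed\n";
print "$lowered\n";
```

Working on copies (`$fixed`, `$lowered`) keeps `$ex` unchanged, since s/// modifies its left-hand side in place.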
Options
e evaluate REPLACE as an expressiong global matches (matches all occurrences)i case insensitive
m multiline, allow o compile MOTIF only once
“^” and “$” to match with (\n)s single line, dot ignore whitespace and allow comments “.” matches new-line (\nx
in MOTIF
)
“#”Quoting: quote and compile operator qr// EXPR =~ qr/MOTIF/imosx
Example: reuse of a precompiled regexp
$myregexp = qr/\w{2,5}_\w{2,5}/;
if ($ex =~ m/$myregexp/) { print "SwissProtID\n"; }
will match:
The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345
and as a result will print SwissProtID.
Options
i   case insensitive
m   multiline, allow “^” and “$” to match with (\n)
o   compile MOTIF only once
s   single line, dot “.” matches new-line (\n)
x   ignore whitespace and allow comments “#” in MOTIF
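A precompiled regexp can be used on its own or interpolated into a larger pattern. The following sketch reuses the pattern from the example above; `$entry_name`, `$db` and `$id` are names chosen here for illustration.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# Compile the pattern once with qr//.
my $entry_name = qr/\w{2,5}_\w{2,5}/;

# Use the compiled object directly as the pattern...
my $has_entry = ($ex =~ $entry_name) ? 1 : 0;

# ...or interpolate it into a larger pattern with extra captures.
my ($db, $id) = $ex =~ m/(\w+):($entry_name)/;

print "has entry: $has_entry\n";
print "db=$db id=$id\n";
```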
Character classes
[...]   Match any one character of a class
[^...]  Match any one character not in the bracket
.       Match any character (except newline [^\n]) in non single-line mode (/s)
\d      Any digit. Equivalent to [0-9] or [[:digit:]]
\D      Any non-digit.
\s      Any whitespace. [ \t\n\r\f] or [[:space:]]
\S      Any non-whitespace.
\w      Any word character. [a-zA-Z0-9_] or [[:alnum:]_]
\W      Any non-word character. Warning \w != \S
POSIX Character class
[[:class:]]  class can be any of:
alnum alpha ascii blank cntrl digit graph lower print punct space upper xdigit
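As an illustration of the classes listed above, the sketch below extracts digits, punctuation and uppercase runs from the guide's example sentence. The variable names are assumptions made for this example.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# \d+ : runs of digits.
my @numbers = $ex =~ /\d+/g;

# [^\s\w] : any character that is neither whitespace nor a word character.
my @punct = $ex =~ /[^\s\w]/g;

# A POSIX class is written inside a bracketed class: runs of 2+ uppercase letters.
my @upper_runs = $ex =~ /[[:upper:]]{2,}/g;

print "numbers: @numbers\n";
print "punctuation: @punct\n";
print "uppercase runs: @upper_runs\n";
```

Note that the underscore in UBP5_RAT is a word character, so it does not appear in `@punct`.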
Special characters
\a    alert (bell)
\b    backspace
\e    escape
\f    form feed
\n    newline
\r    carriage return
\t    horizontal tabulation
\nnn  octal nnn
\xnn  hexadecimal nn
\cX   control character X
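A short sketch of these escapes in use, assuming a hypothetical tab-separated record (`$record` and its content are invented for the example):

```perl
use strict;
use warnings;

# Hypothetical tab-separated record ending with a newline.
my $record = "UBP5_RAT\tQ12345\n";

# \t and \n match the literal tab and newline characters.
my ($id, $ac) = $record =~ /^(\S+)\t(\S+)\n\z/;

# \x41 is hexadecimal for "A"; \cI is control-I, i.e. the tab character.
my $hex_ok  = ("A"  =~ /\x41/) ? 1 : 0;
my $ctrl_ok = ("\t" =~ /\cI/)  ? 1 : 0;

print "id=$id ac=$ac hex=$hex_ok ctrl=$ctrl_ok\n";
```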
Repetitions
?      Zero or one occurrence of the previous item.
*      Zero or more occurrences of the previous item.
+      One or more occurrences of the previous item.
{n,m}  Match at least n times but no more than m times the previous item.
{n,}   Match n or more times
{n}    Match exactly n times
*? +? ?? {}?  Non-greedy match (i.e., match the shortest string)
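The difference between greedy and non-greedy quantifiers is easiest to see side by side. This is a small sketch on the guide's example sentence; the variable names are illustrative.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# Greedy: .+ takes as much as possible, so the match runs up to the LAST colon.
my ($greedy) = $ex =~ /(.+):/;

# Non-greedy: .+? takes as little as possible, stopping at the FIRST colon.
my ($lazy) = $ex =~ /(.+?):/;

# Counted repetition: exactly six word characters after "tr:".
my ($ac) = $ex =~ /tr:(\w{6})/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "ac:     $ac\n";
```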
Anchors
^ or \A  Match beginning of the string/line
$ or \Z  Match end of the string/line
\z       End of string in any match mode
\b       Match word boundary
\B       Match non-word boundary
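The anchors interact with the m option and with trailing newlines. The sketch below shows this on a hypothetical two-line string (`$text` is invented for the example).

```perl
use strict;
use warnings;

# Hypothetical two-line input.
my $text = "sp:UBP5_RAT\ntr:Q12345\n";

# Without /m, ^ matches only at the very start of the string.
my @first_only = $text =~ /^(\w+):/g;

# With /m, ^ also matches right after each embedded newline.
my @every_line = $text =~ /^(\w+):/mg;

# \Z tolerates one trailing newline before the end of the string; \z does not.
my $big_z   = ($text =~ /Q12345\Z/) ? 1 : 0;
my $small_z = ($text =~ /Q12345\z/) ? 1 : 0;

# \b needs a word/non-word transition; "_" is a word character,
# so there is no boundary between "_" and "RAT".
my $whole_word = ("rat protein" =~ /\brat\b/) ? 1 : 0;
my $after_underscore = ("UBP5_RAT" =~ /\bRAT\b/) ? 1 : 0;

print "first_only=@first_only every_line=@every_line\n";
print "\\Z=$big_z \\z=$small_z word=$whole_word underscore=$after_underscore\n";
```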
Capture & Grouping
(...)  Group several characters together for later use or capture as a single unit
|      Match either subexpressions (equivalent to “OR”)
Example: match any database code in the list
$ex =~ m/(sp:|tr:|rs:)/g;
will match:
The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345
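In list context, a global match returns the captured groups of every match, which makes grouping and alternation convenient for collecting data. A minimal sketch, with `@codes` and `%acc` as illustrative names:

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# One capture group: the list holds each matched database code.
my @codes = $ex =~ m/(sp:|tr:|rs:)/g;

# Two capture groups: the list alternates ($1, $2, $1, $2, ...),
# which can be assigned directly to a hash.
my %acc = $ex =~ m/(sp|tr|rs):(\w+)/g;

print "codes: @codes\n";
print "$_ => $acc{$_}\n" for sort keys %acc;
```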
\n  Back reference. Match the same as the captured group number n that was previously matched in the same MOTIF.
$n  Substring of captured group n
Example: match several instances with back reference
$ex =~ m/(the).+\1/i;
will match:
The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345
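To see which part of the sentence the back reference example actually covers, the whole pattern can be wrapped in an extra group. In this sketch the outer group becomes group 1, so the original `\1` is written `\2`; the second part uses a hypothetical string with a doubled word.

```perl
use strict;
use warnings;

my $ex = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";

# Outer group 1 records the full span; inner group 2 is "the".
# With /i the back reference also matches case-insensitively.
my ($span) = $ex =~ m/((the).+\2)/i;

# Back reference inside a substitution: collapse an accidentally doubled word.
my $typo = "similar to the the rabit";    # hypothetical input
(my $dedup = $typo) =~ s/\b(\w+)\s+\1\b/$1/g;

print "span:  $span\n";
print "dedup: $dedup\n";
```

Inside the pattern the group is referred to as `\1`; in the replacement part and after the match it is available as `$1`.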
Example: rename any tr:AC to trembl_AC= using a capture
$ex =~ s/tr:([[:alnum:]]{6})/trembl_AC=$1/gi;
will give:
The ID sp:UBP5_RAT is similar to the rabit AC trembl_AC=Q12345
Text-span modifiers
\Q  Quote following metacharacters until \E or end of motif (allow the use of scalars in regexp)
\u  Force next character to uppercase
\l  Force next character to lowercase
\U  Force all following characters to uppercase
\L  Force all following characters to lowercase
\E  End a span started with \Q, \U or \L
Extended Regexp
(?#...)   Substring “...” is a comment
(?=...)   Positive lookahead. Match if exists next match (e.g., allow overlapping matches in global mode)
(?!...)   Negative lookahead. Match if no next match
(?<=...)  Positive lookbehind. Fixed length only.
(?<!...)  Negative lookbehind. Fixed length only.
(?imsx)   Modify matching options
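The sketch below illustrates three of the constructs listed above: a lookahead used to count overlapping matches, a lookbehind, and \Q to match a scalar literally. The DNA string is the one used in the transliteration example of this guide; `$name` and the other variables are assumptions for this example.

```perl
use strict;
use warnings;

my $ex  = "The ID sp:UBP5_RAT is similar to the rabit AC tr:Q12345";
my $dna = "AAATATTTCATCGTACAT";

# A plain /g match consumes "AA", so overlapping occurrences are skipped.
my $plain = () = $dna =~ /AA/g;

# A lookahead consumes nothing, so every start position of "AA" is counted.
my $overlapping = () = $dna =~ /(?=AA)/g;

# Lookbehind: capture the word that directly follows "tr:".
my ($ac) = $ex =~ /(?<=tr:)(\w+)/;

# \Q...\E quotes metacharacters coming from a scalar.
my $name = "a.c";                                   # hypothetical input
my $unquoted = ("abc" =~ /^$name$/)     ? 1 : 0;    # "." matches any character
my $quoted   = ("abc" =~ /^\Q$name\E$/) ? 1 : 0;    # "." must be a literal dot

print "plain=$plain overlapping=$overlapping ac=$ac\n";
print "unquoted=$unquoted quoted=$quoted\n";
```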
Transliteration: translate operator tr///
EXPR =~ tr/SEARCHLIST/REPLACELIST/cds
Transliteration is not - and does not use - a regular expression, but it is frequently associated with the regexp in PERL. Thus we decided to include it in this guide.
Example: reverse and complement a DNA sequence
$DNA = "AAATATTTCATCGTACAT";
$revcom = reverse $DNA;
$revcom =~ tr/ACGTacgt/TGCAtgca/;
The transliteration will produce the following:
print($DNA);     AAATATTTCATCGTACAT
print($revcom);  ATGTACGATGAAATATTT
Options
c  complement SEARCHLIST
d  delete found but non-replaced characters
s  squash runs of duplicated replaced characters into a single character
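The tr/// operator also returns the number of characters it matched, which is handy for counting. The following sketch shows that together with the c, d and s options; `$raw` and `$runs` are hypothetical inputs.

```perl
use strict;
use warnings;

my $dna = "AAATATTTCATCGTACAT";

# With an empty REPLACELIST (and no /d) the string is left unchanged
# and tr/// simply returns the number of matching characters.
my $gc = ($dna =~ tr/GCgc//);

# /c complements SEARCHLIST and /d deletes: remove everything that is not A, C, G or T.
my $raw = "AAA TAT-TTC";                   # hypothetical input with separators
(my $clean = $raw) =~ tr/ACGT//cd;

# /s squashes each run of the same matched character into one.
my $runs = "AAATTTT";                      # hypothetical input
(my $squashed = $runs) =~ tr/A-Z//s;

print "GC count: $gc\n";
print "clean: $clean\n";
print "squashed: $squashed\n";
```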
Unicode matches
Perl 5.8 supports Unicode 3.2. However it would be too long to describe all the properties in detail here. For more information see “Mastering Regular Expressions”.
\p{PROP}  Matches a Unicode property
\P{PROP}  Matches anything but a Unicode property
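As a small illustration, the sketch below uses two commonly available properties, \p{L} (any letter) and \p{Greek} (Greek script). The input string is a hypothetical example, and the code points are printed as U+XXXX so that no output encoding has to be configured.

```perl
use strict;
use warnings;

# Hypothetical text containing Greek letters alpha (U+03B1) and beta (U+03B2).
my $text = "\x{3b1}-helix and \x{3b2}-sheet";

# \p{Greek}: characters belonging to the Greek script.
my @greek = $text =~ /(\p{Greek})/g;

# \p{L}: any letter, whatever the script.
my $letters = () = $text =~ /\p{L}/g;

# \P{L}: anything that is NOT a letter (here hyphens and spaces).
my $non_letters = () = $text =~ /\P{L}/g;

my $code_points = join " ", map { sprintf "U+%04X", ord } @greek;
print "greek: $code_points\n";
print "letters=$letters non-letters=$non_letters\n";
```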
This document was written and designed by Laurent Falquet and Vassilios Ioannidis from the Swiss EMBnet node and is being distributed by the P&PR Publications Committee of EMBnet.
EMBnet - European Molecular Biology Network - is a bioinformatics support network of bioinformatics support centers situated primarily in Europe. Most countries have a national node which can provide training courses and other forms of help for users of bioinformatics software.
You can find information about your national node from the EMBnet site:
/
A Quick Guide To PERL Regular Expressions
First edition © 2005