plan9fox/sys/man/6/regexp
2017-12-11 19:58:06 -05:00

129 lines
2 KiB
Text

.TH REGEXP 6
.SH NAME
regexp \- regular expression notation
.SH DESCRIPTION
A
.I "regular expression"
specifies
a set of strings of characters.
A member of this set of strings is said to be
.I matched
by the regular expression. In many applications
a delimiter character, commonly
.LR / ,
bounds a regular expression.
In the following specification for regular expressions
the word `character' means any character (rune) but newline.
.PP
The syntax for a regular expression
.B e0
is
.IP
.EX
e3: literal | charclass | '.' | '^' | '$' | '(' e0 ')'
e2: e3
| e2 REP
REP: '*' | '+' | '?'
e1: e2
| e1 e2
e0: e1
| e0 '|' e1
.EE
.PP
A
.B literal
is any non-metacharacter, or a metacharacter
(one of
.BR .*+?[]()|\e^$ ),
or the delimiter
preceded by
.LR \e .
.PP
A
.B charclass
is a nonempty string
.I s
bracketed
.BI [ \|s\| ]
(or
.BI [^ s\| ]\fR);
it matches any character in (or not in)
.IR s .
A negated character class never
matches newline.
A substring
.IB a - b\f1,
with
.I a
and
.I b
in ascending
order, stands for the inclusive
range of
characters between
.I a
and
.IR b .
In
.IR s ,
the metacharacters
.LR - ,
.LR ] ,
an initial
.LR ^ ,
and the regular expression delimiter
must be preceded by a
.LR \e ;
other metacharacters
have no special meaning and
may appear unescaped.
.PP
A
.L .
matches any character.
.PP
A
.L ^
matches the beginning of a line;
.L $
matches the end of the line.
.PP
The
.B REP
operators match zero or more
.RB ( * ),
one or more
.RB ( + ),
zero or one
.RB ( ? ),
instances respectively of the preceding regular expression
.BR e2 .
.PP
A concatenated regular expression,
.BR "e1\|e2" ,
matches a match to
.B e1
followed by a match to
.BR e2 .
.PP
An alternative regular expression,
.BR "e0\||\|e1" ,
matches either a match to
.B e0
or a match to
.BR e1 .
.PP
A match to any part of a regular expression
extends as far as possible without preventing
a match to the remainder of the regular expression.
.SH "SEE ALSO"
.IR awk (1),
.IR ed (1),
.IR grep (1),
.IR sam (1),
.IR sed (1),
.IR regexp (2)