Here is a brief description of regular expression syntax as used in sed.

char
A single ordinary character matches itself.
*
Matches a sequence of zero or more instances of matches for the preceding regular expression, which must be an ordinary character, a special character preceded by \, a ., a grouped regexp (see below), or a bracket expression. As a GNU extension, a postfixed regular expression can also be followed by *; for example, a** is equivalent to a*. POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many non-GNU implementations do not support this, and portable scripts should instead use \* in these contexts.
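The following is a minimal sketch of * in practice. It assumes GNU sed invoked from a POSIX shell. The input strings and the variables r1 to r3 are arbitrary illustrations, and the comment after each command is the line sed prints.

```shell
#!/bin/sh
# Sketch assuming GNU sed.

# 'a*' can match the empty string, so the leftmost match is the
# empty string before the 'b'.
r1=$(printf '%s\n' 'baaac' | sed 's/a*/X/')    # -> Xbaaac

# 'ba*' is a 'b' followed by zero or more 'a's.
r2=$(printf '%s\n' 'baaac' | sed 's/ba*/X/')   # -> Xc

# '\*' matches a literal asterisk (the portable spelling).
r3=$(printf '%s\n' 'a*b' | sed 's/\*/X/')      # -> aXb

printf '%s\n' "$r1" "$r2" "$r3"
```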
.
Matches any character, including newline.
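Here is a short check of the newline claim, again assuming GNU sed. The N command appends the next input line to the pattern space with an embedded newline, which gives . a newline to match.

```shell
#!/bin/sh
# Sketch assuming GNU sed.
# '.' matches any one character, here the 'b'.
r1=$(printf '%s\n' 'abc' | sed 's/a.c/X/')     # -> X
# N joins the two input lines as "a\nb"; '.' matches the newline.
r2=$(printf 'a\nb\n' | sed 'N;s/a.b/X/')        # -> X
printf '%s\n' "$r1" "$r2"
```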
^
Matches the null string at the beginning of the pattern space; that is, what appears after the circumflex must appear at the beginning of the pattern space.

In most scripts, pattern space is initialized to the content of each line (see How sed works). So it is a useful simplification to think of ^#include as matching only lines where ‘#include’ is the first thing on the line. If there are spaces before it, for example, the match fails. This simplification is valid as long as the original content of pattern space is not modified, for example with an s command.

^ acts as a special character only at the beginning of the regular expression or subexpression (that is, after \( or \|). Portable scripts should avoid ^ at the beginning of a subexpression, though, as POSIX allows implementations that treat ^ as an ordinary character in that context.
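The sketch below assumes GNU sed and uses two made-up input lines. It shows the ^#include simplification and how it stops holding once an s command has rewritten the pattern space. It also shows ^ behaving as an ordinary character in the middle of a regexp.

```shell
#!/bin/sh
# Sketch assuming GNU sed. Two sample lines, the second indented.
input='#include <a.h>
  #include <b.h>'

# Pattern space untouched: only the unindented line starts with #include.
r1=$(printf '%s\n' "$input" | sed -n '/^#include/p')
# -> #include <a.h>

# s removes leading spaces first, so ^#include now matches both lines.
r2=$(printf '%s\n' "$input" | sed -n 's/^ *//;/^#include/p')
# -> #include <a.h>
#    #include <b.h>

# Not at the start of the regexp, ^ is an ordinary character.
r3=$(printf '%s\n' 'a^b' | sed 's/a^b/X/')
# -> X

printf '%s\n' "$r1" "$r2" "$r3"
```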
$
It is the same as ^, but refers to the end of the pattern space. $ also acts as a special character only at the end of the regular expression or subexpression (that is, before \) or \|), and its use at the end of a subexpression is not portable.
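Assuming GNU sed, the following contrasts $ at the end of a regexp with $ in the middle. It also uses N to show that $ refers to the end of the pattern space rather than to an embedded newline.

```shell
#!/bin/sh
# Sketch assuming GNU sed.
# At the end of the regexp, $ anchors to the end of the pattern space.
r1=$(printf '%s\n' 'abc' | sed 's/c$/X/')       # -> abX
# Pattern space is "ab\ncd": 'b' precedes a newline, not the end of
# the pattern space, so b$ fails and nothing changes.
r2=$(printf 'ab\ncd\n' | sed 'N;s/b$/X/')        # -> ab / cd (two lines)
# In the middle of the regexp, $ is an ordinary character.
r3=$(printf '%s\n' 'a$b' | sed 's/a$b/X/')      # -> X
printf '%s\n' "$r1" "$r2" "$r3"
```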
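[list]
[^list]
Matches any single character in list (or, for [^list], any single character not in list): for example, [aeiou] matches all vowels. A list may include sequences like char1-char2, which matches any character between char1 and char2, inclusive. See Character Classes and Bracket Expressions.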
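Here are a few bracket expressions, assuming GNU sed. LC_ALL=C is set so that the range a-c is interpreted in the C locale. The sample strings are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU sed, run in the C locale so ranges are plain ASCII.
export LC_ALL=C

# [aeiou]: any single vowel; g replaces every occurrence.
r1=$(printf '%s\n' 'hello world' | sed 's/[aeiou]/_/g')   # -> h_ll_ w_rld
# [a-c]: any character from a to c, inclusive.
r2=$(printf '%s\n' 'abcd' | sed 's/[a-c]/X/g')           # -> XXXd
# [^0-9]: any character that is not a digit; here, deleted.
r3=$(printf '%s\n' 'abc123' | sed 's/[^0-9]//g')         # -> 123

printf '%s\n' "$r1" "$r2" "$r3"
```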
\+
As *, but matches one or more. It is a GNU extension.
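Assuming GNU sed, this sketch contrasts \+ with *. The only difference is whether zero occurrences are accepted.

```shell
#!/bin/sh
# Sketch assuming GNU sed (\+ is a GNU extension).
r1=$(printf '%s\n' 'baaac' | sed 's/a\+/X/')    # -> bXc
# With no 'a' after the 'b', ba\+ cannot match...
r2=$(printf '%s\n' 'bc' | sed 's/ba\+/X/')      # -> bc
# ...while ba* matches the lone 'b'.
r3=$(printf '%s\n' 'bc' | sed 's/ba*/X/')       # -> Xc
printf '%s\n' "$r1" "$r2" "$r3"
```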
\?
As *, but only matches zero or one. It is a GNU extension.
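This is a sketch of \? with GNU sed. It uses the color/colour spelling variant as sample input.

```shell
#!/bin/sh
# Sketch assuming GNU sed (\? is a GNU extension).
# u\? makes the 'u' optional, so both spellings match.
r1=$(printf '%s\n' 'color colour' | sed 's/colou\?r/C/g')   # -> C C
# At most one 'u' is allowed, so "colouur" is left alone.
r2=$(printf '%s\n' 'colouur' | sed 's/colou\?r/C/')         # -> colouur
printf '%s\n' "$r1" "$r2"
```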
\{i\}
As *, but matches exactly i sequences (i is a decimal integer; for portability, keep it between 0 and 255 inclusive).

\{i,j\}
Matches between i and j sequences, inclusive.

\{i,\}
Matches i or more sequences.
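Here are the three interval forms side by side, assuming GNU sed and an arbitrary run of five ‘a’s as input. Because matching is greedy, \{2,3\} takes three characters when three are available.

```shell
#!/bin/sh
# Sketch assuming GNU sed; the input is five 'a's.
r1=$(printf '%s\n' 'aaaaa' | sed 's/a\{2\}/X/')     # exactly 2   -> Xaaa
r2=$(printf '%s\n' 'aaaaa' | sed 's/a\{2,3\}/X/')   # 2 to 3      -> Xaa
r3=$(printf '%s\n' 'aaaaa' | sed 's/a\{2,\}/X/')    # 2 or more   -> X
r4=$(printf '%s\n' 'a' | sed 's/a\{2,\}/X/')        # too few     -> a
printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```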
\(regexp\)
Groups the inner regexp as a whole. This is used to apply postfix operators, like \(abcd\)*: this will search for zero or more whole sequences of ‘abcd’, while abcd* would search for ‘abc’ followed by zero or more occurrences of ‘d’. Note that support for \(abcd\)* is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it, and hence it is not universally portable. Grouping is also used for back references (see below).
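Assuming GNU sed, the difference between \(abcd\)* and abcd* can be seen directly. The input strings are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU sed.

# \(abcd\)* repeats the whole group: matches "abcdabcd", leaving "d".
r1=$(printf '%s\n' 'abcdabcdd' | sed 's/\(abcd\)*/X/')   # -> Xd
# abcd* repeats only the final 'd': matches "abcddd", leaving "x".
r2=$(printf '%s\n' 'abcdddx' | sed 's/abcd*/X/')         # -> Xx

printf '%s\n' "$r1" "$r2"
```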
regexp1\|regexp2
Matches either regexp1 or regexp2. Use parentheses to build complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension.
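Here are two uses of \| with GNU sed. The first is a plain alternation. The second is wrapped in \(...\) so that only part of the regexp alternates. The sample words are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU sed (\| is a GNU extension in basic regexps).

# Either whole word.
r1=$(printf '%s\n' 'cat dog bird' | sed 's/cat\|dog/pet/g')
# -> pet pet bird

# Parentheses limit the alternation to the suffix after "foo".
r2=$(printf '%s\n' 'foobar foobaz fooqux' | sed 's/foo\(bar\|baz\)/X/g')
# -> X X fooqux

printf '%s\n' "$r1" "$r2"
```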
regexp1regexp2
Matches the concatenation of regexp1 and regexp2. Concatenation binds more tightly than \|, ^, and $, but less tightly than the other regular expression operators.
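The precedence rules can be checked with GNU sed. In this sketch, * attaches to the single preceding character rather than to the whole concatenation, and \| splits the entire regexp so that each anchor stays with its own alternative.

```shell
#!/bin/sh
# Sketch assuming GNU sed.

# ab* is 'a' followed by b*, not (ab)*.
r1=$(printf '%s\n' 'abbb' | sed 's/ab*/X/')    # -> X
r2=$(printf '%s\n' 'abab' | sed 's/ab*/X/')    # -> Xab

# ^a\|b$ means "^a" or "b$": each anchor belongs to its own branch.
r3=$(printf '%s\n' 'ab' | sed 's/^a\|b$/X/g')  # -> XX

printf '%s\n' "$r1" "$r2" "$r3"
```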
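\digit
Matches the same text that was matched by the digit-th \(...\) parenthesized subexpression in the regular expression. This is called a back reference. Subexpressions are implicitly numbered by counting occurrences of \( from left to right.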
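Here are two back-reference sketches, assuming GNU sed. The first collapses a doubled word. The second shows that groups are numbered by the position of their opening \(, so the outer group is \1 and the nested one is \2.

```shell
#!/bin/sh
# Sketch assuming GNU sed; C locale so [a-z] means ASCII lowercase.
export LC_ALL=C

# \1 must repeat exactly what the group matched: collapses "the the".
r1=$(printf '%s\n' 'the the cat' | sed 's/\([a-z]\+\) \1/\1/')
# -> the cat

# Groups are numbered by their opening \( from left to right:
# the outer group is \1 ("ab"), the inner one is \2 ("b").
r2=$(printf '%s\n' 'ab' | sed 's/\(a\(b\)\)/[\1|\2]/')
# -> [ab|b]

printf '%s\n' "$r1" "$r2"
```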
\n
Matches the newline character.
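Assuming GNU sed, \n can be used to join lines after N has gathered them into the pattern space. The three input lines are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU sed.
# N;N gathers three input lines into one pattern space ("a\nb\nc"),
# then \n matches each embedded newline.
r1=$(printf 'a\nb\nc\n' | sed 'N;N;s/\n/,/g')   # -> a,b,c
printf '%s\n' "$r1"
```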
\char
Matches char, where char is one of $, *, ., [, \, or ^. Note that the only C-like backslash sequences that you can portably assume to be interpreted are \n and \\; in particular, \t is not portable, and matches a ‘t’ under most implementations of sed, rather than a tab character.
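This sketch escapes special characters with GNU sed, using arbitrary inputs. printf '%s\n' is used so the shell passes the backslash in ‘x\y’ through unchanged.

```shell
#!/bin/sh
# Sketch assuming GNU sed.

# \. and \* match a literal dot and a literal asterisk.
r1=$(printf '%s\n' 'a.b*c' | sed 's/\./DOT/;s/\*/STAR/')   # -> aDOTbSTARc
# \\ matches one literal backslash.
r2=$(printf '%s\n' 'x\y' | sed 's/\\/|/')                    # -> x|y
# \[ matches a literal opening bracket.
r3=$(printf '%s\n' '[x]' | sed 's/\[/</')                    # -> <x]

printf '%s\n' "$r1" "$r2" "$r3"
```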
Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest.
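Greediness is easiest to see on markup-like input. Assuming GNU sed, .* runs to the last ‘>’ on the line. Basic regular expressions have no non-greedy operator, so a negated bracket expression is the usual way to stop at the first one.

```shell
#!/bin/sh
# Sketch assuming GNU sed.

# Greedy: .* extends to the LAST '>' on the line.
r1=$(printf '%s\n' '<b>bold</b>' | sed 's/<.*>/X/')      # -> X
# [^>]* cannot cross a '>', so the match stops at the first one.
r2=$(printf '%s\n' '<b>bold</b>' | sed 's/<[^>]*>/X/')   # -> Xbold</b>

printf '%s\n' "$r1" "$r2"
```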
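Examples:

‘a*b’
Matches zero or more ‘a’s followed by a single ‘b’. For example, ‘b’ or ‘aaaaab’.

‘a\?b’
Matches ‘b’ or ‘ab’.

‘a\+b\+’
Matches one or more ‘a’s followed by one or more ‘b’s: ‘ab’ is the shortest possible match, but other examples are ‘aaaab’ or ‘abbbbb’ or ‘aaaaaabbbbbbb’.

‘^main.*(.*)’
This matches a string starting with ‘main’, followed by an opening and closing parenthesis. The ‘(’ and ‘)’ need not be adjacent.

‘.\{9\}A$’
This matches nine characters followed by an ‘A’ at the end of a line.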
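You can try patterns such as ^main.*(.*), a\+b\+ and .\{9\}A$ as sed addresses with -n and p, which prints only matching lines. This sketch assumes GNU sed and uses made-up input lines. Note that .\{9\}A$ is not anchored at the start, so it also matches lines with more than nine characters before the ‘A’.

```shell
#!/bin/sh
# Sketch assuming GNU sed; -n suppresses auto-print, p prints matches.

# Starts with 'main' and has a '(' later followed by a ')'.
r1=$(printf '%s\n' 'int main (void)' 'main(argc, argv)' 'main' |
     sed -n '/^main.*(.*)/p')
# -> main(argc, argv)

# Whole line is one or more 'a's then one or more 'b's.
r2=$(printf '%s\n' ab aaaab abbbbb b ba | sed -n '/^a\+b\+$/p')
# -> ab, aaaab, abbbbb (one per line)

# Nine characters, then 'A' at the end of the line (not anchored at the start).
r3=$(printf '%s\n' 123456789A 12345678A x123456789A | sed -n '/.\{9\}A$/p')
# -> 123456789A, x123456789A (one per line)

printf '%s\n' "$r1" "$r2" "$r3"
```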