Basic vs Extended (GNU Grep 3.7)
3.6 Basic vs Extended Regular Expressions
In basic regular expressions the characters ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and ‘)’ lose their special meaning; instead use the backslashed versions ‘\?’, ‘\+’, ‘\{’, ‘\|’, ‘\(’, and ‘\)’. Also, a backslash is needed before an interval expression’s closing ‘}’, and an unmatched ‘\)’ is invalid.
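The difference can be seen by running the same pattern text under both syntaxes. The following is a minimal sketch; the sample strings and variable names are made up for illustration, and only standard options (‘-E’, ‘-x’) are used.

```shell
# Two sample lines: one contains a literal plus sign, the other repeats 'a'.
sample=$(printf '%s\n' 'a+b' 'aab')

# Basic syntax (the default): an unescaped '+' is an ordinary character,
# so only the line containing the literal text "a+b" is selected.
bre_plus=$(printf '%s\n' "$sample" | grep 'a+b')

# Extended syntax: '+' means "one or more of the preceding item",
# so the pattern selects "aab" and no longer matches the literal "a+b".
ere_plus=$(printf '%s\n' "$sample" | grep -E 'a+b')

# Grouping and intervals: backslashed in basic syntax, bare in extended syntax.
# -x requires the whole line to match.
bre_group=$(printf '%s\n' 'abab' 'ab' | grep -x '\(ab\)\{2\}')
ere_group=$(printf '%s\n' 'abab' 'ab' | grep -E -x '(ab){2}')

printf 'basic    a+b        selects: %s\n' "$bre_plus"
printf 'extended a+b        selects: %s\n' "$ere_plus"
printf 'basic    \\(ab\\)\\{2\\} selects: %s\n' "$bre_group"
printf 'extended (ab){2}    selects: %s\n' "$ere_group"
```

With these inputs, the basic pattern ‘a+b’ selects ‘a+b’, the extended pattern ‘a+b’ selects ‘aab’, and both grouped patterns select ‘abab’.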
Portable scripts should avoid the following constructs, as POSIX says they produce undefined results:
- Extended regular expressions that use back-references.
- Basic regular expressions that use ‘\?’, ‘\+’, or ‘\|’.
- Empty parenthesized regular expressions like ‘()’.
- Empty alternatives (as in, e.g., ‘a|’).
- Repetition operators that immediately follow empty expressions, unescaped ‘$’, or other repetition operators.
- A backslash escaping an ordinary character (e.g., ‘\S’), unless it is a back-reference.
- An unescaped ‘[’ that is not part of a bracket expression.
- In extended regular expressions, an unescaped ‘{’ that is not part of an interval expression.
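Some of the constructs listed above have straightforward portable substitutes. The sketch below shows one possible set of rewrites for ‘\|’, ‘\?’, and ‘\+’ in basic regular expressions; the sample words and variable names are illustrative only, and other rewrites are possible.

```shell
animals=$(printf '%s\n' 'cat' 'dog' 'bird')

# Instead of the basic-syntax pattern 'cat\|dog', switch to extended syntax...
alt_ere=$(printf '%s\n' "$animals" | grep -E 'cat|dog')

# ...or stay in basic syntax and give one pattern per -e option.
alt_multi=$(printf '%s\n' "$animals" | grep -e 'cat' -e 'dog')

words=$(printf '%s\n' 'color' 'colour' 'colouur')

# Instead of 'colou\?r', use the basic-syntax interval \{0,1\}.
optional_u=$(printf '%s\n' "$words" | grep -x 'colou\{0,1\}r')

# Instead of 'colou\+r', use the basic-syntax interval \{1,\}.
repeated_u=$(printf '%s\n' "$words" | grep -x 'colou\{1,\}r')

printf 'alternation (-E):   %s\n' $alt_ere
printf 'alternation (-e):   %s\n' $alt_multi
printf 'zero or one "u":    %s\n' $optional_u
printf 'one or more "u":    %s\n' $repeated_u
```

Both alternation forms select ‘cat’ and ‘dog’; the ‘\{0,1\}’ form selects ‘color’ and ‘colour’, and the ‘\{1,\}’ form selects ‘colour’ and ‘colouur’.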
Traditional egrep did not support interval expressions and some egrep implementations use ‘\{’ and ‘\}’ instead, so portable scripts should avoid interval expressions in ‘grep -E’ patterns and should use ‘[{]’ to match a literal ‘{’.
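As a rough illustration of this advice, the sketch below matches a literal ‘{’ with a bracket expression and spells out a small repetition instead of writing the interval ‘a{2,3}’. The input lines and variable names are invented for the example.

```shell
code=$(printf '%s\n' 'main() {' 'return 0;' '}')

# A bracket expression matches a literal '{' without relying on how
# a given grep -E implementation treats an unescaped brace.
brace_lines=$(printf '%s\n' "$code" | grep -E '[{]')

# Instead of the interval 'a{2,3}', write the repetition out:
# two mandatory 'a' characters followed by one optional 'a'.
runs=$(printf '%s\n' 'a' 'aa' 'aaa' 'aaaa' | grep -E -x 'aaa?')

printf 'lines containing "{": %s\n' "$brace_lines"
printf 'runs of 2 or 3 "a":   %s\n' $runs
```

Here the first command selects only ‘main() {’, and the second selects ‘aa’ and ‘aaa’. Spelling out the repetition is practical only for small counts.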
GNU grep -E attempts to support traditional usage by assuming that ‘{’ is not special if it would be the start of an invalid interval expression. For example, the command ‘grep -E '{1'’ searches for the two-character string ‘{1’ instead of reporting a syntax error in the regular expression. POSIX allows this behavior as an extension, but portable scripts should avoid it.
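A script that needs to search for the string ‘{1’ can avoid depending on this extension. The following sketch contrasts the GNU-tolerated pattern with two portable spellings; the sample text and variable names are assumptions made for the example, and the first command's result is deliberately not relied on because other implementations may reject the pattern.

```shell
text=$(printf '%s\n' 'x{1' 'x1')

# Relies on the GNU extension described above. Errors are discarded and a
# failure is ignored, since a non-GNU grep may report a syntax error here.
gnu_result=$(printf '%s\n' "$text" | grep -E '{1' 2>/dev/null || true)

# Portable alternative 1: put the brace in a bracket expression.
bracket_result=$(printf '%s\n' "$text" | grep -E '[{]1')

# Portable alternative 2: search for a fixed string, so no character is special.
fixed_result=$(printf '%s\n' "$text" | grep -F '{1')

printf 'bracket expression: %s\n' "$bracket_result"
printf 'fixed string:       %s\n' "$fixed_result"
```

Both portable forms select only the line ‘x{1’.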