Basic vs Extended (GNU Grep 3.7)
3.6 Basic vs Extended Regular Expressions
In basic regular expressions the characters ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and ‘)’ lose their special meaning; instead use the backslashed versions ‘\?’, ‘\+’, ‘\{’, ‘\|’, ‘\(’, and ‘\)’. Also, a backslash is needed before an interval expression’s closing ‘}’, and an unmatched ‘\)’ is invalid.
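As a rough illustration of the correspondence described above, the following sketch runs the same search once as an extended and once as a basic regular expression. The sample input and the variable names are invented for this example, and GNU grep is assumed, since ‘\+’ and ‘\|’ in basic regular expressions are GNU extensions.

```shell
# Hypothetical sample input, one candidate string per line.
input=$(printf '%s\n' 'ab' 'abab' 'a+b' 'cat' 'dog')

# Extended syntax: '(', ')', '+' and '|' are operators when unescaped.
ere=$(printf '%s\n' "$input" | grep -E '^((ab)+|cat)$')

# Basic syntax: the same operators must be backslashed (GNU grep assumed).
bre=$(printf '%s\n' "$input" | grep '^\(\(ab\)\+\|cat\)$')

# Basic syntax: an unescaped '+' is an ordinary character.
lit=$(printf '%s\n' "$input" | grep 'a+b')

printf 'ERE matches:\n%s\n' "$ere"
printf 'BRE matches:\n%s\n' "$bre"
printf 'Literal plus:\n%s\n' "$lit"
```

Both the extended and the basic form should select the same lines, while the last command treats ‘+’ as a literal plus sign.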
Portable scripts should avoid the following constructs, as POSIX says they produce undefined results:
- Extended regular expressions that use back-references.
- Basic regular expressions that use ‘\?’, ‘\+’, or ‘\|’.
- Empty parenthesized regular expressions like ‘()’.
- Empty alternatives (as in, e.g., ‘a|’).
- Repetition operators that immediately follow empty expressions, unescaped ‘$’, or other repetition operators.
- A backslash escaping an ordinary character (e.g., ‘\S’), unless it is a back-reference.
- An unescaped ‘[’ that is not part of a bracket expression.
- In extended regular expressions, an unescaped ‘{’ that is not part of an interval expression.
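One possible way to stay clear of the ‘\?’, ‘\+’, and ‘\|’ constructs listed above is sketched below. It rewrites them with interval expressions and repeated ‘-e’ options, both of which POSIX defines for basic regular expressions. The sample words and variable names are made up for illustration; this is one workaround among several, not a recommendation taken from the manual.

```shell
# Hypothetical sample input.
input=$(printf '%s\n' 'color' 'colour' 'colouur' 'gray' 'grey')

# Instead of 'colou\?r': an interval of zero or one occurrence.
opt=$(printf '%s\n' "$input" | grep 'colou\{0,1\}r')

# Instead of 'colou\+r': an interval of one or more occurrences.
plus=$(printf '%s\n' "$input" | grep 'colou\{1,\}r')

# Instead of 'gray\|grey': one -e option per alternative.
alt=$(printf '%s\n' "$input" | grep -e 'gray' -e 'grey')

printf 'optional u:\n%s\n' "$opt"
printf 'one or more u:\n%s\n' "$plus"
printf 'alternatives:\n%s\n' "$alt"
```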
Traditional egrep did not support interval expressions and some egrep implementations use ‘\{’ and ‘\}’ instead, so portable scripts should avoid interval expressions in ‘grep -E’ patterns and should use ‘[{]’ to match a literal ‘{’.
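The ‘[{]’ idiom mentioned above can be sketched as follows. A bracket expression containing only the brace matches that one character literally, so the pattern does not depend on how a given egrep treats ‘{’. The input lines and variable names are invented for this example.

```shell
# Hypothetical sample input resembling a small C fragment.
input=$(printf '%s\n' 'int main() {' 'return 0;' '}')

# Lines containing a literal opening brace.
brace=$(printf '%s\n' "$input" | grep -E '[{]')

# Number of lines containing either brace.
count=$(printf '%s\n' "$input" | grep -c -E '[{}]')

printf 'opening brace on: %s\n' "$brace"
printf 'lines with a brace: %s\n' "$count"
```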
GNU grep -E attempts to support traditional usage by assuming that ‘{’ is not special if it would be the start of an invalid interval expression. For example, the command ‘grep -E '{1'’ searches for the two-character string ‘{1’ instead of reporting a syntax error in the regular expression. POSIX allows this behavior as an extension, but portable scripts should avoid it.
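The behavior described above can be compared with its portable equivalent in a short sketch. It assumes GNU grep; other implementations may reject the first pattern or interpret it differently. The input lines and variable names are invented for illustration.

```shell
# Hypothetical sample input.
input=$(printf '%s\n' 'set {1, 2}' 'a1' '{ 1 }')

# GNU extension: '{1' cannot start a valid interval, so '{' is taken literally.
gnu=$(printf '%s\n' "$input" | grep -E '{1')

# Portable spelling of the same search.
portable=$(printf '%s\n' "$input" | grep -E '[{]1')

printf 'GNU form:      %s\n' "$gnu"
printf 'portable form: %s\n' "$portable"
```

With GNU grep both commands should select the same line, but only the second form is safe to rely on in portable scripts.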