Basic vs Extended (GNU Grep 3.7)

Revision as of 03:31, 6 December 2021 by Notes (talk | contribs) (Page commit)

3.6 Basic vs Extended Regular Expressions

In basic regular expressions the characters ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and ‘)’ lose their special meaning; instead use the backslashed versions ‘\?’, ‘\+’, ‘\{’, ‘\|’, ‘\(’, and ‘\)’. Also, a backslash is needed before an interval expression’s closing ‘}’, and an unmatched ‘\)’ is invalid.
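The difference can be seen by running the same pattern in both modes. The following is a minimal sketch, assuming GNU grep is the ‘grep’ on the PATH; the sample strings and variable names are made up for illustration.

```shell
# Four sample lines, one candidate string per line (illustrative data).
input=$(printf '%s\n' 'ac' 'abc' 'abbc' 'ab+c')

# BRE: an unescaped '+' is an ordinary character, so only the literal
# text "ab+c" matches.
bre_literal=$(printf '%s\n' "$input" | grep 'ab+c')

# ERE: '+' means "one or more of the preceding item".
ere_plus=$(printf '%s\n' "$input" | grep -E 'ab+c')

# BRE with the backslashed operator (accepted by GNU grep) selects the
# same lines as the ERE above.
bre_plus=$(printf '%s\n' "$input" | grep 'ab\+c')

# Intervals: BRE needs a backslash before both braces, ERE uses bare braces.
bre_interval=$(printf '%s\n' "$input" | grep 'ab\{2\}c')
ere_interval=$(printf '%s\n' "$input" | grep -E 'ab{2}c')

printf 'BRE ab+c:\n%s\n' "$bre_literal"
printf 'ERE ab+c:\n%s\n' "$ere_plus"
printf 'BRE ab\\+c:\n%s\n' "$bre_plus"
printf 'BRE ab\\{2\\}c:\n%s\n' "$bre_interval"
printf 'ERE ab{2}c:\n%s\n' "$ere_interval"
```

Here the plain BRE selects only ‘ab+c’, while ‘grep -E 'ab+c'’ and ‘grep 'ab\+c'’ both select ‘abc’ and ‘abbc’.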

Portable scripts should avoid the following constructs, as POSIX says they produce undefined results:

  • Extended regular expressions that use back-references.
  • Basic regular expressions that use ‘\?’, ‘\+’, or ‘\|’.
  • Empty parenthesized regular expressions like ‘()’.
  • Empty alternatives (as in, e.g., ‘a|’).
  • Repetition operators that immediately follow empty expressions, unescaped ‘$’, or other repetition operators.
  • A backslash escaping an ordinary character (e.g., ‘\S’), unless it is a back-reference.
  • An unescaped ‘[’ that is not part of a bracket expression.
  • In extended regular expressions, an unescaped ‘{’ that is not part of an interval expression.
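Several of the constructs listed above have rewrites that stay within POSIX-defined syntax. The following sketch shows a few such rewrites; the sample lines and variable names are illustrative assumptions, and the substitutions are one possible choice rather than a prescribed list.

```shell
# Illustrative input, one string per line.
input=$(printf '%s\n' 'color' 'colour' 'x[1]' 'a b')

# Instead of the BRE 'colou\?r', use an interval, which POSIX defines for BREs.
optional=$(printf '%s\n' "$input" | grep 'colou\{0,1\}r')

# Instead of the BRE 'colour\|x', give each alternative its own -e option.
either=$(printf '%s\n' "$input" | grep -e 'colour' -e 'x')

# Instead of an unescaped '[' outside a bracket expression, escape it.
# The ']' is left bare: it is an ordinary character here, and a backslash
# before an ordinary character is itself on the list above.
bracket=$(printf '%s\n' "$input" | grep 'x\[1]')

# Instead of 'a \S', spell out the character class.
nonspace=$(printf '%s\n' "$input" | grep 'a [^[:space:]]')

printf 'optional u:\n%s\n' "$optional"
printf 'either pattern:\n%s\n' "$either"
printf 'literal bracket:\n%s\n' "$bracket"
printf 'non-space after "a ":\n%s\n' "$nonspace"
```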

Traditional egrep did not support interval expressions, and some egrep implementations use ‘\{’ and ‘\}’ instead. Portable scripts should therefore avoid interval expressions in ‘grep -E’ patterns and should use ‘[{]’ to match a literal ‘{’.
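As a small illustration of this advice, the sketch below matches a literal brace with a bracket expression and writes a repetition out by hand instead of using an interval. The input lines and variable names are hypothetical.

```shell
# Illustrative input, one string per line.
input=$(printf '%s\n' 'if (x) {' 'aaa' 'aa' '}')

# A bracket expression matches a literal '{' without relying on how a
# given egrep implementation treats braces.
brace=$(printf '%s\n' "$input" | grep -E '[{]')

# Instead of the interval 'a{3}', repeat the item explicitly.
triple=$(printf '%s\n' "$input" | grep -E 'aaa')

printf 'lines with an opening brace:\n%s\n' "$brace"
printf 'lines with three a characters in a row:\n%s\n' "$triple"
```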

GNU grep -E attempts to support traditional usage by assuming that ‘{’ is not special if it would be the start of an invalid interval expression. For example, the command ‘grep -E '{1'’ searches for the two-character string ‘{1’ instead of reporting a syntax error in the regular expression. POSIX allows this behavior as an extension, but portable scripts should avoid it.
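The behavior described above can be observed directly. This sketch assumes GNU grep; other implementations may reject the first pattern. The sample lines and variable names are made up for illustration.

```shell
# Illustrative input, one string per line.
input=$(printf '%s\n' 'x{1y' 'x1y' 'xy')

# GNU extension: '{1' cannot start a valid interval, so '{' is taken literally.
gnu_literal=$(printf '%s\n' "$input" | grep -E '{1')

# Portable spelling of the same search.
portable=$(printf '%s\n' "$input" | grep -E '[{]1')

# For contrast, a well-formed interval: exactly one 'x' followed by 'y'.
interval=$(printf '%s\n' "$input" | grep -E 'x{1}y')

printf 'GNU literal brace:\n%s\n' "$gnu_literal"
printf 'portable literal brace:\n%s\n' "$portable"
printf 'valid interval:\n%s\n' "$interval"
```

With GNU grep the first two commands select the same line, ‘x{1y’, while the well-formed interval selects ‘xy’.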