Back-references and Subexpressions (sed, a stream editor)

From Get docs
Sed/docs/latest/Back 002dreferences-and-Subexpressions


5.7 Back-references and Subexpressions

back-references are regular expression commands which refer to a previous part of the matched regular expression. Back-references are specified with backslash and a single digit (e.g. ‘\1’). The part of the regular expression they refer to is called a subexpression, and is designated with parentheses.

Back-references and subexpressions are used in two cases: in the regular expression search pattern, and in the replacement part of the s command (see Regular Expression Addresses and The "s" Command).

In a regular expression pattern, back-references are used to match the same content as a previously matched subexpression. In the following example, the subexpression is ‘.’ - any single character (being surrounded by parentheses makes it a subexpression). The back-reference ‘\1’ asks to match the same content (same character) as the sub-expression.

The command below matches words starting with any character, followed by the letter ‘o’, followed by the same character as the first.

$ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
bob
mom
non
pop
sos
tot
wow

Multiple subexpressions are automatically numbered from left-to-right. This command searches for 6-letter palindromes (the first three letters are 3 subexpressions, followed by 3 back-references in reverse order):

$ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
redder

In the s command, back-references can be used in the replacement part to refer back to subexpressions in the regexp part.

The following example uses two subexpressions in the regular expression to match two space-separated words. The back-references in the replacement part prints the words in a different order:

$ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
The name is Bond, James Bond.

When used with alternation, if the group does not participate in the match then the back-reference makes the whole match fail. For example, ‘a(.)|b\1’ will not match ‘ba’. When multiple regular expressions are given with -e or from a file (‘-f file’), back-references are local to each expression.