Until this chapter, we have only encountered escapes of the form ‘\^’, which tell sed not to interpret the circumflex as a special character, but rather to take it literally. For example, ‘\*’ matches a single asterisk rather than zero or more backslashes.
This chapter introduces another kind of escape: one that is applied to a character or sequence of characters that would ordinarily be taken literally, and that sed replaces with a special character. This provides a way of encoding non-printable characters in patterns in a visible manner.
There is no restriction on the appearance of non-printing characters in a sed script, but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents:
The list of these escapes is:
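\a
    Produces or matches a BEL character, that is, an “alert” (ASCII 7).
\f
    Produces or matches a form feed (ASCII 12).
\n
    Produces or matches a newline (ASCII 10).
\r
    Produces or matches a carriage return (ASCII 13).
\t
    Produces or matches a horizontal tab (ASCII 9).
\v
    Produces or matches a so-called “vertical tab” (ASCII 11).
\cx
    Produces or matches CONTROL-x, where x is any character. The precise effect of ‘\cx’ is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B, while ‘\c;’ becomes hex 7B.
\dxxx
    Produces or matches a character whose decimal ASCII value is xxx.
\oxxx
    Produces or matches a character whose octal ASCII value is xxx.
\xHH
    Produces or matches a character whose hexadecimal ASCII value is HH.

‘\b’ (backspace) was omitted because of the conflict with the existing “word boundary” meaning.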
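As an illustration of the list above, the following sketch writes the same one-character pattern (a horizontal tab) in four ways, and then reproduces the ‘\cx’ arithmetic in plain shell. It assumes GNU sed, since these escapes are GNU extensions; the helper name ctrl_byte is hypothetical and is not part of sed.

```shell
#!/bin/sh
# Assumes GNU sed: \t, \dNNN, \oNNN and \xHH are GNU extensions.

# Four spellings of the same pattern, a horizontal tab (ASCII 9).
printf 'a\tb\n' | sed 's/\t/,/'      # symbolic escape
printf 'a\tb\n' | sed 's/\d009/,/'   # decimal value
printf 'a\tb\n' | sed 's/\o011/,/'   # octal value
printf 'a\tb\n' | sed 's/\x09/,/'    # hexadecimal value
# Each command prints: a,b

# Hypothetical helper (not part of sed): compute the byte that \cX
# stands for, following the rule "upper-case, then invert bit 6 (hex 40)".
ctrl_byte() {
    upper=$(printf '%s' "$1" | tr 'a-z' 'A-Z')   # lower case -> upper case
    code=$(printf '%d' "'$upper")                # numeric value of the character
    printf '%02X\n' $(( code ^ 0x40 ))           # flip bit 6, print as hex
}

ctrl_byte z     # prints: 1A
ctrl_byte '{'   # prints: 3B
ctrl_byte ';'   # prints: 7B
```

The numeric forms are mainly useful for characters that have no symbolic escape; for a tab, ‘\t’ is the most readable choice.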
GNU sed processes escape sequences before passing the text on to the regular-expression matching of the s/// command and address matching. Thus the following two commands are equivalent (‘0x5e’ is the hexadecimal ASCII value of the character ‘^’):
    $ echo 'a^c' | sed 's/^/b/'
    ba^c

    $ echo 'a^c' | sed 's/\x5e/b/'
    ba^c
As are the following (‘0x5b’ and ‘0x5d’ are the hexadecimal ASCII values of ‘[’ and ‘]’, respectively):
    $ echo abc | sed 's/[a]/x/'
    xbc

    $ echo abc | sed 's/\x5ba\x5d/x/'
    xbc
However, it is recommended to avoid producing such special characters through escapes, because of unexpected edge cases. For example, the following are not equivalent:
    $ echo 'a^c' | sed 's/\^/b/'
    abc

    $ echo 'a^c' | sed 's/\\\x5e/b/'
    a^c
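One way to follow this recommendation, sketched here for the case where the goal is simply to match a special character literally, is to write the character itself with a backslash or inside a bracket expression, rather than spelling it as a numeric escape. The sample inputs are made up for illustration.

```shell
#!/bin/sh
# Match special characters literally without numeric escapes.
# These forms are ordinary basic regular expressions.

# A backslash makes '^' literal.
echo 'a^c' | sed 's/\^/b/'     # prints: abc

# A backslash makes '[' literal.
echo 'a[c' | sed 's/\[/b/'     # prints: abc

# Inside a bracket expression, '*' is always literal.
echo 'a*c' | sed 's/[*]/b/'    # prints: abc
```

These spellings do not depend on the order in which escape sequences and regular-expression syntax are processed.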
All the escapes introduced here are GNU extensions, with the exception of \n. In basic regular expression mode, setting POSIXLY_CORRECT disables them inside bracket expressions.
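Because these escapes are GNU extensions, a script that must also run under other sed implementations cannot rely on them. A minimal sketch of one common workaround follows: let the shell's printf produce the character and interpolate it into the script. The variable name tab is only illustrative.

```shell
#!/bin/sh
# Portable alternative to the GNU-only \t escape: build the tab
# character with printf and substitute it into the sed script.
tab=$(printf '\t')

# Double quotes let the shell expand $tab before sed sees the script.
printf 'a\tb\n' | sed "s/$tab/,/"   # prints: a,b
```

The same approach works for other single characters, except a trailing newline, which command substitution strips.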