Sed/Escapes
5.8 Escape Sequences - specifying special characters
Until this chapter, we have only encountered escapes of the form ‘\^’, which tell sed not to interpret the circumflex as a special character, but rather to take it literally. For example, ‘\*’ matches a single asterisk rather than zero or more backslashes.
This chapter introduces another kind of escape(6): escapes that are applied to a character or sequence of characters that ordinarily are taken literally, and that sed replaces with a special character. This provides a way of encoding non-printable characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters in a sed script, but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents.
The list of these escapes is:
\a
- Produces or matches a BEL character, that is an “alert” (ASCII 7).
\f
- Produces or matches a form feed (ASCII 12).
\n
- Produces or matches a newline (ASCII 10).
\r
- Produces or matches a carriage return (ASCII 13).
\t
- Produces or matches a horizontal tab (ASCII 9).
\v
- Produces or matches a so-called “vertical tab” (ASCII 11).
\cx
- Produces or matches CONTROL-x, where x is any character. The precise effect of ‘\cx’ is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B, while ‘\c;’ becomes hex 7B.
\dxxx
- Produces or matches a character whose decimal ASCII value is xxx.
\oxxx
- Produces or matches a character whose octal ASCII value is xxx.
\xxx
- Produces or matches a character whose hexadecimal ASCII value is xx.
‘\b’ (backspace) was omitted because of the conflict with the existing “word boundary” meaning.
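To make the control and numeric rules above concrete, here is a small Python model of how a single escape from the list maps to a character. It is a hypothetical sketch, not sed's implementation: it assumes ASCII input and assumes the full digit count (three digits for ‘\d’ and ‘\o’, two for ‘\x’), and the names `control_escape` and `decode_escape` are illustrative only.

```python
# Hypothetical model of GNU sed's character escapes (ASCII only).
# It illustrates the documented rules; it is not sed's actual code.

SIMPLE = {"a": 7, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11}

def control_escape(x: str) -> int:
    """Value produced by '\\cx': upper-case a lower-case letter, then flip bit 6 (hex 40)."""
    if "a" <= x <= "z":
        x = x.upper()
    return ord(x) ^ 0x40

def decode_escape(esc: str) -> str:
    """Decode one escape such as '\\t', '\\cz', '\\d065', '\\o101' or '\\x41'.

    Assumes the escape is complete: three digits for \\d and \\o, two for \\x.
    """
    if len(esc) < 2 or esc[0] != "\\":
        raise ValueError(f"not an escape: {esc!r}")
    kind, rest = esc[1], esc[2:]
    if kind in SIMPLE and not rest:
        return chr(SIMPLE[kind])
    if kind == "c" and len(rest) == 1:
        return chr(control_escape(rest))
    bases = {"d": (10, 3), "o": (8, 3), "x": (16, 2)}
    if kind in bases:
        base, width = bases[kind]
        if len(rest) == width:
            return chr(int(rest, base))
    raise ValueError(f"unsupported escape: {esc!r}")

# Print each escape next to the hexadecimal value of the character it yields.
for e in ["\\t", "\\cz", "\\c{", "\\c;", "\\d065", "\\o101", "\\x41"]:
    print(e, hex(ord(decode_escape(e))))
```

The three ‘\c’ cases reproduce the values given in the list (hex 1A, 3B and 7B), and ‘\d065’, ‘\o101’ and ‘\x41’ all name the same character, ‘A’.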
5.8.1 Escaping Precedence
GNU sed processes escape sequences before passing the text on to the regular-expression matching of the s/// command and address matching. Thus the following two commands are equivalent (‘0x5e’ is the hexadecimal ASCII value of the character ‘^’):
$ echo 'a^c' | sed 's/^/b/'
ba^c
$ echo 'a^c' | sed 's/\x5e/b/'
ba^c
As are the following (‘0x5b’ and ‘0x5d’ are the hexadecimal ASCII values of ‘[’ and ‘]’, respectively):
$ echo abc | sed 's/[a]/x/'
xbc
$ echo abc | sed 's/\x5ba\x5d/x/'
xbc
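The processing order described above can be imitated with a hypothetical Python model: first expand every ‘\xHH’ escape into its character, then hand the resulting string to a regular-expression engine. Using Python's re module as a stand-in for sed's matcher is an assumption (the two dialects differ; for instance, Python treats ^ as an anchor anywhere in a pattern), and `expand_hex_escapes` and `sed_s` are illustrative names, not sed internals.

```python
import re

def expand_hex_escapes(pattern: str) -> str:
    """Expand each \\xHH escape into its character (hypothetical helper).

    Any other backslash pair, such as \\\\ or \\^, is copied unchanged so that
    an escaped backslash is never mistaken for the start of \\xHH.
    """
    def repl(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        return m.group(0)
    return re.sub(r"\\x([0-9A-Fa-f]{2})|\\.", repl, pattern)

def sed_s(pattern: str, replacement: str, line: str) -> str:
    """Rough stand-in for s/pattern/replacement/ without flags (first match only)."""
    regex = expand_hex_escapes(pattern)  # 1. escape sequences are processed first
    return re.sub(regex, replacement, line, count=1)  # 2. then the regex is matched

print(sed_s("^", "b", "a^c"))           # ^ anchors at the start of the line
print(sed_s(r"\x5e", "b", "a^c"))       # \x5e becomes ^, which is also an anchor
print(sed_s("[a]", "x", "abc"))         # bracket expression matching a
print(sed_s(r"\x5ba\x5d", "x", "abc"))  # \x5b and \x5d become [ and ]
```

Because expansion happens before the pattern is compiled, the matcher never sees ‘\x5e’ or ‘\x5b’; it only sees the characters they encode, which is why each pair above behaves identically.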
However, it is recommended to avoid producing such special characters through escapes, because of unexpected edge cases. For example, the following are not equivalent:
$ echo 'a^c' | sed 's/\^/b/'
abc
$ echo 'a^c' | sed 's/\\\x5e/b/'
a^c
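A hypothetical Python sketch makes the difference visible by expanding only the ‘\xHH’ escapes and showing the pattern that actually reaches the matcher. It assumes Python's re module as a stand-in for sed's matcher, and `expand_hex_escapes` is an illustrative name: ‘\^’ survives expansion as an escaped, literal circumflex, while ‘\\\x5e’ becomes ‘\\^’, a literal backslash followed by a circumflex.

```python
import re

def expand_hex_escapes(pattern: str) -> str:
    """Hypothetical helper: expand \\xHH escapes, copy any other backslash pair as-is."""
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})|\\.",
        lambda m: chr(int(m.group(1), 16)) if m.group(1) else m.group(0),
        pattern,
    )

first = expand_hex_escapes(r"\^")       # stays \^ : an escaped, literal circumflex
second = expand_hex_escapes(r"\\\x5e")  # \\ is kept, \x5e becomes ^, giving \\^

# \\^ asks for a backslash followed by ^ (literal in sed's BRE, an anchor in
# Python's re); "a^c" contains no backslash, so neither engine finds a match.
print(first, re.sub(first, "b", "a^c", count=1))
print(second, re.sub(second, "b", "a^c", count=1))
```

The escape that was meant to stand for a plain ‘^’ ends up next to an escaped backslash, so the pattern now describes a different string altogether.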
Footnotes
(6)
All the escapes introduced here are GNU extensions, with the exception of \n. In basic regular expression mode, setting POSIXLY_CORRECT disables them inside bracket expressions.