Sed/Character-Classes-and-Bracket-Expressions
5.5 Character Classes and Bracket Expressions
A bracket expression is a list of characters enclosed by ‘[’ and ‘]’.
It matches any single character in that list;
if the first character of the list is the caret ‘^’,
then it matches any character not in the list.
For example, the following command replaces the words
‘gray’ or ‘grey’ with ‘blue’:
sed 's/gr[ae]y/blue/'
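As a small illustration of the two forms described above (a plain list and a negated list), the following sketch uses made-up input strings; the `g` flag is added only so that every match on the line is replaced.

```shell
# Both spellings are replaced; 'groy' is left alone because 'o' is not in the list.
echo 'gray grey groy' | sed 's/gr[ae]y/blue/g'
# prints: blue blue groy

# A leading caret negates the list: replace every character that is not a, b, or c.
echo 'abcxyz' | sed 's/[^abc]/_/g'
# prints: abc___
```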
Bracket expressions can be used in both basic and extended
regular expressions (that is, with or without the -E/-r options).
Within a bracket expression, a range expression consists of two
characters separated by a hyphen.
It matches any single character that
sorts between the two characters, inclusive.
In the default C locale, the sorting sequence is the native character
order; for example, ‘[a-d]’ is equivalent to ‘[abcd]’.
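The range behavior can be checked with a short sketch like the one below. The input strings are illustrative, and `LC_ALL=C` is set explicitly so that the sorting sequence is the native character order regardless of the caller's environment.

```shell
# In the C locale, [a-d] behaves exactly like [abcd].
echo 'abcdefg' | LC_ALL=C sed 's/[a-d]/X/g'
# prints: XXXXefg

# A range and a single character can be combined in one list.
echo 'id-42_x' | LC_ALL=C sed 's/[0-9_]/#/g'
# prints: id-###x
```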
Finally, certain named classes of characters are predefined within bracket expressions, as follows.
These named classes must be used inside brackets themselves. Correct usage:
$ echo 1 | sed 's/[[:digit:]]/X/'
X
Incorrect usage is rejected by newer sed versions.
Older versions accepted it but treated it as a single bracket expression
(which is equivalent to ‘[dgit:]’, that is, only the characters d/g/i/t/:):
# current GNU sed versions - incorrect usage rejected
$ echo 1 | sed 's/[:digit:]/X/'
sed: character class syntax is [[:space:]], not [:space:]

# older GNU sed versions
$ echo 1 | sed 's/[:digit:]/X/'
1
- ‘[:alnum:]’ Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.
- ‘[:alpha:]’ Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[A-Za-z]’.
- ‘[:blank:]’ Blank characters: space and tab.
- ‘[:cntrl:]’ Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any.
- ‘[:digit:]’ Digits: 0 1 2 3 4 5 6 7 8 9.
- ‘[:graph:]’ Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.
- ‘[:lower:]’ Lower-case letters; in the ‘C’ locale and ASCII character encoding, this is a b c d e f g h i j k l m n o p q r s t u v w x y z.
- ‘[:print:]’ Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.
- ‘[:punct:]’ Punctuation characters; in the ‘C’ locale and ASCII character encoding, this is ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
- ‘[:space:]’ Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space.
- ‘[:upper:]’ Upper-case letters: in the ‘C’ locale and ASCII character encoding, this is A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.
- ‘[:xdigit:]’ Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.
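The named classes can be combined with other list items and with negation, as the following sketch suggests. The sample inputs are invented for illustration; the commands use only basic regular expression syntax.

```shell
# Replace each run of digits with a placeholder.
echo 'order 66 shipped 2024' | sed 's/[[:digit:]][[:digit:]]*/N/g'
# prints: order N shipped N

# A named class can share a bracket expression with other list items:
# anything that is neither alphanumeric nor an underscore becomes '-'.
echo 'foo bar_baz!qux' | sed 's/[^[:alnum:]_]/-/g'
# prints: foo-bar_baz-qux

# Squeeze runs of blanks (spaces and tabs) into a single space.
printf 'a \t  b\n' | sed 's/[[:blank:]][[:blank:]]*/ /g'
# prints: a b
```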
Most meta-characters lose their special meaning inside bracket expressions. The following characters remain special, depending on their position in the list:
- ‘]’ ends the bracket expression if it’s not the first list item. So, if you want to make the ‘]’ character a list item, you must put it first.
- ‘-’ represents a range if it’s not first or last in a list or the ending point of a range.
- ‘^’ represents the characters not in the list when it is the first character. If you want to make the ‘^’ character a list item, place it anywhere but first.
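The placement rules above can be tried out with a sketch such as this one; the input strings are arbitrary examples chosen to show each character being matched literally.

```shell
# ']' placed first is a literal list item; this list contains ']' and '['.
echo 'a]b[c' | sed 's/[][]/_/g'
# prints: a_b_c

# '-' placed last is a literal hyphen rather than a range operator.
echo '1-2+3' | sed 's/[+-]/ /g'
# prints: 1 2 3

# '^' placed anywhere but first is a literal caret.
echo 'x^2' | sed 's/[2^]/_/g'
# prints: x__
```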
The characters $, *, ., [, and \ are normally not special within list.
For example, [\*] matches either ‘\’ or ‘*’, because the \ is not
special here. However, strings like [.ch.], [=a=], and
[:space:] are special within list and represent collating
symbols, equivalence classes, and character classes, respectively, and
[ is therefore special within list when it is followed by
., =, or :. Also, when not in POSIXLY_CORRECT mode, special escapes
like \n and \t are recognized within list. See Escapes.
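A brief sketch of both behaviors follows. The first command relies only on the literal-backslash rule; the second assumes GNU sed running without POSIXLY_CORRECT set, since other sed implementations may treat `\t` inside a list as the two characters ‘\’ and ‘t’.

```shell
# Inside the list the backslash is literal, so both '\' and '*' are replaced.
printf '%s\n' 'a\b*c' | sed 's/[\*]/_/g'
# prints: a_b_c

# GNU extension (assumed; not portable): \t inside the list matches a tab.
printf 'a\tb c\n' | sed 's/[\t ]/_/g'
# with GNU sed, prints: a_b_c
```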
- ‘[.’ represents the open collating symbol.
- ‘.]’ represents the close collating symbol.
- ‘[=’ represents the open equivalence class.
- ‘=]’ represents the close equivalence class.
- ‘[:’ represents the open character class symbol, and should be followed by a valid character class name.
- ‘:]’ represents the close character class symbol.