Character Classes and Bracket Expressions (sed, a stream editor)
5.5 Character Classes and Bracket Expressions
A bracket expression is a list of characters enclosed by ‘[’ and ‘]’. It matches any single character in that list; if the first character of the list is the caret ‘^’, then it matches any character not in the list. For example, the following command replaces the words ‘gray’ or ‘grey’ with ‘blue’:
sed 's/gr[ae]y/blue/'
Bracket expressions can be used in both basic and extended regular expressions (that is, with or without the -E/-r options).
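As a small sketch of the matching and negation rules described above (the sample input and the variable names `both` and `negated` are illustrative assumptions, not part of the manual):

```shell
# Replace every 'gray' or 'grey'; the g flag makes the substitution global.
both=$(echo 'gray grey groy' | sed 's/gr[ae]y/blue/g')
echo "$both"       # blue blue groy

# A leading ^ negates the list: 'gr', then any character except a or e, then 'y'.
negated=$(echo 'gray grey groy' | sed 's/gr[^ae]y/blue/g')
echo "$negated"    # gray grey blue
```

Without the g flag, only the first match on each line would be replaced.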
Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive. In the default C locale, the sorting sequence is the native character order; for example, ‘[a-d]’ is equivalent to ‘[abcd]’.
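The equivalence of a range and an explicit list can be checked with a sketch like the following. Setting LC_ALL=C forces the C locale for the command; the input string and variable names are illustrative assumptions:

```shell
# In the C locale, the range a-d follows native character order.
ranged=$(echo 'abcdef' | LC_ALL=C sed 's/[a-d]/X/g')
echo "$ranged"    # XXXXef

# The explicit list gives the same result.
listed=$(echo 'abcdef' | LC_ALL=C sed 's/[abcd]/X/g')
echo "$listed"    # XXXXef
```

In other locales the sorting sequence may differ, so a range such as ‘[a-d]’ is not guaranteed to match exactly these four characters.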
Finally, certain named classes of characters are predefined within bracket expressions, as follows.
These named classes must be used inside brackets themselves. Correct usage:
$ echo 1 | sed 's/[[:digit:]]/X/'
X
Incorrect usage is rejected by newer sed versions. Older versions accepted it but treated it as a single bracket expression (which is equivalent to ‘[dgit:]’, that is, only the characters d/g/i/t/:):
# current GNU sed versions - incorrect usage rejected
$ echo 1 | sed 's/[:digit:]/X/'
sed: character class syntax is [[:space:]], not [:space:]

# older GNU sed versions
$ echo 1 | sed 's/[:digit:]/X/'
1
- ‘[:alnum:]’ Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.
- ‘[:alpha:]’ Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[A-Za-z]’.
- ‘[:blank:]’ Blank characters: space and tab.
- ‘[:cntrl:]’ Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any.
- ‘[:digit:]’ Digits: 0 1 2 3 4 5 6 7 8 9.
- ‘[:graph:]’ Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.
- ‘[:lower:]’ Lower-case letters; in the ‘C’ locale and ASCII character encoding, this is a b c d e f g h i j k l m n o p q r s t u v w x y z.
- ‘[:print:]’ Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.
- ‘[:punct:]’ Punctuation characters; in the ‘C’ locale and ASCII character encoding, this is ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
- ‘[:space:]’ Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space.
- ‘[:upper:]’ Upper-case letters: in the ‘C’ locale and ASCII character encoding, this is A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.
- ‘[:xdigit:]’ Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.
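A few hedged examples of named classes in use, including one combined with an ordinary list item. The inputs and variable names are illustrative assumptions:

```shell
# The outer brackets delimit the bracket expression; the inner ones belong to the class name.
digits=$(echo 'room 101' | sed 's/[[:digit:]]/#/g')
echo "$digits"      # room ###

# A named class can share a list with ordinary characters, here the underscore.
combined=$(echo 'foo_bar-42' | sed 's/[[:alpha:]_]/x/g')
echo "$combined"    # xxxxxxx-42

# Collapse each run of spaces and tabs into a single space.
squeezed=$(printf 'a \t b\n' | sed 's/[[:space:]][[:space:]]*/ /g')
echo "$squeezed"    # a b
```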
Most meta-characters lose their special meaning inside bracket expressions. The following characters and sequences are the exceptions:
- ‘]’ ends the bracket expression if it is not the first list item. So, if you want to make the ‘]’ character a list item, you must put it first.
- ‘-’ represents a range unless it is the first or last item in the list, or the ending point of a range.
- ‘^’ when it is the first list item, represents the characters not in the list. If you want to make the ‘^’ character a list item, place it anywhere but first.
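The placement rules above can be sketched as follows. The inputs and variable names are illustrative assumptions:

```shell
# ']' placed first in the list is a literal list item.
close=$(echo 'a]b' | sed 's/[]]/X/')
echo "$close"    # aXb

# '-' placed last in the list is a literal hyphen, not a range operator.
dash=$(echo 'a-b' | sed 's/[ab-]/X/g')
echo "$dash"     # XXX

# '^' placed anywhere but first is a literal caret.
caret=$(echo 'a^b' | sed 's/[b^]/X/g')
echo "$caret"    # aXX
```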
The characters ‘$’, ‘*’, ‘.’, ‘[’, and ‘\’ are normally not special within a list. For example, ‘[\*]’ matches either ‘\’ or ‘*’, because the ‘\’ is not special here. However, strings like ‘[.ch.]’, ‘[=a=]’, and ‘[:space:]’ are special within a list and represent collating symbols, equivalence classes, and character classes, respectively, and ‘[’ is therefore special within a list when it is followed by ‘.’, ‘=’, or ‘:’. Also, when not in POSIXLY_CORRECT mode, special escapes like ‘\n’ and ‘\t’ are recognized within a list. See Escapes.
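A minimal sketch of characters that are literal inside a list. It uses printf '%s\n' rather than echo so that the shell passes the backslash through unchanged; the inputs and variable names are illustrative assumptions:

```shell
# Inside a list, \ and * are literal, so [\*] matches either one.
literal=$(printf '%s\n' 'a\b*c' | sed 's/[\*]/X/g')
echo "$literal"    # aXbXc

# . and $ are literal inside a list as well.
dots=$(printf '%s\n' 'a.b$c' | sed 's/[.$]/X/g')
echo "$dots"       # aXbXc
```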
- ‘[.’ represents the open collating symbol.
- ‘.]’ represents the close collating symbol.
- ‘[=’ represents the open equivalence class.
- ‘=]’ represents the close equivalence class.
- ‘[:’ represents the open character class symbol, and should be followed by a valid character class name.
- ‘:]’ represents the close character class symbol.
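A hedged sketch of collating symbols and equivalence classes. It assumes GNU sed in the C locale, where each of these constructs names a single character; other locales may define multi-character collating elements or larger equivalence classes. The inputs and variable names are illustrative assumptions:

```shell
# In the C locale, the collating symbol [.-.] stands for a literal hyphen.
coll=$(echo 'a-b' | LC_ALL=C sed 's/[[.-.]]/X/')
echo "$coll"     # aXb

# In the C locale, the equivalence class [=a=] contains only 'a' itself.
equiv=$(echo 'banana' | LC_ALL=C sed 's/[[=a=]]/X/g')
echo "$equiv"    # bXnXnX
```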