Regexp Addresses (sed, a stream editor)


4.3 selecting lines by text matching

GNU sed supports the following regular expression addresses. The default regular expression syntax is Basic Regular Expression (BRE). If the -E or -r option is used, the regular expression should be in Extended Regular Expression (ERE) syntax. See BRE vs ERE.

/regexp/

This will select any line which matches the regular expression regexp. If regexp itself includes any / characters, each must be escaped by a backslash (\).

The following command prints lines in /etc/passwd which end with ‘bash’ (5):

sed -n '/bash$/p' /etc/passwd

The empty regular expression ‘//’ reuses the most recently used regular expression (the same holds if the empty regular expression is passed to the s command). Note that modifiers to regular expressions are evaluated when the regular expression is compiled, thus it is invalid to specify them together with the empty regular expression.
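As a minimal sketch of the empty regular expression, the following uses /an/ as the address and then an empty regexp in the s command, which reuses ‘an’. The input words are made up for illustration:

```shell
# The address /an/ selects the line; s//AN/ reuses the regexp 'an'
# and replaces its first occurrence on that line.
out=$(printf '%s\n' apple banana cherry | sed -n '/an/s//AN/p')
printf '%s\n' "$out"
# prints: bANana
```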

\%regexp%

(The % may be replaced by any other single character.)

This also matches the regular expression regexp, but allows one to use a different delimiter than /. This is particularly useful if the regexp itself contains a lot of slashes, since it avoids the tedious escaping of every /. If regexp itself includes any delimiter characters, each must be escaped by a backslash (\).
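The escaping rule for a custom delimiter can be sketched as follows. Here % is the delimiter and the regexp itself must match a literal ‘%’, so it is written as \%. The sample input lines are hypothetical:

```shell
# \%...% delimits the regexp with '%'; the inner \% matches a literal '%'.
out=$(printf '%s\n' '50% off' '100% done' | sed -n '\%100\%%p')
printf '%s\n' "$out"
# prints: 100% done
```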

The following commands are equivalent. They print lines which start with ‘/home/alice/documents/’:

sed -n '/^\/home\/alice\/documents\//p'
sed -n '\%^/home/alice/documents/%p'
sed -n '\;^/home/alice/documents/;p'

/regexp/I
\%regexp%I

The I modifier to regular-expression matching is a GNU extension which causes the regexp to be matched in a case-insensitive manner.
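A short sketch of case-insensitive matching, assuming GNU sed (the I modifier is not available in every sed implementation). The input words are chosen only for illustration:

```shell
# /hello/I matches 'Hello' and 'hello' regardless of case; 'WORLD' is skipped.
out=$(printf '%s\n' Hello WORLD hello | sed -n '/hello/Ip')
printf '%s\n' "$out"
# prints:
# Hello
# hello
```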

In many other programming languages, a lower case i is used for case-insensitive regular expression matching. However, in sed the i is used for the insert command (see insert command).

Observe the difference between the following examples.

In this example, /b/I is the address: regular expression with I modifier. d is the delete command:

$ printf "%s\n" a b c | sed '/b/Id'
a
c

Here, /b/ is the address: a regular expression. i is the insert command. d is the value to insert. A line with ‘d’ is then inserted above the matched line:

$ printf "%s\n" a b c | sed '/b/id'
a
d
b
c

/regexp/M
\%regexp%M

The M modifier to regular-expression matching is a GNU sed extension which directs GNU sed to match the regular expression in multi-line mode. The modifier causes ^ and $ to match respectively (in addition to the normal behavior) the empty string after a newline, and the empty string before a newline. There are special character sequences (\` and \') which always match the beginning or the end of the buffer. In addition, the period character does not match a new-line character in multi-line mode.
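The effect of M only shows when the pattern space holds more than one line. The sketch below, assuming GNU sed, uses the N command to join two input lines and then compares the same address with and without the modifier:

```shell
# N appends the next input line, so the pattern space is "one\ntwo".
# With M, ^ can match just after the embedded newline, so /^two$/M matches.
with_m=$(printf '%s\n' one two | sed -n 'N; /^two$/Mp')
# Without M, ^ and $ match only at the buffer edges, so nothing is printed.
without_m=$(printf '%s\n' one two | sed -n 'N; /^two$/p')
printf 'with M:\n%s\n' "$with_m"
printf 'without M: [%s]\n' "$without_m"
```

In the first command both joined lines are printed, because p prints the whole pattern space; in the second the output is empty.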

Regex addresses operate on the content of the current pattern space. If the pattern space is changed (for example with s/// command) the regular expression matching will operate on the changed text.

In the following example, automatic printing is disabled with -n. The s/2/X/ command replaces ‘2’ with ‘X’ on any line that contains it. The command /[0-9]/p prints lines that contain a digit. Because the second line has already been changed to ‘X’ by the time the /[0-9]/ address is tested, it does not match and is not printed:

$ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
1
3
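For contrast, a sketch with the two commands in the opposite order: the address is tested against the original text first, so every line still contains a digit when p runs, and the later substitution has no visible effect under -n:

```shell
# /[0-9]/p runs before s/2/X/, so line 2 is printed unchanged.
out=$(seq 3 | sed -n '/[0-9]/p ; s/2/X/')
printf '%s\n' "$out"
# prints:
# 1
# 2
# 3
```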



Footnotes

(5)

There are of course many other ways to do the same, e.g.

grep 'bash$' /etc/passwd
awk -F: '$7 == "/bin/bash"' /etc/passwd