Gawk/Leftmost-Longest
3.5 How Much Text Matches?
Consider the following:
echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
This example uses the sub() function to make a change to the input
record. (sub() replaces the first instance of any text matched
by the first argument with the string provided as the second argument;
see section String-Manipulation Functions.) Here, the regexp /a+/
indicates “one or more ‘a’ characters,” and the replacement text
is ‘<A>’.
The input contains four ‘a’ characters.
awk (and POSIX) regular expressions always match
the leftmost, longest sequence of input characters that can
match. Thus, all four ‘a’ characters are
replaced with ‘<A>’ in this example:
$ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
-| <A>bcd
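The two halves of the rule, “leftmost” and “longest,” can be seen
separately in a small sketch. The input strings below are made up for
illustration, and the outputs shown assume a POSIX-conforming awk:

```shell
# "Leftmost" takes priority: the first run of a's is replaced,
# even though a longer run appears later in the record.
echo xaayaaaa | awk '{ sub(/a+/, "<A>"); print }'
# prints: x<A>yaaaa

# "Longest" decides among matches that start at the same position:
# both alternatives match at the first character, and the longer one wins.
echo abcd | awk '{ sub(/ab|abcd/, "<X>"); print }'
# prints: <X>
```

The order of the alternatives does not matter in the second command;
writing the regexp as /abcd|ab/ should give the same result.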
For simple match/no-match tests, this is not so important. But when doing
text matching and substitutions with the match(), sub(), gsub(),
and gensub() functions, it is very important.
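As a rough illustration of why the rule matters for these functions,
the following sketch uses match(), gsub(), and sub() on made-up input.
(gensub() is specific to gawk and is left out so that the commands
run under other awk implementations as well.)

```shell
# match() sets RSTART to where the match begins and RLENGTH to its length;
# the leftmost-longest match of /a+/ here starts at 1 and spans 4 characters.
echo aaaabcd | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }'
# prints: 1 4

# gsub() replaces every leftmost-longest match in turn and returns the count,
# so each run of a's becomes a single <A>.
echo 'aaaabcd aa' | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }'
# prints: 2 <A>bcd <A>

# A regexp that can match the empty string may surprise: /a*/ matches
# the empty string before "b", and that leftmost match is the one used.
echo baaa | awk '{ sub(/a*/, "<A>"); print }'
# prints: <A>baaa
```

In the last command, the three ‘a’ characters are left untouched because
a match that starts earlier is always preferred over a longer match that
starts later.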
Understanding this principle is also important for regexp-based record
and field splitting (see section How Input Is Split into Records,
and also see section Specifying How Fields Are Separated).
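The same principle can be sketched for field splitting. The example
below assumes a field separator given as a regexp through the -F
option; the input lines are hypothetical:

```shell
# FS is the regexp ", *" (a comma followed by any number of spaces).
# The longest match consumes all the spaces after the comma, so the
# second field has no leading whitespace.
echo 'a,   b' | awk -F', *' '{ print NF ": [" $1 "] [" $2 "]" }'
# prints: 2: [a] [b]

# With FS set to ":+", a run of colons counts as one separator.
echo 'a::::b' | awk -F':+' '{ print NF }'
# prints: 2
```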