Leftmost Longest (The GNU Awk User’s Guide)
3.5 How Much Text Matches?
Consider the following:
echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
This example uses the sub() function to make a change to the input record. (sub() replaces the first instance of any text matched by the first argument with the string provided as the second argument; see section String-Manipulation Functions.) Here, the regexp /a+/ indicates “one or more ‘a’ characters,” and the replacement text is ‘<A>’.
The input contains four ‘a’ characters. awk (and POSIX) regular expressions always match the leftmost, longest sequence of input characters that can match. Thus, all four ‘a’ characters are replaced with ‘<A>’ in this example:
$ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
-| <A>bcd
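The two halves of the rule apply in order: the match first starts as far left as possible, and only then extends as far as it can. The following sketch illustrates this with two inputs that are not from the original text (‘xaayaaaa’ and ‘baaa’ are made up for illustration; the variable names are arbitrary):

```shell
# Leftmost wins over longest: the run "aa" starts earlier than the
# longer run "aaaa", so only "aa" is replaced.
leftmost=$(echo xaayaaaa | awk '{ sub(/a+/, "<A>"); print }')
echo "$leftmost"

# /a*/ can match the empty string, and the leftmost possible match
# is the empty string before the "b", so nothing is consumed.
empty=$(echo baaa | awk '{ sub(/a*/, "<A>"); print }')
echo "$empty"
```

The first command prints ‘x<A>yaaaa’ and the second prints ‘<A>baaa’. In the second case, the three ‘a’ characters are left untouched because a shorter match that starts earlier takes precedence over a longer match that starts later.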
For simple match/no-match tests, this is not so important. But when doing text matching and substitutions with the match(), sub(), gsub(), and gensub() functions, it is very important. Understanding this principle is also important for regexp-based record and field splitting (see section How Input Is Split into Records, and also see section Specifying How Fields Are Separated).
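As a minimal sketch of how the rule shows up in these functions and in field splitting, consider the following commands. The input strings and the field separator ‘,+’ are assumptions chosen for illustration; they do not come from the text above:

```shell
# match() reports where the leftmost, longest match starts (RSTART)
# and how many characters it covers (RLENGTH).
pos=$(echo xxaaaabcd | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }')
echo "$pos"

# gsub() replaces each run of "a" characters as a whole, so two runs
# yield two substitutions rather than one per character.
subs=$(echo aaaabcdaa | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }')
echo "$subs"

# With a regexp field separator, a run of commas is consumed as a
# single separator, so the record has two fields, not four.
fields=$(echo 'a,,,b' | awk -F',+' '{ print NF ": " $2 }')
echo "$fields"
```

These print ‘3 4’, ‘2 <A>bcd<A>’, and ‘2: b’, respectively. If the separator matched only one comma at a time, the last record would instead split into four fields, two of them empty.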