Quoting (The GNU Awk User’s Guide)

1.1.6 Shell Quoting Issues

DOS Quoting    Quoting in Windows Batch Files.

For short to medium-length awk programs, it is most convenient to enter the program on the awk command line. This is best done by enclosing the entire program in single quotes. This is true whether you are entering the program interactively at the shell prompt, or writing it as part of a larger shell script:

awk 'program text' input-file1 input-file2 …

Once you are working with the shell, it is helpful to have a basic knowledge of shell quoting rules. The following rules apply only to POSIX-compliant, Bourne-style shells (such as Bash, the GNU Bourne-Again Shell). If you use the C shell, you’re on your own.

Before diving into the rules, we introduce a concept that appears throughout this Web page, which is that of the null, or empty, string.

The null string is character data that has no value. In other words, it is empty. It is written in awk programs like this: "". In the shell, it can be written using single or double quotes: "" or ''. Although the null string has no characters in it, it does exist. For example, consider this command:

$ echo ""

Here, the echo utility receives a single argument, even though that argument has no characters in it. In the rest of this Web page, we use the terms null string and empty string interchangeably. Now, on to the quoting rules:

  • Quoted items can be concatenated with nonquoted items as well as with other quoted items. The shell turns everything into one argument for the command.
  • Preceding any single character with a backslash (‘\’) quotes that character. The shell removes the backslash and passes the quoted character on to the command.
  • Single quotes protect everything between the opening and closing quotes. The shell does no interpretation of the quoted text, passing it on verbatim to the command. It is impossible to embed a single quote inside single-quoted text. Refer back to Comments in awk Programs for an example of what happens if you try.
  • Double quotes protect most things between the opening and closing quotes. The shell does at least variable and command substitution on the quoted text. Different shells may do additional kinds of processing on double-quoted text.

    Because certain characters within double-quoted text are processed by the shell, they must be escaped within the text. Of note are the characters ‘$’, ‘`’, ‘\’, and ‘"’, all of which must be preceded by a backslash within double-quoted text if they are to be passed on literally to the program. (The leading backslash is stripped first.) Thus, the example seen previously in Running awk Without Input Files:

    awk 'BEGIN { print "Don\47t Panic!" }'

    could instead be written this way:

    $ awk "BEGIN { print \"Don't Panic!\" }"
    -| Don't Panic!

    Note that the single quote is not special within double quotes.

  • Null strings are removed when they occur as part of a non-null command-line argument, while explicit null objects are kept. For example, to specify that the field separator FS should be set to the null string, use:

    awk -F "" 'program' files # correct

    Don’t use this:

    awk -F"" 'program' files # wrong!

    In the second case, awk attempts to use the text of the program as the value of FS, and the first file name as the text of the program! This results in syntax errors at best, and confusing behavior at worst.
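
The concatenation and null-string rules can be checked without running awk at all. The following is a minimal sketch for a POSIX-style shell; count_args is a hypothetical helper introduced only for this illustration, and it simply reports how many arguments the shell handed to it:

```shell
# count_args is a hypothetical helper, not part of awk or the shell:
# it prints the number of arguments it received.
count_args() {
    echo "$#"
}

none=$(count_args)                 # no arguments at all
null=$(count_args "")              # one explicit null argument, which is kept
joined=$(count_args -F"")          # the null string vanishes; the one argument is -F
separate=$(count_args -F "")       # two arguments: -F and the null string
glued=$(count_args 'a b'"c d"\ e)  # quoted and unquoted pieces join into one argument

echo "$none $null $joined $separate $glued"
```

This should print 0 1 1 2 1. The third and fourth counts are the difference between the wrong and the correct ways of setting FS to the null string.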

Mixing single and double quotes is difficult. You have to resort to shell quoting tricks, like this:

$ awk 'BEGIN { print "Here is a single quote <'"'"'>" }'
-| Here is a single quote <'>

This program consists of three concatenated quoted strings. The first and the third are single-quoted, and the second is double-quoted.

This can be “simplified” to:

$ awk 'BEGIN { print "Here is a single quote <'\''>" }'
-| Here is a single quote <'>

Judge for yourself which of these two is the more readable.
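
To see what awk actually receives in each case, the program text can be stored in a shell variable and printed. This is only a sketch; the variable names prog_a and prog_b are made up for the illustration:

```shell
# Each assignment joins three adjacent pieces into one string:
# single-quoted text, a quoted single quote, and more single-quoted text.
prog_a='BEGIN { print "Here is a single quote <'"'"'>" }'
prog_b='BEGIN { print "Here is a single quote <'\''>" }'

# Print exactly what awk would be given as its program argument.
printf '%s\n' "$prog_a"
printf '%s\n' "$prog_b"
```

Both lines of output should be identical, which shows that the two spellings differ only in how the shell is asked to produce the lone single quote.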

Another option is to use double quotes, escaping the embedded, awk-level double quotes:

$ awk "BEGIN { print \"Here is a single quote <'>\" }"
-| Here is a single quote <'>

This option is also painful, because double quotes, backslashes, and dollar signs are very common in more advanced awk programs.
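
The dollar sign is the usual source of trouble. The sketch below assumes a POSIX-style shell and sets the shell's own first positional parameter to a made-up value, so that the effect of forgetting the backslash is visible:

```shell
# Give the shell its own $1 for the demonstration (the value is arbitrary).
set -- "shell-arg"

# Escaped: the shell strips the backslash and awk sees { print $1 }.
escaped=$(echo "alpha beta" | awk "{ print \$1 }")

# Single-quoted: the same program with no escaping needed.
single=$(echo "alpha beta" | awk '{ print $1 }')

# Unescaped: the shell substitutes its own $1 before awk ever runs.
rewritten="{ print $1 }"

echo "$escaped"
echo "$single"
echo "$rewritten"
```

The first two commands should print alpha. The last line shows the program text that awk would have been given had the backslash been omitted: { print shell-arg }.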

A third option is to use the octal escape sequence equivalents (see section Escape Sequences) for the single- and double-quote characters, like so:

$ awk 'BEGIN { print "Here is a single quote <\47>" }'
-| Here is a single quote <'>
$ awk 'BEGIN { print "Here is a double quote <\42>" }'
-| Here is a double quote <">

This works nicely, but you should comment clearly what the escape sequences mean.
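
One way to do that, sketched here for use in a shell script, is to put a shell comment directly above the command:

```shell
# \47 is the octal escape for a single quote ('),
# \42 is the octal escape for a double quote (").
out=$(awk 'BEGIN { print "\47single\47 and \42double\42" }')
echo "$out"
```

This should print 'single' and "double". Take care when a digit follows such an escape directly, because an octal escape may take up to three digits.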

A fourth option is to use command-line variable assignment, like this:

$ awk -v sq="'" 'BEGIN { print "Here is a single quote <" sq ">" }'
-| Here is a single quote <'>

(Here, the two string constants and the value of sq are concatenated into a single string that is printed by print.)
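
The same technique extends to the double quote. In this sketch, dq is a hypothetical variable name chosen to match sq; its value is single-quoted at the shell level so that the shell passes the double quote through untouched:

```shell
# sq holds a single quote, dq (a name chosen for this example) a double quote.
out=$(awk -v sq="'" -v dq='"' \
    'BEGIN { print sq "single" sq " and " dq "double" dq }')
echo "$out"
```

This should print 'single' and "double", with no shell quoting tricks inside the awk program itself.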

If you really need both single and double quotes in your awk program, it is probably best to move it into a separate file, where the shell won’t be part of the picture and you can say what you mean.
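
As a rough sketch of that approach, the following writes a one-line program to a temporary file and runs it with -f. The here-document is used only to keep the example self-contained (normally you would create the file with an editor), and it assumes that the mktemp utility is available:

```shell
# Create a temporary file for the awk program (assumes mktemp exists).
prog=$(mktemp)

# The quoted delimiter ('EOF') makes the shell copy the text verbatim,
# so the awk program is written exactly as awk should read it.
cat > "$prog" <<'EOF'
BEGIN { print "Both 'single' and \"double\" quotes" }
EOF

out=$(awk -f "$prog")
echo "$out"
rm -f "$prog"
```

This should print Both 'single' and "double" quotes. The only escaping left inside the file is at the awk level, for the double quotes within the string constant.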


