Gawk/Close-Files-And-Pipes


5.9 Closing Input and Output Redirections

If the same file name or the same shell command is used with getline more than once during the execution of an awk program (see section Explicit Input with getline), the file is opened (or the command is executed) the first time only. At that time, the first record of input is read from that file or command. The next time the same file or command is used with getline, another record is read from it, and so on.

Similarly, when a file or pipe is opened for output, awk remembers the file name or command associated with it, and subsequent writes to the same file or command are appended to the previous writes. The file or pipe stays open until awk exits.
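The behavior described above can be seen in a small experiment. The following is a minimal sketch, wrapped in a shell snippet so that it can be run as is; the temporary file, its contents, and the variable names (tmp, file, a, b) are assumptions made for illustration and do not come from the manual.

```sh
# Create a throwaway three-line input file (hypothetical data).
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

out=$(awk -v file="$tmp" 'BEGIN {
    getline a < file    # first use: opens the file and reads record 1
    getline b < file    # same name again: reads record 2 from the open file
    print a, b
}')

rm -f "$tmp"
printf '%s\n' "$out"
```

The second getline does not start over at the first line; it continues from where the first one stopped, because awk keeps the file open under that name.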

This implies that special steps are necessary in order to read the same file again from the beginning, or to rerun a shell command (rather than reading more output from the same command). The close() function makes these things possible:

close(filename)

or:

close(command)

The argument filename or command can be any expression. Its value must exactly match the string that was used to open the file or start the command (spaces and other “irrelevant” characters included). For example, if you open a pipe with this:

"sort -r names" | getline foo

then you must close it with this:

close("sort -r names")

Once this function call is executed, the next getline from that file or command, or the next print or printf to that file or command, reopens the file or reruns the command. Because the expression that you use to close a file or pipeline must exactly match the expression used to open the file or run the command, it is good practice to use a variable to store the file name or command. The previous example becomes the following:

sortcom = "sort -r names"
sortcom | getline foo
…
close(sortcom)

Storing the name in a variable helps avoid hard-to-find typographical errors in your awk programs.

Here are some of the reasons for closing an output file:

  • To write a file and read it back later on in the same awk program. Close the file after writing it, then begin reading it with getline.
  • To write numerous files, successively, in the same awk program. If the files aren’t closed, eventually awk may exceed a system limit on the number of open files in one process. It is best to close each one when the program has finished writing it.
  • To make a command finish. When output is redirected through a pipe, the command reading the pipe normally continues to try to read input as long as the pipe is open. Often this means the command cannot really do its work until the pipe is closed. For example, if output is redirected to the mail program, the message is not actually sent until the pipe is closed.
  • To run the same program a second time, with the same arguments. This is not the same thing as giving more input to the first run!

    For example, suppose a program pipes output to the mail program. If it outputs several lines redirected to this pipe without closing it, they make a single message of several lines. By contrast, if the program closes the pipe after each line of output, then each line makes a separate message.
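Two of the reasons in this list, making a command finish and reading back what was just written, can be sketched together. This is only an illustration under stated assumptions: sort stands in for any command that needs end-of-file before it can produce output, and the temporary file and the variable names (tmp, sortcmd, result, sep) are hypothetical.

```sh
tmp=$(mktemp)

out=$(awk -v tmp="$tmp" 'BEGIN {
    sortcmd = "sort > " tmp     # keep the command in a variable so open and close match
    print "pear"  | sortcmd
    print "apple" | sortcmd
    close(sortcmd)              # sort sees end-of-file, writes the file, and exits

    # The file is now complete, so it can be read back in the same program.
    while ((getline line < tmp) > 0) {
        result = result sep line
        sep = " "
    }
    close(tmp)
    print result
}')

rm -f "$tmp"
printf '%s\n' "$out"
```

Without the call to close(sortcmd), sort would still be waiting for more input when the getline loop starts, so the loop would most likely find the file empty.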

If you use more files than the system allows you to have open, gawk attempts to multiplex the available open files among your data files. gawk’s ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use close() on your files when you are done with them. In fact, if you are using a lot of pipes, it is essential that you close commands when done. For example, consider something like this:

{
    …
    command = ("grep " $1 " /some/file | my_prog -q " $3)
    while ((command | getline) > 0) {
        # process output of command
    }
    # need close(command) here
}

This example creates a new pipeline based on data in each record. Without the call to close() indicated in the comment, awk creates child processes to run the commands, until it eventually runs out of file descriptors for more pipelines.

Even though each command has finished (as indicated by the end-of-file return status from getline), the child process is not terminated;(28) more importantly, the file descriptor for the pipe is not closed and released until close() is called or awk exits.
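A runnable variant of this pattern, with the close() call in place, might look like the following sketch. The echo and tr commands, the sample input, and the variable names (line, result, sep) are assumptions chosen for illustration; they stand in for the grep and my_prog pipeline of the original example.

```sh
out=$(printf 'foo\nbar\nfoo\n' | awk '{
    command = "echo " $1 " | tr a-z A-Z"    # pipeline built from the current record
    while ((command | getline line) > 0) {
        result = result sep line
        sep = " "
    }
    close(command)    # release the pipe so that an identical command can run again
}
END { print result }')

printf '%s\n' "$out"
```

The first and third records build the same command string. With close() in place, the command is rerun for the third record. Without it, the third record would typically read from a pipe that is already at end-of-file and get nothing back.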

close() silently does nothing if given an argument that does not represent a file, pipe, or coprocess that was opened with a redirection. In such a case, it returns a negative value, indicating an error. In addition, gawk sets ERRNO to a string indicating the error.

Note also that ‘close(FILENAME)’ has no “magic” effects on the implicit loop that reads through the files named on the command line. More likely, it is a close of a file that was never opened with a redirection, so awk silently does nothing except return a negative value.
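The negative return value is easy to observe. The sketch below assumes an awk in which close() is a function; the file name and the variable names are hypothetical. In gawk, ERRNO would also be set at this point, but it is not printed here because other awk implementations do not provide it.

```sh
rc=$(awk 'BEGIN {
    # Nothing was ever opened under this (hypothetical) name.
    ret = close("never-opened.txt")
    print ret
}')

printf 'close() returned %s\n' "$rc"
```

No error message appears and the program carries on; the only sign that nothing was closed is the return value.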

When using the ‘|&’ operator to communicate with a coprocess, it is occasionally useful to be able to close one end of the two-way pipe without closing the other. This is done by supplying a second argument to close(). As in any other call to close(), the first argument is the name of the command or special file used to start the coprocess. The second argument should be a string, with either of the values "to" or "from". Case does not matter. As this is an advanced feature, discussion is delayed until Two-Way Communications with Another Process, which describes it in more detail and gives an example.

Using close()’s Return Value

In many older versions of Unix awk, the close() function is actually a statement. (d.c.) In those versions, it is a syntax error to try to use the return value from close():

command = "…"
command | getline info
retval = close(command)  # syntax error in many Unix awks

gawk treats close() as a function. The return value is -1 if the argument names something that was never opened with a redirection, or if there is a system problem closing the file or process. In these cases, gawk sets the predefined variable ERRNO to a string describing the problem.

In gawk, starting with version 4.2, when closing a pipe or coprocess (input or output), the return value is the exit status of the command, as described in Table 5.1.(29) Otherwise, it is the return value from the system’s close() or fclose() C functions when closing input or output files, respectively. This value is zero if the close succeeds, or -1 if it fails.

Situation                                  Return value from close()
Normal exit of command                     Command’s exit status
Death by signal of command                 256 + number of murderous signal
Death by signal of command with core dump  512 + number of murderous signal
Some kind of error                         -1

Table 5.1: Return values from close() of a pipe


The POSIX standard is very vague; it says that close() returns zero on success and a nonzero value otherwise. In general, different implementations vary in what they report when closing pipes; thus, the return value cannot be used portably. (d.c.) In POSIX mode (see section Command-Line Options), gawk just returns zero when closing a pipe.
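As a hedged illustration of these return values, the sketch below closes a pipe whose command exits with a nonzero status. The command string "exit 3" and the variable names are assumptions made for the example. According to Table 5.1, gawk 4.2 or later should report 3 here; older gawk versions and other awk implementations may report a different nonzero value, and gawk in POSIX mode reports zero.

```sh
rc=$(awk 'BEGIN {
    command = "exit 3"        # hypothetical command that fails with status 3
    command | getline info    # runs the command; there is no output to read
    ret = close(command)      # legal wherever close() is a function
    print ret
}')

printf 'close() returned %s\n' "$rc"
```

Because the exact number is not portable, a program that must run under several awk implementations can only rely on the distinction between zero and nonzero.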

Footnotes

(28)

The technical terminology is rather morbid. The finished child is called a “zombie,” and cleaning up after it is referred to as “reaping.”

(29)

Prior to version 4.2, the return value from closing a pipe or coprocess was the full 16-bit exit value as defined by the wait() system call.

