Extension Sample File Functions (The GNU Awk User’s Guide)


17.7.1 File-Related Functions

The filefuncs extension provides three functions: chdir(), stat(), and fts(). To use them, first load the extension:

@load "filefuncs"

result = chdir("/some/directory")

The chdir() function is a direct hook to the chdir() system call to change the current directory. It returns zero upon success or a value less than zero upon error. In the latter case, it updates ERRNO.
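As a minimal sketch of the error handling described above, the following wraps chdir() and reports ERRNO on failure. The helper name try_chdir() and the directory names are illustrative assumptions, not part of the extension.

```awk
@load "filefuncs"

# try_chdir() is a hypothetical helper, not part of the extension.
# It returns 1 if the directory change succeeded, 0 otherwise.
function try_chdir(dir,    ret)
{
    ret = chdir(dir)
    if (ret < 0) {
        # chdir() updated ERRNO, so report it
        printf("chdir(%s) failed: %s\n", dir, ERRNO) > "/dev/stderr"
        return 0
    }
    return 1
}

BEGIN {
    if (try_chdir("/"))
        print "current directory is now /"
    try_chdir("/no/such/directory")    # assumed not to exist
}
```

Checking for a negative return value, rather than for a specific number, matches the wording of the description.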

result = stat("/some/path", statdata [, follow])

The stat() function provides a hook into the stat() system call. It returns zero upon success or a value less than zero upon error. In the latter case, it updates ERRNO.

By default, it uses the lstat() system call. However, if passed a third argument, it uses stat() instead.

In all cases, it clears the statdata array. When the call is successful, stat() fills the statdata array with information retrieved from the filesystem, as follows:

Subscript  | Field in struct stat | File type
"name"     | The file name        | All
"dev"      | st_dev               | All
"ino"      | st_ino               | All
"mode"     | st_mode              | All
"nlink"    | st_nlink             | All
"uid"      | st_uid               | All
"gid"      | st_gid               | All
"size"     | st_size              | All
"atime"    | st_atime             | All
"mtime"    | st_mtime             | All
"ctime"    | st_ctime             | All
"rdev"     | st_rdev              | Device files
"major"    | st_major             | Device files
"minor"    | st_minor             | Device files
"blksize"  | st_blksize           | All
"pmode"    | A human-readable version of the mode value, like that printed by ls (for example, "-rwxr-xr-x") | All
"linkval"  | The value of the symbolic link | Symbolic links
"type"     | The type of the file as a string: one of "file", "blockdev", "chardev", "directory", "socket", "fifo", "symlink", "door", or "unknown" (not all systems support all file types) | All

flags = or(FTS_PHYSICAL, ...)
result = fts(pathlist, flags, filedata)

Walk the file trees provided in pathlist and fill in the filedata array, as described next. flags is the bitwise OR of several predefined values, also described in a moment. Return zero if there were no errors, otherwise return -1.

The fts() function provides a hook to the C library fts() routines for traversing file hierarchies. Instead of returning data about one file at a time in a stream, it fills in a multidimensional array with data about each file and directory encountered in the requested hierarchies.

The arguments are as follows:

pathlist

An array of file names. The element values are used; the index values are ignored.

flags

This should be the bitwise OR of one or more of the following predefined constant flag values. At least one of FTS_LOGICAL or FTS_PHYSICAL must be provided; otherwise fts() returns an error value and sets ERRNO. The flags are:

FTS_LOGICAL

Do a “logical” file traversal, where the information returned for a symbolic link refers to the linked-to file, and not to the symbolic link itself. This flag is mutually exclusive with FTS_PHYSICAL.

FTS_PHYSICAL

Do a “physical” file traversal, where the information returned for a symbolic link refers to the symbolic link itself. This flag is mutually exclusive with FTS_LOGICAL.

FTS_NOCHDIR

As a performance optimization, the C library fts() routines change directory as they traverse a file hierarchy. This flag disables that optimization.

FTS_COMFOLLOW

Immediately follow a symbolic link named in pathlist, whether or not FTS_LOGICAL is set.

FTS_SEEDOT

By default, the C library fts() routines do not return entries for . (dot) and .. (dot-dot). This option causes entries for dot-dot to also be included. (The extension always includes an entry for dot; more on this in a moment.)

FTS_XDEV

During a traversal, do not cross onto a different mounted filesystem.

filedata

The filedata array holds the results. fts() first clears it. Then it creates an element in filedata for every element in pathlist. The index is the name of the directory or file given in pathlist. The element for this index is itself an array. There are two cases:

The path is a file

In this case, the array contains two or three elements:

"path"

The full path to this file, starting from the “root” that was given in the pathlist array.

"stat"

This element is itself an array, containing the same information as provided by the stat() function described earlier for its statdata argument. The element may not be present if the stat() system call for the file failed.

"error"

If some kind of error was encountered, the array will also contain an element named "error", which is a string describing the error.

The path is a directory

In this case, the array contains one element for each entry in the directory. If an entry is a file, that element is the same as for files, just described. If the entry is a directory, that element is (recursively) an array describing the subdirectory. If FTS_SEEDOT was provided in the flags, then there will also be an element named "..". This element will be an array containing the data as provided by stat().

In addition, there will be an element whose index is ".". This element is an array containing the same two or three elements as for a file: "path", "stat", and "error".
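The layout just described can be traversed recursively. The sketch below assumes that a directory element can be recognized by the presence of its "." element, which follows from the description above. The helper names show() and walk() are hypothetical, and the starting path "." is only an example.

```awk
@load "filefuncs"

# show() and walk() are hypothetical helpers, not part of the extension.

# Print one entry: its path, its type (if stat data exists), any error.
function show(entry, depth,    type, err)
{
    type = ("stat" in entry) ? entry["stat"]["type"] : "?"
    err = ("error" in entry) ? " (" entry["error"] ")" : ""
    printf("%*s%s [%s]%s\n", depth * 2, "", entry["path"], type, err)
}

# Recursively visit a filedata element; return the number of entries seen.
function walk(node, depth,    name, count)
{
    if (! ("." in node)) {         # no "." element: a file entry
        show(node, depth)
        return 1
    }
    show(node["."], depth)         # the directory itself
    count = 1
    for (name in node) {
        if (name == "." || name == "..")
            continue
        count += walk(node[name], depth + 1)
    }
    return count
}

BEGIN {
    pathlist[1] = "."
    if (fts(pathlist, FTS_PHYSICAL, filedata) != 0)
        print "fts() reported an error" > "/dev/stderr"
    for (root in filedata)
        walk(filedata[root], 0)
}
```

Checking for "." rather than for "path" is deliberate: a directory may itself contain an entry named "path", but only directory elements carry a "." element.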

The fts() function returns zero if there were no errors. Otherwise, it returns -1.
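A short sketch of combining flags and checking the return value follows. It uses gawk's built-in or() function, as in the usage line earlier. The path "." is an example only, and the second call deliberately omits both FTS_LOGICAL and FTS_PHYSICAL to show the documented error case.

```awk
@load "filefuncs"

BEGIN {
    pathlist[1] = "."

    # Physical walk that does not cross filesystem boundaries.
    flags = or(FTS_PHYSICAL, FTS_XDEV)
    if (fts(pathlist, flags, filedata) != 0)
        printf("fts: %s\n", ERRNO) > "/dev/stderr"

    # Neither FTS_LOGICAL nor FTS_PHYSICAL: per the description of
    # the flags argument, this returns an error value and sets ERRNO.
    if (fts(pathlist, FTS_XDEV, filedata) != 0)
        printf("fts without FTS_LOGICAL or FTS_PHYSICAL: %s\n", ERRNO) > "/dev/stderr"
}
```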

NOTE: The fts() extension does not exactly mimic the interface of the C library fts() routines, choosing instead to provide an interface that is based on associative arrays, which is more comfortable to use from an awk program. This includes the lack of a comparison function, because gawk already provides powerful array sorting facilities. Although an fts_read()-like interface could have been provided, this felt less natural than simply creating a multidimensional array to represent the file hierarchy and its information.
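As an illustration of relying on gawk's array sorting instead of a comparison function, the sketch below lists the entries of one directory element in ascending name order by setting PROCINFO["sorted_in"]. The helper name list_sorted() is hypothetical, and "@ind_str_asc" is just one of gawk's predefined orderings.

```awk
@load "filefuncs"

# list_sorted() is a hypothetical helper, not part of the extension.
# It prints the entry names of one directory element in ascending
# string order and returns how many names it printed.
function list_sorted(dirnode,    name, n, saved)
{
    saved = ("sorted_in" in PROCINFO) ? PROCINFO["sorted_in"] : ""
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (name in dirnode) {
        if (name == "." || name == "..")
            continue
        print name
        n++
    }
    # restore the previous traversal order
    if (saved != "")
        PROCINFO["sorted_in"] = saved
    else
        delete PROCINFO["sorted_in"]
    return n + 0
}

BEGIN {
    pathlist[1] = "."
    fts(pathlist, FTS_PHYSICAL, filedata)
    if ("." in filedata)
        list_sorted(filedata["."])
}
```

Saving and restoring PROCINFO["sorted_in"] keeps the helper from changing the traversal order used by the rest of the program.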

See test/fts.awk in the gawk distribution for an example use of the fts() extension function.