Feature History (The GNU Awk User’s Guide)

A.6 History of gawk Features

This section describes the features in gawk over and above those in POSIX awk, in the order they were added to gawk.

Version 2.10 of gawk introduced the following features:

Version 2.13 of gawk introduced the following features:

  • The FIELDWIDTHS variable and its effects (see section Reading Fixed-Width Data).
  • The systime() and strftime() built-in functions for obtaining and printing timestamps (see section Time Functions).
  • Additional command-line options (see section Command-Line Options):
      - The -W lint option to provide error and portability checking for both the source code and at runtime.
      - The -W compat option to turn off the GNU extensions.
      - The -W posix option for full POSIX compliance.

Version 2.14 of gawk introduced the following feature:

Version 2.15 of gawk introduced the following features:

  • New variables (see section Predefined Variables):
      - ARGIND, which tracks the movement of FILENAME through ARGV.
      - ERRNO, which contains the system error message when getline returns -1 or close() fails.
  • The /dev/pid, /dev/ppid, /dev/pgrpid, and /dev/user special file names. These have since been removed.
  • The ability to delete all of an array at once with ‘delete array’ (see section The delete Statement).
  • Command-line option changes (see section Command-Line Options):
      - The ability to use GNU-style long-named options that start with --.
      - The --source option for mixing command-line and library-file source code.

Version 3.0 of gawk introduced the following features:

  • New or changed variables:
      - IGNORECASE changed, now applying to string comparison as well as regexp operations (see section Case Sensitivity in Matching).
      - RT, which contains the input text that matched RS (see section How Input Is Split into Records).
  • Full support for both POSIX and GNU regexps (see section Regular Expressions).
  • The gensub() function for more powerful text manipulation (see section String-Manipulation Functions).
  • The strftime() function acquired a default time format, allowing it to be called with no arguments (see section Time Functions).
  • The ability for FS and for the third argument to split() to be null strings (see section Making Each Character a Separate Field).
  • The ability for RS to be a regexp (see section How Input Is Split into Records).
  • The next file statement became nextfile (see section The nextfile Statement).
  • The fflush() function from BWK awk (then at Bell Laboratories; see section Input/Output Functions).
  • New command-line options:
      - The --lint-old option to warn about constructs that are not available in the original Version 7 Unix version of awk (see section Major Changes Between V7 and SVR3.1).
      - The -m option from BWK awk. (Brian was still at Bell Laboratories at the time.) This was later removed from both his awk and from gawk.
      - The --re-interval option to provide interval expressions in regexps (see section Regular Expression Operators).
      - The --traditional option was added as a better name for --compat (see section Command-Line Options).
  • The use of GNU Autoconf to control the configuration process (see section Compiling gawk for Unix-Like Systems).
  • Amiga support. This has since been removed.

Version 3.1 of gawk introduced the following features:

  • New variables (see section Predefined Variables):
      - BINMODE, for non-POSIX systems, which allows binary I/O for input and/or output files (see section Using gawk on PC Operating Systems).
      - LINT, which dynamically controls lint warnings.
      - PROCINFO, an array for providing process-related information.
      - TEXTDOMAIN, for setting an application’s internationalization text domain (see section Internationalization with gawk).
  • The ability to use octal and hexadecimal constants in awk program source code (see section Octal and Hexadecimal Numbers).
  • The ‘|&’ operator for two-way I/O to a coprocess (see section Two-Way Communications with Another Process).
  • The /inet special files for TCP/IP networking using ‘|&’ (see section Using gawk for Network Programming).
  • The optional second argument to close() that allows closing one end of a two-way pipe to a coprocess (see section Two-Way Communications with Another Process).
  • The optional third argument to the match() function for capturing text-matching subexpressions within a regexp (see section String-Manipulation Functions).
  • Positional specifiers in printf formats for making translations easier (see section Rearranging printf Arguments).
  • A number of new built-in functions:
      - The asort() and asorti() functions for sorting arrays (see section Controlling Array Traversal and Array Sorting).
      - The bindtextdomain(), dcgettext() and dcngettext() functions for internationalization (see section Internationalizing awk Programs).
      - The extension() function and the ability to add new built-in functions dynamically (see section Writing Extensions for gawk).
      - The mktime() function for creating timestamps (see section Time Functions).
      - The and(), or(), xor(), compl(), lshift(), rshift(), and strtonum() functions (see section Bit-Manipulation Functions).
  • The support for ‘next file’ as two words was removed completely (see section The nextfile Statement).
  • Additional command-line options (see section Command-Line Options):
      - The --dump-variables option to print a list of all global variables.
      - The --exec option, for use in CGI scripts.
      - The --gen-po command-line option and the use of a leading underscore to mark strings that should be translated (see section Extracting Marked Strings).
      - The --non-decimal-data option to allow non-decimal input data (see section Allowing Nondecimal Input Data).
      - The --profile option and pgawk, the profiling version of gawk, for producing execution profiles of awk programs (see section Profiling Your awk Programs).
      - The --use-lc-numeric option to force gawk to use the locale’s decimal point for parsing input data (see section Conversion of Strings and Numbers).
  • The use of GNU Automake to help in standardizing the configuration process (see section Compiling gawk for Unix-Like Systems).
  • The use of GNU gettext for gawk’s own message output (see section gawk Can Speak Your Language).
  • BeOS support. This was later removed.
  • Tandem support. This was later removed.
  • The Atari port became officially unsupported and was later removed entirely.
  • The source code changed to use ISO C standard-style function definitions.
  • POSIX compliance for sub() and gsub() (see section More about ‘\’ and ‘&’ with sub(), gsub(), and gensub()).
  • The length() function was extended to accept an array argument and return the number of elements in the array (see section String-Manipulation Functions).
  • The strftime() function acquired a third argument to enable printing times as UTC (see section Time Functions).

Version 4.0 of gawk introduced the following features:

  • Variable additions:
      - FPAT, which allows you to specify a regexp that matches the fields, instead of matching the field separator (see section Defining Fields by Content).
      - If PROCINFO["sorted_in"] exists, ‘for(iggy in foo)’ loops sort the indices before looping over them. The value of this element provides control over how the indices are sorted before the loop traversal starts (see section Using Predefined Array Scanning Orders with gawk).
      - PROCINFO["strftime"], which holds the default format for strftime() (see section Time Functions).
  • The special files /dev/pid, /dev/ppid, /dev/pgrpid and /dev/user were removed.
  • Support for IPv6 was added via the /inet6 special file. /inet4 forces IPv4 and /inet chooses the system default, which is probably IPv4 (see section Using gawk for Network Programming).
  • The use of ‘\s’ and ‘\S’ escape sequences in regular expressions (see section gawk-Specific Regexp Operators).
  • Interval expressions became part of default regular expressions (see section Regular Expression Operators).
  • POSIX character classes work even with --traditional (see section Regular Expression Operators).
  • break and continue became invalid outside a loop, even with --traditional (see section The break Statement, and also see The continue Statement).
  • fflush(), nextfile, and ‘delete array’ are allowed if --posix or --traditional, since they are all now part of POSIX.
  • An optional third argument to asort() and asorti(), specifying how to sort (see section String-Manipulation Functions).
  • The behavior of fflush() changed to match BWK awk and for POSIX; now both ‘fflush()’ and ‘fflush("")’ flush all open output redirections (see section Input/Output Functions).
  • The isarray() function, which determines whether an item is an array, to make it possible to traverse arrays of arrays (see section Getting Type Information).
  • The patsplit() function which gives the same capability as FPAT, for splitting (see section String-Manipulation Functions).
  • An optional fourth argument to the split() function, which is an array to hold the values of the separators (see section String-Manipulation Functions).
  • Arrays of arrays (see section Arrays of Arrays).
  • The BEGINFILE and ENDFILE special patterns (see section The BEGINFILE and ENDFILE Special Patterns).
  • Indirect function calls (see section Indirect Function Calls).
  • switch / case are enabled by default (see section The switch Statement).
  • Command-line option changes (see section Command-Line Options):
      - The -b and --characters-as-bytes options which prevent gawk from treating input as a multibyte string.
      - The redundant --compat, --copyleft, and --usage long options were removed.
      - The --gen-po option was finally renamed to the correct --gen-pot.
      - The --sandbox option which disables certain features.
      - All long options acquired corresponding short options, for use in ‘#!’ scripts.
  • Directories named on the command line now produce a warning, not a fatal error, unless --posix or --traditional are used (see section Directories on the Command Line).
  • The gawk internals were rewritten, bringing the dgawk debugger and possibly improved performance (see section Debugging awk Programs).
  • Per the GNU Coding Standards, dynamic extensions must now define a global symbol indicating that they are GPL-compatible (see section Extension Licensing).
  • In POSIX mode, string comparisons use strcoll() / wcscoll() (see section String Comparison Based on Locale Collating Order).
  • The option for raw sockets was removed, since it was never implemented (see section Using gawk for Network Programming).
  • Ranges of the form ‘[d-h]’ are treated as if they were in the C locale, no matter what kind of regexp is being used, and even if --posix (see section Regexp Ranges and Locales: A Long Sad Story).
  • Support was removed for the following systems:


Version 4.1 of gawk introduced the following features:

  • Three new arrays: SYMTAB, FUNCTAB, and PROCINFO["identifiers"] (see section Built-in Variables That Convey Information).
  • The three executables gawk, pgawk, and dgawk were merged into one, named just gawk. As a result, the command-line options changed.
  • Command-line option changes (see section Command-Line Options):
      - The -D option invokes the debugger.
      - The -i and --include options load awk library files.
      - The -l and --load options load compiled dynamic extensions.
      - The -M and --bignum options enable MPFR.
      - The -o option only does pretty-printing.
      - The -p option is used for profiling.
      - The -R option was removed.
  • Support for high precision arithmetic with MPFR (see section Arithmetic and Arbitrary-Precision Arithmetic with gawk).
  • The and(), or() and xor() functions changed to allow any number of arguments, with a minimum of two (see section Bit-Manipulation Functions).
  • The dynamic extension interface was completely redone (see section Writing Extensions for gawk).
  • Redirected getline became allowed inside BEGINFILE and ENDFILE (see section The BEGINFILE and ENDFILE Special Patterns).
  • The where command was added to the debugger (see section Working with the Stack).
  • Support for Ultrix was removed.

Version 4.2 of gawk introduced the following changes:

  • Changes to ENVIRON are reflected into gawk’s environment and that of programs that it runs. See section Built-in Variables That Convey Information.
  • FIELDWIDTHS was enhanced to allow skipping characters before assigning a value to a field (see section Defining Fields by Content).
  • The PROCINFO["argv"] array. See section Built-in Variables That Convey Information.
  • The maximum number of hexadecimal digits in ‘\x’ escapes is now two. See section Escape Sequences.
  • Strongly typed regexp constants of the form ‘@/…/’ (see section Strongly Typed Regexp Constants).
  • The bitwise functions changed, making negative arguments into a fatal error (see section Bit-Manipulation Functions).
  • The mktime() function now accepts an optional second argument (see section Time Functions).
  • The typeof() function (see section Getting Type Information).
  • Optimizations are enabled by default. Use -s / --no-optimize to disable optimizations.
  • For many years, POSIX specified that default field splitting only allowed spaces and tabs to separate fields, and this was how gawk behaved with --posix. As of 2013, the standard restored historical behavior, and now default field splitting with --posix also allows newlines to separate fields.
  • Nonfatal output with print and printf. See section Enabling Nonfatal Output.
  • Retryable I/O via PROCINFO[input-file, "RETRY"] (see section Retrying Reads After Certain Input Errors).
  • Changes to the pretty-printer (see section Profiling Your awk Programs):
      - The --pretty-print option no longer runs the awk program too.
      - Comments in the source program are preserved and placed into the output file.
      - Explicit parentheses for expressions in the input are preserved in the generated output.
  • Improvements to the extension API (see section Writing Extensions for gawk):
      - The get_file() function to access open redirections.
      - The nonfatal() function for generating nonfatal error messages.
      - Support for GMP and MPFR values.
      - Input parsers can now override the default field parsing mechanism by specifying explicit locations.
  • Shell startup files are supplied with the distribution and installed by ‘make install’ (see section Shell Startup Files).
  • The igawk program and its manual page are no longer installed when gawk is built. See section An Easy Way to Use Library Functions.
  • Support for MirBSD was removed.
  • Support for GNU/Linux on Alpha was removed.

Version 5.0 added the following features:

  • The PROCINFO["platform"] array element, which allows you to write code that takes the operating system / platform into account.

Version 5.1 was created to release gawk with a correct major version number for the API. This was overlooked for version 5.0, unfortunately. It added the following features:

  • The index for this manual was completely reworked.
  • Support was added for MSYS2.