A.6 History of gawk Features
This section describes the features in gawk over and above those in POSIX awk, in the order they were added to gawk.
Version 2.10 of gawk introduced the following features:
- The AWKPATH environment variable for specifying a path search for the -f command-line option (see section Command-Line Options).
- The IGNORECASE variable and its effects (see section Case Sensitivity in Matching).
- The /dev/stdin, /dev/stdout, /dev/stderr, and /dev/fd/N special file names (see section Special File names in gawk).
Version 2.13 of gawk introduced the following features:
- The FIELDWIDTHS variable and its effects (see section Reading Fixed-Width Data).
- The systime() and strftime() built-in functions for obtaining and printing timestamps (see section Time Functions).
- Additional command-line options (see section Command-Line Options):
  - The -W lint option to provide error and portability checking for both the source code and at runtime.
  - The -W compat option to turn off the GNU extensions.
  - The -W posix option for full POSIX compliance.
Version 2.14 of gawk introduced the following feature:
- The next file statement for skipping to the next data file (see section The nextfile Statement).
Version 2.15 of gawk introduced the following features:
- New variables (see section Predefined Variables):
  - ARGIND, which tracks the movement of FILENAME through ARGV.
  - ERRNO, which contains the system error message when getline returns -1 or close() fails.
- The /dev/pid, /dev/ppid, /dev/pgrpid, and /dev/user special file names. These have since been removed.
- The ability to delete all of an array at once with ‘delete array’ (see section The delete Statement).
- Command-line option changes (see section Command-Line Options):
  - The ability to use GNU-style long-named options that start with --.
  - The --source option for mixing command-line and library-file source code.
Version 3.0 of gawk introduced the following features:
- New or changed variables:
  - IGNORECASE changed, now applying to string comparison as well as regexp operations (see section Case Sensitivity in Matching).
  - RT, which contains the input text that matched RS (see section How Input Is Split into Records).
- Full support for both POSIX and GNU regexps (see section Regular Expressions).
- The gensub() function for more powerful text manipulation (see section String-Manipulation Functions).
- The strftime() function acquired a default time format, allowing it to be called with no arguments (see section Time Functions).
- The ability for FS and for the third argument to split() to be null strings (see section Making Each Character a Separate Field).
- The ability for RS to be a regexp (see section How Input Is Split into Records).
- The next file statement became nextfile (see section The nextfile Statement).
- The fflush() function from BWK awk (then at Bell Laboratories; see section Input/Output Functions).
- New command-line options:
  - The --lint-old option to warn about constructs that are not available in the original Version 7 Unix version of awk (see section Major Changes Between V7 and SVR3.1).
  - The -m option from BWK awk. (Brian was still at Bell Laboratories at the time.) This was later removed from both his awk and from gawk.
  - The --re-interval option to provide interval expressions in regexps (see section Regular Expression Operators).
  - The --traditional option was added as a better name for --compat (see section Command-Line Options).
- The use of GNU Autoconf to control the configuration process (see section Compiling gawk for Unix-Like Systems).
- Amiga support. This has since been removed.
Version 3.1 of gawk introduced the following features:
- New variables (see section Predefined Variables):
  - BINMODE, for non-POSIX systems, which allows binary I/O for input and/or output files (see section Using gawk on PC Operating Systems).
  - LINT, which dynamically controls lint warnings.
  - PROCINFO, an array for providing process-related information.
  - TEXTDOMAIN, for setting an application’s internationalization text domain (see section Internationalization with gawk).
- The ability to use octal and hexadecimal constants in awk program source code (see section Octal and Hexadecimal Numbers).
- The ‘|&’ operator for two-way I/O to a coprocess (see section Two-Way Communications with Another Process).
- The /inet special files for TCP/IP networking using ‘|&’ (see section Using gawk for Network Programming).
- The optional second argument to close() that allows closing one end of a two-way pipe to a coprocess (see section Two-Way Communications with Another Process).
- The optional third argument to the match() function for capturing text-matching subexpressions within a regexp (see section String-Manipulation Functions).
- Positional specifiers in printf formats for making translations easier (see section Rearranging printf Arguments).
- A number of new built-in functions:
  - The asort() and asorti() functions for sorting arrays (see section Controlling Array Traversal and Array Sorting).
  - The bindtextdomain(), dcgettext(), and dcngettext() functions for internationalization (see section Internationalizing awk Programs).
  - The extension() function and the ability to add new built-in functions dynamically (see section Writing Extensions for gawk).
  - The mktime() function for creating timestamps (see section Time Functions).
  - The and(), or(), xor(), compl(), lshift(), rshift(), and strtonum() functions (see section Bit-Manipulation Functions).
- The support for ‘next file’ as two words was removed completely (see section The nextfile Statement).
- Additional command-line options (see section Command-Line Options):
  - The --dump-variables option to print a list of all global variables.
  - The --exec option, for use in CGI scripts.
  - The --gen-po command-line option and the use of a leading underscore to mark strings that should be translated (see section Extracting Marked Strings).
  - The --non-decimal-data option to allow non-decimal input data (see section Allowing Nondecimal Input Data).
  - The --profile option and pgawk, the profiling version of gawk, for producing execution profiles of awk programs (see section Profiling Your awk Programs).
  - The --use-lc-numeric option to force gawk to use the locale’s decimal point for parsing input data (see section Conversion of Strings and Numbers).
- The use of GNU Automake to help in standardizing the configuration process (see section Compiling gawk for Unix-Like Systems).
- The use of GNU gettext for gawk’s own message output (see section gawk Can Speak Your Language).
- BeOS support. This was later removed.
- Tandem support. This was later removed.
- The Atari port became officially unsupported and was later removed entirely.
- The source code changed to use ISO C standard-style function definitions.
- POSIX compliance for sub() and gsub() (see section More about ‘\’ and ‘&’ with sub(), gsub(), and gensub()).
- The length() function was extended to accept an array argument and return the number of elements in the array (see section String-Manipulation Functions).
- The strftime() function acquired a third argument to enable printing times as UTC (see section Time Functions).
Version 4.0 of gawk introduced the following features:
- Variable additions:
  - FPAT, which allows you to specify a regexp that matches the fields, instead of matching the field separator (see section Defining Fields by Content).
  - If PROCINFO["sorted_in"] exists, ‘for(iggy in foo)’ loops sort the indices before looping over them. The value of this element provides control over how the indices are sorted before the loop traversal starts (see section Using Predefined Array Scanning Orders with gawk).
  - PROCINFO["strftime"], which holds the default format for strftime() (see section Time Functions).
- The special files /dev/pid, /dev/ppid, /dev/pgrpid, and /dev/user were removed.
- Support for IPv6 was added via the /inet6 special file. /inet4 forces IPv4, and /inet chooses the system default, which is probably IPv4 (see section Using gawk for Network Programming).
- The use of ‘\s’ and ‘\S’ escape sequences in regular expressions (see section gawk-Specific Regexp Operators).
- Interval expressions became part of default regular expressions (see section Regular Expression Operators).
- POSIX character classes work even with --traditional (see section Regular Expression Operators).
- break and continue became invalid outside a loop, even with --traditional (see section The break Statement, and also see The continue Statement).
- fflush(), nextfile, and ‘delete array’ are allowed even with --posix or --traditional, since they are all now part of POSIX.
- An optional third argument to asort() and asorti(), specifying how to sort (see section String-Manipulation Functions).
- The behavior of fflush() changed to match BWK awk and POSIX; now both ‘fflush()’ and ‘fflush("")’ flush all open output redirections (see section Input/Output Functions).
- The isarray() function, which distinguishes whether an item is an array, making it possible to traverse arrays of arrays (see section Getting Type Information).
- The patsplit() function, which provides the same capability as FPAT for splitting strings (see section String-Manipulation Functions).
- An optional fourth argument to the split() function, which is an array to hold the values of the separators (see section String-Manipulation Functions).
- Arrays of arrays (see section Arrays of Arrays).
- The BEGINFILE and ENDFILE special patterns (see section The BEGINFILE and ENDFILE Special Patterns).
- Indirect function calls (see section Indirect Function Calls).
- The switch / case statements are enabled by default (see section The switch Statement).
- Command-line option changes (see section Command-Line Options):
  - The -b and --characters-as-bytes options, which prevent gawk from treating input as a multibyte string.
  - The redundant --compat, --copyleft, and --usage long options were removed.
  - The --gen-po option was finally renamed to the correct --gen-pot.
  - The --sandbox option, which disables certain features.
  - All long options acquired corresponding short options, for use in ‘#!’ scripts.
- Directories named on the command line now produce a warning, not a fatal error, unless --posix or --traditional is used (see section Directories on the Command Line).
- The gawk internals were rewritten, bringing the dgawk debugger and possibly improved performance (see section Debugging awk Programs).
- Per the GNU Coding Standards, dynamic extensions must now define a global symbol indicating that they are GPL-compatible (see section Extension Licensing).
- In POSIX mode, string comparisons use strcoll() / wcscoll() (see section String Comparison Based on Locale Collating Order).
- The option for raw sockets was removed, since it was never implemented (see section Using gawk for Network Programming).
- Ranges of the form ‘[d-h]’ are treated as if they were in the C locale, no matter what kind of regexp is being used, and even with --posix (see section Regexp Ranges and Locales: A Long Sad Story).
- Support was removed for the following systems:
  - Atari
  - Amiga
  - BeOS
  - Cray
  - MIPS RiscOS
  - MS-DOS with the Microsoft Compiler
  - MS-Windows with the Microsoft Compiler
  - NeXT
  - SunOS 3.x, Sun 386 (Road Runner)
  - Tandem (non-POSIX)
  - Prestandard VAX C compiler for VAX/VMS
Version 4.1 of gawk introduced the following features:
- Three new arrays: SYMTAB, FUNCTAB, and PROCINFO["identifiers"] (see section Built-in Variables That Convey Information).
- The three executables gawk, pgawk, and dgawk were merged into one, named just gawk. As a result, the command-line options changed.
- Command-line option changes (see section Command-Line Options):
  - The -D option invokes the debugger.
  - The -i and --include options load awk library files.
  - The -l and --load options load compiled dynamic extensions.
  - The -M and --bignum options enable MPFR.
  - The -o option only does pretty-printing.
  - The -p option is used for profiling.
  - The -R option was removed.
- Support for high precision arithmetic with MPFR (see section Arithmetic and Arbitrary-Precision Arithmetic with gawk).
- The and(), or(), and xor() functions changed to allow any number of arguments, with a minimum of two (see section Bit-Manipulation Functions).
- The dynamic extension interface was completely redone (see section Writing Extensions for gawk).
- Redirected getline became allowed inside BEGINFILE and ENDFILE (see section The BEGINFILE and ENDFILE Special Patterns).
- The where command was added to the debugger (see section Working with the Stack).
- Support for Ultrix was removed.
Version 4.2 of gawk introduced the following changes:
- Changes to ENVIRON are reflected into gawk’s environment and that of programs that it runs. See section Built-in Variables That Convey Information.
- FIELDWIDTHS was enhanced to allow skipping characters before assigning a value to a field (see section Reading Fixed-Width Data).
- The PROCINFO["argv"] array. See section Built-in Variables That Convey Information.
- The maximum number of hexadecimal digits in ‘\x’ escapes is now two. See section Escape Sequences.
- Strongly typed regexp constants of the form ‘@/…/’ (see section Strongly Typed Regexp Constants).
- The bitwise functions changed, making negative arguments into a fatal error (see section Bit-Manipulation Functions).
- The mktime() function now accepts an optional second argument (see section Time Functions).
- The typeof() function (see section Getting Type Information).
- Optimizations are enabled by default. Use -s / --no-optimize to disable optimizations.
- For many years, POSIX specified that default field splitting only allowed spaces and tabs to separate fields, and this was how gawk behaved with --posix. As of 2013, the standard restored historical behavior, and now default field splitting with --posix also allows newlines to separate fields.
- Nonfatal output with print and printf. See section Enabling Nonfatal Output.
- Retryable I/O via PROCINFO[input-file, "RETRY"] (see section Retrying Reads After Certain Input Errors).
- Changes to the pretty-printer (see section Profiling Your awk Programs):
  - The --pretty-print option no longer runs the awk program too.
  - Comments in the source program are preserved and placed into the output file.
  - Explicit parentheses for expressions in the input are preserved in the generated output.
- Improvements to the extension API (see section Writing Extensions for gawk):
  - The get_file() function to access open redirections.
  - The nonfatal() function for generating nonfatal error messages.
  - Support for GMP and MPFR values.
  - Input parsers can now override the default field parsing mechanism by specifying explicit locations.
- Shell startup files are supplied with the distribution and installed by ‘make install’ (see section Shell Startup Files).
- The igawk program and its manual page are no longer installed when gawk is built. See section An Easy Way to Use Library Functions.
- Support for MirBSD was removed.
- Support for GNU/Linux on Alpha was removed.
Version 5.0 added the following features:
- The PROCINFO["platform"] array element, which allows you to write code that takes the operating system / platform into account.
Version 5.1 was created to release gawk with a correct major version number for the API. This was overlooked for version 5.0, unfortunately. It added the following features:
- The index for this manual was completely reworked.
- Support was added for MSYS2.