Extension New Mechanism Goals (The GNU Awk User’s Guide)
C.5.2 Goals For A New Mechanism
Some goals for the new API were:
- The API should be independent of gawk internals. Changes in gawk internals should not be visible to the writer of an extension function.
- The API should provide binary compatibility across gawk releases as long as the API itself does not change.
- The API should enable extensions written in C or C++ to have roughly the same “appearance” to awk-level code as awk functions do. This means that extensions should have:
  - The ability to access function parameters.
  - The ability to turn an undefined parameter into an array (call by reference).
  - The ability to create, access and update global variables.
  - Easy access to all the elements of an array at once (“array flattening”) in order to loop over all the elements in an easy fashion for C code.
  - The ability to create arrays (including gawk’s true arrays of arrays).
Some additional important goals were:
- The API should use only features in ISO C 90, so that extensions can be written using the widest range of C and C++ compilers. The header should include the appropriate ‘#ifdef __cplusplus’ and ‘extern "C"’ magic so that a C++ compiler could be used. (If using C++, the runtime system has to be smart enough to call any constructors and destructors, as gawk is a C program. As of this writing, this has not been tested.)
- The API mechanism should not require access to gawk’s symbols[122] by the compile-time or dynamic linker, in order to enable creation of extensions that also work on MS-Windows.
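The ‘#ifdef __cplusplus’ and ‘extern "C"’ arrangement mentioned above can be sketched as follows. This is a minimal illustration only: the header name demo_api.h and the function demo_add() are hypothetical and are not part of the gawk API.

```c
/* demo_api.h (hypothetical): a C90 header that a C++ compiler can also include. */
#ifndef DEMO_API_H
#define DEMO_API_H

#ifdef __cplusplus
extern "C" {            /* give the declarations below C linkage under C++ */
#endif

/* Hypothetical API function, declared using ISO C 90 features only. */
int demo_add(int a, int b);

#ifdef __cplusplus
}                       /* end of the extern "C" block */
#endif

#endif /* DEMO_API_H */

/* demo_api.c (hypothetical): the implementation of the declared function. */
int demo_add(int a, int b)
{
    return a + b;
}
```

A C compiler never sees the ‘extern "C"’ lines, while a C++ compiler sees them and does not mangle the declared names, so both can link against the same object code.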
During development, it became clear that there were other features that should be available to extensions, which were also subsequently provided:
- Extensions should have the ability to hook into gawk’s I/O redirection mechanism. In particular, the xgawk developers provided a so-called “open hook” to take over reading records. During development, this was generalized to allow extensions to hook into input processing, output processing, and two-way I/O.
- An extension should be able to provide a “call back” function to perform cleanup actions when gawk exits.
- An extension should be able to provide a version string so that gawk’s --version option can provide information about extensions as well.
The requirement to avoid access to gawk’s symbols is, at first glance, a difficult one to meet.
One design, apparently used by Perl and Ruby and maybe others, would be to make the mainline gawk code into a library, with the gawk utility a small C main() function linked against the library.
This seemed like the tail wagging the dog, complicating build and installation and making a simple copy of the gawk executable from one system to another (or one place to another on the same system!) into a chancy operation.
Pat Rankin suggested the solution that was adopted. See section How It Works at a High Level, for the details.
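One general way to satisfy a requirement like this is for the host program to hand the extension a table of function pointers at load time, so that the extension never refers to a host symbol by name. The sketch below illustrates only that general idea. Every name in it (host_api, lookup_var, ext_load, and so on) is hypothetical, and it should not be read as the actual gawk interface; the referenced section has the real details.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of host services; all names are illustrative only. */
typedef struct host_api {
    int major_version;
    int minor_version;
    /* Look up a host variable by name; returns NULL if it is unknown. */
    const char *(*lookup_var)(const char *name);
} host_api;

/* ---- Host side ---- */

static const char *host_lookup_var(const char *name)
{
    if (strcmp(name, "FS") == 0)
        return " ";
    return NULL;
}

static const host_api the_api = { 1, 0, host_lookup_var };

/* ---- Extension side: no host function is referenced by name ---- */

static const host_api *api;    /* saved when the extension is loaded */

/* Called by the host after loading; returns 1 on success, 0 on failure. */
int ext_load(const host_api *host)
{
    if (host == NULL || host->major_version != 1)
        return 0;              /* refuse an incompatible host */
    api = host;
    return 1;
}

/* All access to the host goes through the saved table. */
const char *ext_get_fs(void)
{
    return api != NULL ? api->lookup_var("FS") : NULL;
}

/* ---- Host side: pass the table to the extension ---- */

int host_load_extension(void)
{
    return ext_load(&the_api);
}
```

In a real dynamically loaded extension, the host would locate the load function with the platform's dynamic loading facility instead of calling it directly. The only symbol lookup then runs from the host into the extension, and never the other way around.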
[122] The symbols are the variables and functions defined inside gawk. Access to these symbols by code external to gawk loaded dynamically at runtime is problematic on MS-Windows.