C.5.2 Goals For A New Mechanism

Some goals for the new API were:

  • The API should be independent of gawk internals. Changes in gawk internals should not be visible to the writer of an extension function.
  • The API should provide binary compatibility across gawk releases as long as the API itself does not change.
  • The API should enable extensions written in C or C++ to have roughly the same “appearance” to awk-level code as awk functions do. This means that extensions should have:
    • The ability to access function parameters.
    • The ability to turn an undefined parameter into an array (call by reference).
    • The ability to create, access and update global variables.
    • Easy access to all the elements of an array at once (“array flattening”) in order to loop over all the elements in an easy fashion for C code.
    • The ability to create arrays (including gawk’s true arrays of arrays).
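
The idea behind “array flattening” can be pictured with a small, self-contained sketch. It is not gawk’s actual API: the types node, flat_elem and flat_array and the functions flatten() and release_flattened() are hypothetical names made up for illustration, and a linked list stands in for the associative array. The point is only that, once every (index, value) pair sits in one contiguous block, C code can visit the elements with an ordinary counted loop.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an associative array: a linked list of pairs. */
struct node {
    const char *index;
    const char *value;
    struct node *next;
};

/* One flattened element: an (index, value) pair. */
struct flat_elem {
    const char *index;
    const char *value;
};

/* The flattened view: a count plus a contiguous block of elements. */
struct flat_array {
    size_t count;
    struct flat_elem *elements;
};

/* Copy every pair into one contiguous block; returns NULL if allocation fails. */
struct flat_array *flatten(const struct node *head)
{
    const struct node *p;
    struct flat_array *flat;
    size_t n = 0, i = 0;

    for (p = head; p != NULL; p = p->next)
        n++;

    flat = malloc(sizeof(*flat));
    if (flat == NULL)
        return NULL;

    /* Allocate at least one slot so an empty array is still valid. */
    flat->elements = malloc((n ? n : 1) * sizeof(*flat->elements));
    if (flat->elements == NULL) {
        free(flat);
        return NULL;
    }

    flat->count = n;
    for (p = head; p != NULL; p = p->next) {
        flat->elements[i].index = p->index;
        flat->elements[i].value = p->value;
        i++;
    }
    return flat;
}

/* Release the flattened view; the original pairs are left untouched. */
void release_flattened(struct flat_array *flat)
{
    if (flat != NULL) {
        free(flat->elements);
        free(flat);
    }
}
```

A caller would then loop from 0 to count - 1 over elements and call release_flattened() when done.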

Some additional important goals were:

  • The API should use only features in ISO C 90, so that extensions can be written using the widest range of C and C++ compilers. The header should include the appropriate ‘#ifdef __cplusplus’ and ‘extern "C"’ magic so that a C++ compiler could be used. (If using C++, the runtime system has to be smart enough to call any constructors and destructors, as gawk is a C program. As of this writing, this has not been tested.)
  • The API mechanism should not require access to gawk’s symbols (122) by the compile-time or dynamic linker, in order to enable creation of extensions that also work on MS-Windows.
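
The ‘#ifdef __cplusplus’ and ‘extern "C"’ arrangement mentioned above usually looks like the following minimal sketch. The header name and the function hypothetical_ext_init() are assumptions made up for illustration; they are not taken from gawk’s own header. The code uses only ISO C 90 constructs, so it compiles unchanged as C, and a C++ compiler sees the declarations with C linkage.

```c
/* hypothetical_ext_api.h: illustrative only, not gawk's actual header. */
#ifndef HYPOTHETICAL_EXT_API_H
#define HYPOTHETICAL_EXT_API_H

#ifdef __cplusplus
extern "C" {            /* give the declarations below C linkage in C++ */
#endif

/* Hypothetical entry point; C linkage keeps its symbol name unmangled. */
int hypothetical_ext_init(int api_major, int api_minor);

#ifdef __cplusplus
}
#endif

#endif /* HYPOTHETICAL_EXT_API_H */

/* A trivial definition so that the fragment compiles on its own. */
int hypothetical_ext_init(int api_major, int api_minor)
{
    (void) api_minor;           /* unused in this sketch */
    return api_major == 1;      /* accept only major version 1 */
}
```

Without the guard, a C++ compiler would mangle the function name and the loader could not find the entry point under its plain C name.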

During development, it became clear that there were other features that should be available to extensions, which were also subsequently provided:

  • Extensions should have the ability to hook into gawk’s I/O redirection mechanism. In particular, the xgawk developers provided a so-called “open hook” to take over reading records. During development, this was generalized to allow extensions to hook into input processing, output processing, and two-way I/O.
  • An extension should be able to provide a “call back” function to perform cleanup actions when gawk exits.
  • An extension should be able to provide a version string so that gawk’s --version option can provide information about extensions as well.

The requirement to avoid access to gawk’s symbols is, at first glance, a difficult one to meet.

One design, apparently used by Perl and Ruby and maybe others, would be to make the mainline gawk code into a library, with the gawk utility a small C main() function linked against the library.

This seemed like the tail wagging the dog, complicating build and installation and making a simple copy of the gawk executable from one system to another (or one place to another on the same system!) into a chancy operation.

Pat Rankin suggested the solution that was adopted. See the section “How It Works at a High Level” for the details.
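
In broad terms, one way to meet the “no access to the host’s symbols” requirement is for the main program to hand each extension a table of function pointers when it loads the extension, so that the extension never needs the linker to resolve a symbol inside the main program. The sketch below illustrates that general technique only. Every name in it (host_api, ext_load(), host_run() and so on) is hypothetical and is not gawk’s actual API; the referenced section describes the real design.

```c
#include <stddef.h>

/* Hypothetical table of services that the host passes to an extension. */
struct host_api {
    int major_version;
    int (*add_function)(void *host_id, const char *name);
    void (*at_exit)(void *host_id, void (*fn)(void *data), void *data);
};

/* ---- Host side (stand-in for the main program) ---- */
static int registered;                  /* functions registered so far */
static void (*exit_fn)(void *data);     /* saved cleanup callback */
static void *exit_data;

static int host_add_function(void *host_id, const char *name)
{
    (void) host_id;
    if (name == NULL)
        return 0;
    registered++;
    return 1;
}

static void host_at_exit(void *host_id, void (*fn)(void *data), void *data)
{
    (void) host_id;
    exit_fn = fn;
    exit_data = data;
}

/* ---- Extension side: calls the host only through the table ---- */
int cleaned_up;                         /* set to 1 by the cleanup callback */

static void ext_cleanup(void *data)
{
    *(int *) data = 1;
}

int ext_load(const struct host_api *api, void *host_id)
{
    if (api == NULL || api->major_version != 1)
        return 0;                       /* version mismatch: refuse to load */
    if (!api->add_function(host_id, "ext_hello"))
        return 0;
    api->at_exit(host_id, ext_cleanup, &cleaned_up);
    return 1;
}

/* ---- Host driver: load the extension, then run the exit callback ---- */
int host_run(void)
{
    struct host_api api;

    api.major_version = 1;
    api.add_function = host_add_function;
    api.at_exit = host_at_exit;

    if (!ext_load(&api, NULL))
        return 0;
    if (exit_fn != NULL)
        exit_fn(exit_data);             /* what the host would do on exit */
    return registered;
}
```

Because the extension reaches the host only through the pointers in the table, the same arrangement also gives a natural place for a version check and for registering a cleanup callback.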

Footnotes

(122)

The symbols are the variables and functions defined inside gawk. Access to these symbols by code external to gawk loaded dynamically at runtime is problematic on MS-Windows.