Gawk/Index
The GNU Awk User’s Guide
General Introduction
This file documents awk, a program that you can use to select particular records in a file and perform operations upon them.
Copyright © 1989, 1991, 1992, 1993, 1996–2005, 2007, 2009–2020 Free Software Foundation, Inc.
This is Edition 5.1 of GAWK: Effective AWK Programming: A User’s Guide for GNU Awk, for the 5.1.0 (or later) version of the GNU implementation of AWK.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License”, with the Front-Cover Texts being “A GNU Manual”, and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License”.
a. The FSF’s Back-Cover Text is: “You have the freedom to copy and modify this GNU manual.”
Some nice words about this Web page.
More nice words.
• Preface
What this Web page is about; brief history and acknowledgments.
A basic introduction to using awk. How to run an awk program. Command-line syntax.
How to run gawk.
• Regexp
All about matching things using regular expressions.
How to read files and manipulate fields.
• Printing
How to print using awk. Describes the print and printf statements. Also describes redirection of output.
Expressions are the basic building blocks of statements.
Overviews of patterns and actions.
• Arrays
The description and use of arrays. Also includes array-oriented control statements.
Built-in and user-defined functions.
A Library of awk Functions.
Many awk programs with complete explanations.
Stuff for advanced users, specific to gawk.
Getting gawk to speak your language.
• Debugger
The gawk debugger.
How namespaces work in gawk.
• Arbitrary Precision Arithmetic
Arbitrary precision arithmetic with gawk.
Adding new built-in functions to gawk.
The evolution of the awk language.
Installing gawk under various operating systems.
• Notes
Notes about adding things to gawk and possible future work.
A very quick introduction to programming concepts.
• Glossary
An explanation of some unfamiliar terms.
• Copying
Your right to copy and distribute gawk.
• GNU Free Documentation License
The license for this Web page.
• Index
Concept and Variable Index.
• History
The history of gawk and awk.
• Names
What name to use to find awk.
Using this Web page. Includes sample input files that you can use.
Typographical Conventions.
Brief history of the GNU project and this Web page.
Helping to save the world.
Acknowledgments.
How to run gawk programs; includes command-line syntax.
• One-shot
Running a short throwaway awk program.
Using no input files (input from the keyboard instead).
• Long
Putting permanent awk programs in files.
Making self-contained awk programs.
• Comments
Adding documentation to gawk programs.
• Quoting
More discussion of shell quoting issues.
Quoting in Windows Batch Files.
Sample data files for use in the awk programs illustrated in this Web page.
A very simple example.
A less simple one-line example using two rules.
A more complex example.
Subdividing or combining statements into lines.
Other Features of awk.
• When
When to use gawk and when to use other things.
Summary of the introduction.
How to run awk.
• Options
Command-line options and their meanings.
Input file names and variable assignments.
How to specify standard input with other files.
The environment variables gawk uses.
Searching directories for awk programs.
Searching directories for awk shared libraries.
The environment variables.
gawk’s exit status.
Including other files into your program.
Loading shared libraries into your program.
• Obsolete
Obsolete Options and/or features.
Undocumented Options and Features.
Invocation summary.
How to Use Regular Expressions.
How to write nonprinting characters.
Regular Expression Operators.
The actual details.
Notes on interval expressions.
What can go between ‘[...]’.
How much text matches.
Using Dynamic Regexps.
Operators specific to GNU software.
How to do case-insensitive matching.
Regular expressions summary.
• Records
Controlling how data is split into records.
How standard awk splits records.
How gawk splits records.
• Fields
An introduction to fields.
Nonconstant Field Numbers.
Changing the Contents of a Field.
The field separator and how to change it.
How fields are normally separated.
Using regexps as the field separator.
Making each character a separate field.
• Command Line Field Separator
Setting FS from the command line.
Making the full line be a single field.
Some final points and a summary table.
Reading constant width data.
Processing fixed-width data.
Skipping intervening fields.
Capturing optional trailing data.
Field values with fixed-width data.
Defining Fields By Content
• More CSV
More on CSV files.
Checking how gawk is splitting records.
Reading multiline records.
• Getline
Reading files under explicit program control using the getline function.
Using getline with no arguments.
Using getline into a variable.
Using getline from a file.
Using getline into a variable from a file.
Using getline from a pipe.
Using getline into a variable from a pipe.
Using getline from a coprocess.
Using getline into a variable from a coprocess.
Important things to know about getline.
Summary of getline Variants.
Reading input with a timeout.
Retrying input after certain errors.
What happens if you put a directory on the command line.
Input summary.
Exercises.
The print statement.
Simple examples of print statements.
The output separators and how to change them.
• OFMT
Controlling Numeric Output With print.
• Printf
The printf statement.
Syntax of the printf statement.
Format-control letters.
Format-specification modifiers.
Several examples.
How to redirect output to multiple files and pipes.
Special files for I/O.
File name interpretation in gawk. gawk allows access to inherited file descriptors.
Accessing other open files with gawk.
Special files for network communications.
Things to watch out for.
Closing Input and Output Files and Pipes.
• Nonfatal
Enabling Nonfatal Output.
Output summary.
Exercises.
• Values
Constants, Variables, and Regular Expressions.
String, numeric and regexp constants.
Numeric and string constants.
What are octal and hex numbers.
Regular Expression constants.
When and how to use a regexp constant.
Regexp constants in standard awk.
Strongly typed regexp constants.
Variables give names to values for later use.
Using variables in your programs.
Setting variables on the command line and a summary of command-line syntax. This is an advanced method of input.
The conversion of strings to numbers and vice versa.
How awk Converts Between Strings And Numbers.
• Locale influences conversions
How the locale may affect conversions.
gawk’s operators.
Arithmetic operations (‘+’, ‘-’, etc.)
Concatenating strings.
Changing the value of a variable or a field.
Incrementing the numeric value of a variable.
Testing for true and false.
What is “true” and what is “false”.
How variables acquire types and how this affects comparison of numbers and strings with ‘<’, etc.
String type versus numeric type.
The comparison operators.
String comparison with POSIX rules.
Combining comparison expressions using boolean operators ‘||’ (“or”), ‘&&’ (“and”) and ‘!’ (“not”).
Conditional expressions select between two subexpressions under control of a third subexpression.
A function call is an expression.
How various operators nest.
• Locales
How the locale affects things.
Expressions summary.
What goes into a pattern.
Using regexps as patterns.
Any expression can be used as a pattern.
• Ranges
Pairs of patterns specify record ranges.
Specifying initialization and cleanup rules.
How and why to use BEGIN/END rules.
I/O issues in BEGIN/END rules.
Two special patterns for advanced control.
• Empty
The empty pattern, which matches every record.
How to use shell variables with awk.
What goes into an action.
Describes the various control statements in detail.
Conditionally execute some awk statements.
Loop until some condition is satisfied.
Do specified action while looping until some condition is satisfied.
Another looping statement, that provides initialization and increment clauses.
Switch/case evaluation for conditional execution of statements based on a value.
Immediately exit the innermost enclosing loop.
Skip to the end of the innermost enclosing loop.
Stop processing the current input record.
Stop processing the current file.
Stop execution of awk.
Summarizes the predefined variables.
Built-in variables that you change to control awk.
• Auto-set
Built-in variables where awk gives you information.
Ways to use ARGC and ARGV.
Patterns and Actions summary.
The basics of arrays.
Introduction to Arrays
How to examine one element of an array.
How to change an element of an array.
Basic Example of an Array
A variation of the for statement. It loops through the indices of an array’s existing elements.
Controlling the order in which arrays are scanned.
How to use numbers as subscripts in awk.
Using Uninitialized variables as subscripts.
• Delete
The delete statement removes an element from an array.
Emulating multidimensional arrays in awk.
Scanning multidimensional arrays.
True multidimensional arrays.
Summary of arrays.
• Built-in
Summarizes the built-in functions.
How to call built-in functions.
Functions that work with numbers, including int(), sin() and rand().
Functions for string manipulation, such as split(), match() and sprintf().
More than you want to know about ‘\’ and ‘&’ with sub(), gsub(), and gensub().
Functions for files and shell commands.
Functions for dealing with timestamps.
Functions for bitwise operations.
Functions for type information.
Functions for string translation.
Describes User-defined functions in detail.
How to write definitions and what they mean.
An example function definition and what it does.
Calling user-defined functions.
Don’t use spaces.
Controlling variable scope.
Passing parameters.
Other points to know about functions.
Specifying the value a function returns.
How variable types can change at runtime.
Choosing the function to call at runtime.
Summary of functions.
How to best name private global variables in library functions.
Functions that are of general use.
A replacement for the built-in strtonum() function.
A function for assertions in awk programs.
A function for rounding if sprintf() does not do it correctly.
The Cliff Random Number Generator.
Functions for using characters as numbers and vice versa.
A function to join an array into a string.
A function to get formatted times.
A function to read an entire file at once.
A function to quote strings for the shell.
Functions for managing command-line data files.
A function for handling data file transitions.
A function for rereading the current file.
Checking that data files are readable.
Checking for zero-length files.
Treating assignments as file names.
A function for processing command-line arguments.
Functions for getting user information.
Functions for getting group information.
A function to walk arrays of arrays.
Summary of library functions.
Exercises.
How to run these examples.
• Clones
Clones of common utilities.
The cut utility.
The egrep utility.
The id utility.
The split utility.
The tee utility.
The uniq utility.
The wc utility.
Some interesting awk programs.
Finding duplicated words in a document.
An alarm clock.
A program similar to the tr utility.
Printing mailing labels.
A program to produce a word usage count.
Eliminating duplicate entries from a history file.
Pulling out programs from Texinfo source files.
A Simple Stream Editor.
A wrapper for awk that includes files.
Finding anagrams from a dictionary.
People do amazing things with too much time on their hands.
Summary of programs.
Exercises.
Allowing nondecimal input data.
Facilities for controlling array traversal and sorting arrays.
How to use PROCINFO["sorted_in"].
How to use asort() and asorti().
Two-way communications with another process.
Using gawk for network programming.
Profiling your awk programs.
Summary of advanced features.
Internationalization and Localization.
How GNU gettext works.
Features for the programmer.
Features for the translator.
Extracting marked strings.
Rearranging printf arguments.
awk-level portability issues.
A simple i18n example.
gawk is also internationalized.
Summary of I18N stuff.
Introduction to gawk debugger.
Debugging in General.
Additional Debugging Concepts.
Awk Debugging.
Sample debugging session.
How to Start the Debugger.
Finding the Bug.
Main debugger commands.
Control of Breakpoints.
Control of Execution.
Viewing and Changing Data.
Dealing with the Stack.
Obtaining Information about the Program and the Debugger State.
• Miscellaneous Debugger Commands
Miscellaneous Commands.
Readline support.
Limitations and future plans.
Debugging summary.
The global namespace in standard awk.
How to qualify names with a namespace.
The default namespace.
How to change the namespace.
Namespace and Component Naming Rules.
How names are stored internally.
An example of code using a namespace.
Namespaces and other gawk features.
Summarizing namespaces.
A quick intro to computer math.
Defining terms used.
The MPFR features in gawk.
Things to know.
Floating point math is not exact.
Numbers are not exactly represented.
How to compare floating point values.
Errors get bigger as they go.
Getting more accuracy takes some work.
Add digits and round.
How to set the precision.
How to set the rounding mode.
• Arbitrary Precision Integers
Arbitrary Precision Integer Arithmetic with gawk.
How to check if MPFR is available.
• POSIX Floating Point Problems
Standards Versus Existing Practice.
Summary of floating point discussion.
What is an extension.
A note about licensing.
An outline of how it works.
A full description of the API.
• Extension API Functions Introduction
Introduction to the API functions.
The data types.
Functions for allocating memory.
Functions for creating values.
Functions to register things with gawk.
Registering extension functions.
Registering an exit callback.
Registering a version string.
Registering an input parser.
Registering an output wrapper.
Registering a two-way processor.
Functions for printing messages.
Functions for updating ERRNO.
How to get a value.
Functions for accessing parameters.
Functions for accessing global variables.
Accessing variables by name.
Accessing variables by “cookie”.
Creating and using cached values.
Functions for working with arrays.
Data types for working with arrays.
Functions for working with arrays.
How to flatten arrays.
How to create and populate arrays.
How to access and manipulate redirections.
Variables provided by the API.
API Version information.
• Extension GMP/MPFR Versioning
Version information about GMP and MPFR.
• Extension API Informational Variables
Variables providing information about gawk’s invocation.
Boilerplate code for using the API.
Changes from V1 of the API.
How gawk finds compiled extensions.
Example C code for an extension.
What the new functions will do.
The code for internal file operations.
How to use an external extension.
The sample extensions that ship with gawk.
• Extension Sample File Functions
The file functions sample.
An interface to fnmatch().
An interface to fork() and other process functions.
Enabling in-place file editing.
Character to value to character conversions.
An interface to readdir().
Reversing output sample output wrapper.
Reversing data sample two-way processor.
• Extension Sample Read write array
Serializing an array to a file.
Reading an entire file into a string.
An interface to gettimeofday() and sleep().
Tests for the API.
The gawkextlib project.
Extension summary.
Exercises.
The major changes between V7 and System V Release 3.1.
• SVR4
Minor changes between System V Releases 3.1 and 4.
• POSIX
New features from the POSIX standard.
• BTL
New features from Brian Kernighan’s version of awk.
The extensions in gawk not in POSIX awk.
The history of the features in gawk.
Common Extensions Summary.
How locales used to affect regexp ranges.
The major contributors to gawk.
History summary.
What is in the gawk distribution.
• Getting
How to get the distribution.
How to extract the distribution.
What is in the distribution.
Installing gawk under various versions of Unix.
Compiling gawk under Unix.
Shell convenience functions.
• Additional Configuration Options
Other compile-time options.
How it’s all supposed to work.
Installation on Other Operating Systems.
Installing and Compiling gawk on Microsoft Windows.
Installing a prepared distribution.
Compiling gawk for Windows32.
• PC Using
Running gawk on Windows32.
• Cygwin
Building and running gawk for Cygwin.
• MSYS
Using gawk In The MSYS Environment.
Installing gawk on VMS.
How to compile gawk under VMS.
Compiling gawk dynamic extensions on VMS.
How to install gawk under VMS.
How to run gawk under VMS.
• VMS GNV
The VMS GNV Project.
An old version comes with some VMS systems.
• Bugs
Reporting Problems and Bugs.
Where to send reports to.
• Usenet
Where not to send reports to.
Maintainers of non-*nix ports.
Other freely available awk implementations.
Summary of installation.
How to disable certain gawk extensions.
Making Additions To gawk.
Accessing the Git repository.
Adding code to the main body of gawk.
Porting gawk to a new operating system.
Why derived files are kept in the Git repository.
New features that may be implemented one day.
Some limitations of the implementation.
Design notes about the extension API.
Problems with the old mechanism.
• Extension New Mechanism Goals
Goals for the new mechanism.
• Extension Other Design Decisions
Some other design decisions.
Some room for future growth.
Summary of implementation notes.
The high level view.
A very quick intro to data types.
Short Table of Contents
- Foreword to the Third Edition
- Part I: The awk Language
- Part II: Problem Solving with awk
- Part III: Moving Beyond Standard awk with gawk
- Part IV: Appendices
Table of Contents
- Foreword to the Third Edition
- Part I: The awk Language
  - 1 Getting Started with awk
  - 2 Running awk and gawk
    - 2.1 Invoking awk
    - 2.2 Command-Line Options
    - 2.3 Other Command-Line Arguments
    - 2.4 Naming Standard Input
    - 2.5 The Environment Variables gawk Uses
    - 2.6 gawk’s Exit Status
    - 2.7 Including Other Files into Your Program
    - 2.8 Loading Dynamic Extensions into Your Program
    - 2.9 Obsolete Options and/or Features
    - 2.10 Undocumented Options and Features
    - 2.11 Summary
  - 3 Regular Expressions
  - 4 Reading Input Files
    - 4.1 How Input Is Split into Records
    - 4.2 Examining Fields
    - 4.3 Nonconstant Field Numbers
    - 4.4 Changing the Contents of a Field
    - 4.5 Specifying How Fields Are Separated
    - 4.6 Reading Fixed-Width Data
    - 4.7 Defining Fields by Content
    - 4.8 Checking How gawk Is Splitting Records
    - 4.9 Multiple-Line Records
    - 4.10 Explicit Input with getline
      - 4.10.1 Using getline with No Arguments
      - 4.10.2 Using getline into a Variable
      - 4.10.3 Using getline from a File
      - 4.10.4 Using getline into a Variable from a File
      - 4.10.5 Using getline from a Pipe
      - 4.10.6 Using getline into a Variable from a Pipe
      - 4.10.7 Using getline from a Coprocess
      - 4.10.8 Using getline into a Variable from a Coprocess
      - 4.10.9 Points to Remember About getline
      - 4.10.10 Summary of getline Variants
    - 4.11 Reading Input with a Timeout
    - 4.12 Retrying Reads After Certain Input Errors
    - 4.13 Directories on the Command Line
    - 4.14 Summary
    - 4.15 Exercises
  - 5 Printing Output
    - 5.1 The print Statement
    - 5.2 print Statement Examples
    - 5.3 Output Separators
    - 5.4 Controlling Numeric Output with print
    - 5.5 Using printf Statements for Fancier Printing
    - 5.6 Redirecting Output of print and printf
    - 5.7 Special Files for Standard Preopened Data Streams
    - 5.8 Special File names in gawk
    - 5.9 Closing Input and Output Redirections
    - 5.10 Enabling Nonfatal Output
    - 5.11 Summary
    - 5.12 Exercises
  - 6 Expressions
    - 6.1 Constants, Variables, and Conversions
    - 6.2 Operators: Doing Something with Values
    - 6.3 Truth Values and Conditions
    - 6.4 Function Calls
    - 6.5 Operator Precedence (How Operators Nest)
    - 6.6 Where You Are Makes a Difference
    - 6.7 Summary
  - 7 Patterns, Actions, and Variables
    - 7.1 Pattern Elements
    - 7.2 Using Shell Variables in Programs
    - 7.3 Actions
    - 7.4 Control Statements in Actions
    - 7.5 Predefined Variables
    - 7.6 Summary
  - 8 Arrays in awk
  - 9 Functions
    - 9.1 Built-in Functions
    - 9.2 User-Defined Functions
    - 9.3 Indirect Function Calls
    - 9.4 Summary
- Part II: Problem Solving with awk
  - 10 A Library of awk Functions
    - 10.1 Naming Library Function Global Variables
    - 10.2 General Programming
      - 10.2.1 Converting Strings to Numbers
      - 10.2.2 Assertions
      - 10.2.3 Rounding Numbers
      - 10.2.4 The Cliff Random Number Generator
      - 10.2.5 Translating Between Characters and Numbers
      - 10.2.6 Merging an Array into a String
      - 10.2.7 Managing the Time of Day
      - 10.2.8 Reading a Whole File at Once
      - 10.2.9 Quoting Strings to Pass to the Shell
    - 10.3 Data file Management
    - 10.4 Processing Command-Line Options
    - 10.5 Reading the User Database
    - 10.6 Reading the Group Database
    - 10.7 Traversing Arrays of Arrays
    - 10.8 Summary
    - 10.9 Exercises
  - 11 Practical awk Programs
    - 11.1 Running the Example Programs
    - 11.2 Reinventing Wheels for Fun and Profit
    - 11.3 A Grab Bag of awk Programs
      - 11.3.1 Finding Duplicated Words in a Document
      - 11.3.2 An Alarm Clock Program
      - 11.3.3 Transliterating Characters
      - 11.3.4 Printing Mailing Labels
      - 11.3.5 Generating Word-Usage Counts
      - 11.3.6 Removing Duplicates from Unsorted Text
      - 11.3.7 Extracting Programs from Texinfo Source Files
      - 11.3.8 A Simple Stream Editor
      - 11.3.9 An Easy Way to Use Library Functions
      - 11.3.10 Finding Anagrams from a Dictionary
      - 11.3.11 And Now for Something Completely Different
    - 11.4 Summary
    - 11.5 Exercises
- Part III: Moving Beyond Standard awk with gawk
  - 12 Advanced Features of gawk
  - 13 Internationalization with gawk
  - 14 Debugging awk Programs
  - 15 Namespaces in gawk
  - 16 Arithmetic and Arbitrary-Precision Arithmetic with gawk
    - 16.1 A General Description of Computer Arithmetic
    - 16.2 Other Stuff to Know
    - 16.3 Arbitrary-Precision Arithmetic Features in gawk
    - 16.4 Floating-Point Arithmetic: Caveat Emptor!
    - 16.5 Arbitrary-Precision Integer Arithmetic with gawk
    - 16.6 How To Check If MPFR Is Available
    - 16.7 Standards Versus Existing Practice
    - 16.8 Summary
  - 17 Writing Extensions for gawk
    - 17.1 Introduction
    - 17.2 Extension Licensing
    - 17.3 How It Works at a High Level
    - 17.4 API Description
      - 17.4.1 Introduction
      - 17.4.2 General-Purpose Data Types
      - 17.4.3 Memory Allocation Functions and Convenience Macros
      - 17.4.4 Constructor Functions
      - 17.4.5 Registration Functions
      - 17.4.6 Printing Messages
      - 17.4.7 Updating ERRNO
      - 17.4.8 Requesting Values
      - 17.4.9 Accessing and Updating Parameters
      - 17.4.10 Symbol Table Access
      - 17.4.11 Array Manipulation
      - 17.4.12 Accessing and Manipulating Redirections
      - 17.4.13 API Variables
      - 17.4.14 Boilerplate Code
      - 17.4.15 Changes From Version 1 of the API
    - 17.5 How gawk Finds Extensions
    - 17.6 Example: Some File Functions
    - 17.7 The Sample Extensions in the gawk Distribution
      - 17.7.1 File-Related Functions
      - 17.7.2 Interface to fnmatch()
      - 17.7.3 Interface to fork(), wait(), and waitpid()
      - 17.7.4 Enabling In-Place File Editing
      - 17.7.5 Character and Numeric values: ord() and chr()
      - 17.7.6 Reading Directories
      - 17.7.7 Reversing Output
      - 17.7.8 Two-Way I/O Example
      - 17.7.9 Dumping and Restoring an Array
      - 17.7.10 Reading an Entire File
      - 17.7.11 Extension Time Functions
      - 17.7.12 API Tests
    - 17.8 The gawkextlib Project
    - 17.9 Summary
    - 17.10 Exercises
- Part IV: Appendices
  - Appendix A The Evolution of the awk Language
    - A.1 Major Changes Between V7 and SVR3.1
    - A.2 Changes Between SVR3.1 and SVR4
    - A.3 Changes Between SVR4 and POSIX awk
    - A.4 Extensions in Brian Kernighan’s awk
    - A.5 Extensions in gawk Not in POSIX awk
    - A.6 History of gawk Features
    - A.7 Common Extensions Summary
    - A.8 Regexp Ranges and Locales: A Long Sad Story
    - A.9 Major Contributors to gawk
    - A.10 Summary
  - Appendix B Installing gawk
    - B.1 The gawk Distribution
    - B.2 Compiling and Installing gawk on Unix-Like Systems
    - B.3 Installation on Other Operating Systems
    - B.4 Reporting Problems and Bugs
    - B.5 Other Freely Available awk Implementations
    - B.6 Summary
  - Appendix C Implementation Notes
  - Appendix D Basic Programming Concepts
  - Glossary
  - GNU General Public License
  - GNU Free Documentation License
  - Index