The GNU Awk User’s Guide

General Introduction

This file documents awk, a program that you can use to select particular records in a file and perform operations upon them.

Copyright © 1989, 1991, 1992, 1993, 1996–2005, 2007, 2009–2020 Free Software Foundation, Inc.

This is Edition 5.1 of GAWK: Effective AWK Programming: A User’s Guide for GNU Awk, for the 5.1.0 (or later) version of the GNU implementation of AWK.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License”, with the Front-Cover Texts being “A GNU Manual”, and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License”.

  a. The FSF’s Back-Cover Text is: “You have the freedom to copy and modify this GNU manual.”

Foreword3

  

Some nice words about this Web page.

Foreword4

  

More nice words.

Preface

  

What this Web page is about; brief history and acknowledgments.

Getting Started

  

A basic introduction to using awk. How to run an awk program. Command-line syntax.

Invoking Gawk

  

How to run gawk.

Regexp

  

All about matching things using regular expressions.

Reading Files

  

How to read files and manipulate fields.

Printing

  

How to print using awk. Describes the print and printf statements. Also describes redirection of output.

Expressions

  

Expressions are the basic building blocks of statements.

Patterns and Actions

  

Overviews of patterns and actions.

Arrays

  

The description and use of arrays. Also includes array-oriented control statements.

Functions

  

Built-in and user-defined functions.

Library Functions

  

A Library of awk Functions.

Sample Programs

  

Many awk programs with complete explanations.

Advanced Features

  

Stuff for advanced users, specific to gawk.

Internationalization

  

Getting gawk to speak your language.

Debugger

  

The gawk debugger.

Namespaces

  

How namespaces work in gawk.

Arbitrary Precision Arithmetic

  

Arbitrary precision arithmetic with gawk.

Dynamic Extensions

  

Adding new built-in functions to gawk.

Language History

  

The evolution of the awk language.

Installation

  

Installing gawk under various operating systems.

Notes

  

Notes about adding things to gawk and possible future work.

Basic Concepts

  

A very quick introduction to programming concepts.

Glossary

  

An explanation of some unfamiliar terms.

Copying

  

Your right to copy and distribute gawk.

GNU Free Documentation License

  

The license for this Web page.

Index

  

Concept and Variable Index.

History

  

The history of gawk and awk.

Names

  

What name to use to find awk.

This Manual

  

Using this Web page. Includes sample input files that you can use.

Conventions

  

Typographical Conventions.

Manual History

  

Brief history of the GNU project and this Web page.

How To Contribute

  

Helping to save the world.

Acknowledgments

  

Acknowledgments.

Running gawk

  

How to run gawk programs; includes command-line syntax.

One-shot

  

Running a short throwaway awk program.

Read Terminal

  

Using no input files (input from the keyboard instead).

Long

  

Putting permanent awk programs in files.

Executable Scripts

  

Making self-contained awk programs.

Comments

  

Adding documentation to gawk programs.

Quoting

  

More discussion of shell quoting issues.

DOS Quoting

  

Quoting in Windows Batch Files.

Sample Data Files

  

Sample data files for use in the awk programs illustrated in this Web page.

Very Simple

  

A very simple example.

Two Rules

  

A less simple one-line example using two rules.

More Complex

  

A more complex example.

Statements/Lines

  

Subdividing or combining statements into lines.

Other Features

  

Other Features of awk.

When

  

When to use gawk and when to use other things.

Intro Summary

  

Summary of the introduction.

Command Line

  

How to run awk.

Options

  

Command-line options and their meanings.

Other Arguments

  

Input file names and variable assignments.

Naming Standard Input

  

How to specify standard input with other files.

Environment Variables

  

The environment variables gawk uses.

AWKPATH Variable

  

Searching directories for awk programs.

AWKLIBPATH Variable

  

Searching directories for awk shared libraries.

Other Environment Variables

  

Other environment variables that affect gawk.

Exit Status

  

gawk’s exit status.

Include Files

  

Including other files into your program.

Loading Shared Libraries

  

Loading shared libraries into your program.

Obsolete

  

Obsolete Options and/or features.

Undocumented

  

Undocumented Options and Features.

Invoking Summary

  

Invocation summary.

Regexp Usage

  

How to Use Regular Expressions.

Escape Sequences

  

How to write nonprinting characters.

Regexp Operators

  

Regular Expression Operators.

Regexp Operator Details

  

The actual details.

Interval Expressions

  

Notes on interval expressions.

Bracket Expressions

  

What can go between ‘[...]’.

Leftmost Longest

  

How much text matches.

Computed Regexps

  

Using Dynamic Regexps.

GNU Regexp Operators

  

Operators specific to GNU software.

Case-sensitivity

  

How to do case-insensitive matching.

Regexp Summary

  

Regular expressions summary.

Records

  

Controlling how data is split into records.

awk split records

  

How standard awk splits records.

gawk split records

  

How gawk splits records.

Fields

  

An introduction to fields.

Nonconstant Fields

  

Nonconstant Field Numbers.

Changing Fields

  

Changing the Contents of a Field.

Field Separators

  

The field separator and how to change it.

Default Field Splitting

  

How fields are normally separated.

Regexp Field Splitting

  

Using regexps as the field separator.

Single Character Fields

  

Making each character a separate field.

Command Line Field Separator

  

Setting FS from the command line.

Full Line Fields

  

Making the full line be a single field.

Field Splitting Summary

  

Some final points and a summary table.

Constant Size

  

Reading constant width data.

Fixed width data

  

Processing fixed-width data.

Skipping intervening

  

Skipping intervening fields.

Allowing trailing data

  

Capturing optional trailing data.

Fields with fixed data

  

Field values with fixed-width data.

Splitting By Content

  

Defining fields by content.

More CSV

  

More on CSV files.

Testing field creation

  

Checking how gawk is splitting records.

Multiple Line

  

Reading multiline records.

Getline

  

Reading files under explicit program control using the getline function.

Plain Getline

  

Using getline with no arguments.

Getline/Variable

  

Using getline into a variable.

Getline/File

  

Using getline from a file.

Getline/Variable/File

  

Using getline into a variable from a file.

Getline/Pipe

  

Using getline from a pipe.

Getline/Variable/Pipe

  

Using getline into a variable from a pipe.

Getline/Coprocess

  

Using getline from a coprocess.

Getline/Variable/Coprocess

  

Using getline into a variable from a coprocess.

Getline Notes

  

Important things to know about getline.

Getline Summary

  

Summary of getline Variants.

Read Timeout

  

Reading input with a timeout.

Retrying Input

  

Retrying input after certain errors.

Command-line directories

  

What happens if you put a directory on the command line.

Input Summary

  

Input summary.

Input Exercises

  

Exercises.

Print

  

The print statement.

Print Examples

  

Simple examples of print statements.

Output Separators

  

The output separators and how to change them.

OFMT

  

Controlling Numeric Output With print.

Printf

  

The printf statement.

Basic Printf

  

Syntax of the printf statement.

Control Letters

  

Format-control letters.

Format Modifiers

  

Format-specification modifiers.

Printf Examples

  

Several examples.

Redirection

  

How to redirect output to multiple files and pipes.

Special FD

  

Special files for I/O.

Special Files

  

File name interpretation in gawk. gawk allows access to inherited file descriptors.

Other Inherited Files

  

Accessing other open files with gawk.

Special Network

  

Special files for network communications.

Special Caveats

  

Things to watch out for.

Close Files And Pipes

  

Closing Input and Output Files and Pipes.

Nonfatal

  

Enabling Nonfatal Output.

Output Summary

  

Output summary.

Output Exercises

  

Exercises.

Values

  

Constants, Variables, and Regular Expressions.

Constants

  

String, numeric and regexp constants.

Scalar Constants

  

Numeric and string constants.

Nondecimal-numbers

  

What octal and hex numbers are.

Regexp Constants

  

Regular Expression constants.

Using Constant Regexps

  

When and how to use a regexp constant.

Standard Regexp Constants

  

Regexp constants in standard awk.

Strong Regexp Constants

  

Strongly typed regexp constants.

Variables

  

Variables give names to values for later use.

Using Variables

  

Using variables in your programs.

Assignment Options

  

Setting variables on the command line and a summary of command-line syntax. This is an advanced method of input.

Conversion

  

The conversion of strings to numbers and vice versa.

Strings And Numbers

  

How awk Converts Between Strings And Numbers.

Locale influences conversions

  

How the locale may affect conversions.

All Operators

  

gawk’s operators.

Arithmetic Ops

  

Arithmetic operations (‘+’, ‘-’, etc.)

Concatenation

  

Concatenating strings.

Assignment Ops

  

Changing the value of a variable or a field.

Increment Ops

  

Incrementing the numeric value of a variable.

Truth Values and Conditions

  

Testing for true and false.

Truth Values

  

What is “true” and what is “false”.

Typing and Comparison

  

How variables acquire types and how this affects comparison of numbers and strings with ‘<’, etc.

Variable Typing

  

String type versus numeric type.

Comparison Operators

  

The comparison operators.

POSIX String Comparison

  

String comparison with POSIX rules.

Boolean Ops

  

Combining comparison expressions using boolean operators ‘||’ (“or”), ‘&&’ (“and”) and ‘!’ (“not”).

Conditional Exp

  

Conditional expressions select between two subexpressions under control of a third subexpression.

Function Calls

  

A function call is an expression.

Precedence

  

How various operators nest.

Locales

  

How the locale affects things.

Expressions Summary

  

Expressions summary.

Pattern Overview

  

What goes into a pattern.

Regexp Patterns

  

Using regexps as patterns.

Expression Patterns

  

Any expression can be used as a pattern.

Ranges

  

Pairs of patterns specify record ranges.

BEGIN/END

  

Specifying initialization and cleanup rules.

Using BEGIN/END

  

How and why to use BEGIN/END rules.

I/O And BEGIN/END

  

I/O issues in BEGIN/END rules.

BEGINFILE/ENDFILE

  

Two special patterns for advanced control.

Empty

  

The empty pattern, which matches every record.

Using Shell Variables

  

How to use shell variables with awk.

Action Overview

  

What goes into an action.

Statements

  

Describes the various control statements in detail.

If Statement

  

Conditionally execute some awk statements.

While Statement

  

Loop until some condition is satisfied.

Do Statement

  

Do specified action while looping until some condition is satisfied.

For Statement

  

Another looping statement that provides initialization and increment clauses.

Switch Statement

  

Switch/case evaluation for conditional execution of statements based on a value.

Break Statement

  

Immediately exit the innermost enclosing loop.

Continue Statement

  

Skip to the end of the innermost enclosing loop.

Next Statement

  

Stop processing the current input record.

Nextfile Statement

  

Stop processing the current file.

Exit Statement

  

Stop execution of awk.

Built-in Variables

  

Summarizes the predefined variables.

User-modified

  

Built-in variables that you change to control awk.

Auto-set

  

Built-in variables where awk gives you information.

ARGC and ARGV

  

Ways to use ARGC and ARGV.

Pattern Action Summary

  

Patterns and Actions summary.

Array Basics

  

The basics of arrays.

Array Intro

  

Introduction to arrays.

Reference to Elements

  

How to examine one element of an array.

Assigning Elements

  

How to change an element of an array.

Array Example

  

Basic example of an array.

Scanning an Array

  

A variation of the for statement. It loops through the indices of an array’s existing elements.

Controlling Scanning

  

Controlling the order in which arrays are scanned.

Numeric Array Subscripts

  

How to use numbers as subscripts in awk.

Uninitialized Subscripts

  

Using uninitialized variables as subscripts.

Delete

  

The delete statement removes an element from an array.

Multidimensional

  

Emulating multidimensional arrays in awk.

Multiscanning

  

Scanning multidimensional arrays.

Arrays of Arrays

  

True multidimensional arrays.

Arrays Summary

  

Summary of arrays.

Built-in

  

Summarizes the built-in functions.

Calling Built-in

  

How to call built-in functions.

Numeric Functions

  

Functions that work with numbers, including int(), sin() and rand().

String Functions

  

Functions for string manipulation, such as split(), match() and sprintf().

Gory Details

  

More than you want to know about ‘\’ and ‘&’ with sub(), gsub(), and gensub().

I/O Functions

  

Functions for files and shell commands.

Time Functions

  

Functions for dealing with timestamps.

Bitwise Functions

  

Functions for bitwise operations.

Type Functions

  

Functions for type information.

I18N Functions

  

Functions for string translation.

User-defined

  

Describes user-defined functions in detail.

Definition Syntax

  

How to write definitions and what they mean.

Function Example

  

An example function definition and what it does.

Function Calling

  

Calling user-defined functions.

Calling A Function

  

Don’t use spaces between a function’s name and the opening parenthesis.

Variable Scope

  

Controlling variable scope.

Pass By Value/Reference

  

Passing parameters.

Function Caveats

  

Other points to know about functions.

Return Statement

  

Specifying the value a function returns.

Dynamic Typing

  

How variable types can change at runtime.

Indirect Calls

  

Choosing the function to call at runtime.

Functions Summary

  

Summary of functions.

Library Names

  

How to best name private global variables in library functions.

General Functions

  

Functions that are of general use.

Strtonum Function

  

A replacement for the built-in strtonum() function.

Assert Function

  

A function for assertions in awk programs.

Round Function

  

A function for rounding if sprintf() does not do it correctly.

Cliff Random Function

  

The Cliff Random Number Generator.

Ordinal Functions

  

Functions for using characters as numbers and vice versa.

Join Function

  

A function to join an array into a string.

Getlocaltime Function

  

A function to get formatted times.

Readfile Function

  

A function to read an entire file at once.

Shell Quoting

  

A function to quote strings for the shell.

Data File Management

  

Functions for managing command-line data files.

Filetrans Function

  

A function for handling data file transitions.

Rewind Function

  

A function for rereading the current file.

File Checking

  

Checking that data files are readable.

Empty Files

  

Checking for zero-length files.

Ignoring Assigns

  

Treating assignments as file names.

Getopt Function

  

A function for processing command-line arguments.

Passwd Functions

  

Functions for getting user information.

Group Functions

  

Functions for getting group information.

Walking Arrays

  

A function to walk arrays of arrays.

Library Functions Summary

  

Summary of library functions.

Library Exercises

  

Exercises.

Running Examples

  

How to run these examples.

Clones

  

Clones of common utilities.

Cut Program

  

The cut utility.

Egrep Program

  

The egrep utility.

Id Program

  

The id utility.

Split Program

  

The split utility.

Tee Program

  

The tee utility.

Uniq Program

  

The uniq utility.

Wc Program

  

The wc utility.

Miscellaneous Programs

  

Some interesting awk programs.

Dupword Program

  

Finding duplicated words in a document.

Alarm Program

  

An alarm clock.

Translate Program

  

A program similar to the tr utility.

Labels Program

  

Printing mailing labels.

Word Sorting

  

A program to produce a word usage count.

History Sorting

  

Eliminating duplicate entries from a history file.

Extract Program

  

Pulling out programs from Texinfo source files.

Simple Sed

  

A Simple Stream Editor.

Igawk Program

  

A wrapper for awk that includes files.

Anagram Program

  

Finding anagrams from a dictionary.

Signature Program

  

People do amazing things with too much time on their hands.

Programs Summary

  

Summary of programs.

Programs Exercises

  

Exercises.

Nondecimal Data

  

Allowing nondecimal input data.

Array Sorting

  

Facilities for controlling array traversal and sorting arrays.

Controlling Array Traversal

  

How to use PROCINFO["sorted_in"].

Array Sorting Functions

  

How to use asort() and asorti().

Two-way I/O

  

Two-way communications with another process.

TCP/IP Networking

  

Using gawk for network programming.

Profiling

  

Profiling your awk programs.

Advanced Features Summary

  

Summary of advanced features.

I18N and L10N

  

Internationalization and Localization.

Explaining gettext

  

How GNU gettext works.

Programmer i18n

  

Features for the programmer.

Translator i18n

  

Features for the translator.

String Extraction

  

Extracting marked strings.

Printf Ordering

  

Rearranging printf arguments.

I18N Portability

  

awk-level portability issues.

I18N Example

  

A simple i18n example.

Gawk I18N

  

gawk is also internationalized.

I18N Summary

  

Summary of I18N stuff.

Debugging

  

Introduction to the gawk debugger.

Debugging Concepts

  

Debugging in General.

Debugging Terms

  

Additional Debugging Concepts.

Awk Debugging

  

Awk Debugging.

Sample Debugging Session

  

Sample debugging session.

Debugger Invocation

  

How to Start the Debugger.

Finding The Bug

  

Finding the Bug.

List of Debugger Commands

  

Main debugger commands.

Breakpoint Control

  

Control of Breakpoints.

Debugger Execution Control

  

Control of Execution.

Viewing And Changing Data

  

Viewing and Changing Data.

Execution Stack

  

Dealing with the Stack.

Debugger Info

  

Obtaining Information about the Program and the Debugger State.

Miscellaneous Debugger Commands

  

Miscellaneous Commands.

Readline Support

  

Readline support.

Limitations

  

Limitations and future plans.

Debugging Summary

  

Debugging summary.

Global Namespace

  

The global namespace in standard awk.

Qualified Names

  

How to qualify names with a namespace.

Default Namespace

  

The default namespace.

Changing The Namespace

  

How to change the namespace.

Naming Rules

  

Namespace and Component Naming Rules.

Internal Name Management

  

How names are stored internally.

Namespace Example

  

An example of code using a namespace.

Namespace And Features

  

Namespaces and other gawk features.

Namespace Summary

  

Summarizing namespaces.

Computer Arithmetic

  

A quick intro to computer math.

Math Definitions

  

Defining terms used.

MPFR features

  

The MPFR features in gawk.

FP Math Caution

  

Things to know.

Inexactness of computations

  

Floating point math is not exact.

Inexact representation

  

Numbers are not exactly represented.

Comparing FP Values

  

How to compare floating point values.

Errors accumulate

  

Errors get bigger as they go.

Getting Accuracy

  

Getting more accuracy takes some work.

Try To Round

  

Add digits and round.

Setting precision

  

How to set the precision.

Setting the rounding mode

  

How to set the rounding mode.

Arbitrary Precision Integers

  

Arbitrary Precision Integer Arithmetic with gawk.

Checking for MPFR

  

How to check if MPFR is available.

POSIX Floating Point Problems

  

Standards Versus Existing Practice.

Floating point summary

  

Summary of floating point discussion.

Extension Intro

  

What is an extension.

Plugin License

  

A note about licensing.

Extension Mechanism Outline

  

An outline of how it works.

Extension API Description

  

A full description of the API.

Extension API Functions Introduction

  

Introduction to the API functions.

General Data Types

  

The data types.

Memory Allocation Functions

  

Functions for allocating memory.

Constructor Functions

  

Functions for creating values.

Registration Functions

  

Functions to register things with gawk.

Extension Functions

  

Registering extension functions.

Exit Callback Functions

  

Registering an exit callback.

Extension Version String

  

Registering a version string.

Input Parsers

  

Registering an input parser.

Output Wrappers

  

Registering an output wrapper.

Two-way processors

  

Registering a two-way processor.

Printing Messages

  

Functions for printing messages.

Updating ERRNO

  

Functions for updating ERRNO.

Requesting Values

  

How to get a value.

Accessing Parameters

  

Functions for accessing parameters.

Symbol Table Access

  

Functions for accessing global variables.

Symbol table by name

  

Accessing variables by name.

Symbol table by cookie

  

Accessing variables by “cookie”.

Cached values

  

Creating and using cached values.

Array Manipulation

  

Functions for working with arrays.

Array Data Types

  

Data types for working with arrays.

Array Functions

  

Functions for working with arrays.

Flattening Arrays

  

How to flatten arrays.

Creating Arrays

  

How to create and populate arrays.

Redirection API

  

How to access and manipulate redirections.

Extension API Variables

  

Variables provided by the API.

Extension Versioning

  

API Version information.

Extension GMP/MPFR Versioning

  

Version information about GMP and MPFR.

Extension API Informational Variables

  

Variables providing information about gawk’s invocation.

Extension API Boilerplate

  

Boilerplate code for using the API.

Changes from API V1

  

Changes from V1 of the API.

Finding Extensions

  

How gawk finds compiled extensions.

Extension Example

  

Example C code for an extension.

Internal File Description

  

What the new functions will do.

Internal File Ops

  

The code for internal file operations.

Using Internal File Ops

  

How to use an external extension.

Extension Samples

  

The sample extensions that ship with gawk.

Extension Sample File Functions

  

The file functions sample.

Extension Sample Fnmatch

  

An interface to fnmatch().

Extension Sample Fork

  

An interface to fork() and other process functions.

Extension Sample Inplace

  

Enabling in-place file editing.

Extension Sample Ord

  

Character-to-value and value-to-character conversions.

Extension Sample Readdir

  

An interface to readdir().

Extension Sample Revout

  

A sample output wrapper that reverses output.

Extension Sample Rev2way

  

A sample two-way processor that reverses data.

Extension Sample Read write array

  

Serializing an array to a file.

Extension Sample Readfile

  

Reading an entire file into a string.

Extension Sample Time

  

An interface to gettimeofday() and sleep().

Extension Sample API Tests

  

Tests for the API.

gawkextlib

  

The gawkextlib project.

Extension summary

  

Extension summary.

Extension Exercises

  

Exercises.

V7/SVR3.1

  

The major changes between V7 and System V Release 3.1.

SVR4

  

Minor changes between System V Releases 3.1 and 4.

POSIX

  

New features from the POSIX standard.

BTL

  

New features from Brian Kernighan’s version of awk.

POSIX/GNU

  

The extensions in gawk not in POSIX awk.

Feature History

  

The history of the features in gawk.

Common Extensions

  

Common Extensions Summary.

Ranges and Locales

  

How locales used to affect regexp ranges.

Contributors

  

The major contributors to gawk.

History summary

  

History summary.

Gawk Distribution

  

What is in the gawk distribution.

Getting

  

How to get the distribution.

Extracting

  

How to extract the distribution.

Distribution contents

  

What is in the distribution.

Unix Installation

  

Installing gawk under various versions of Unix.

Quick Installation

  

Compiling gawk under Unix.

Shell Startup Files

  

Shell convenience functions.

Additional Configuration Options

  

Other compile-time options.

Configuration Philosophy

  

How it’s all supposed to work.

Non-Unix Installation

  

Installation on Other Operating Systems.

PC Installation

  

Installing and Compiling gawk on Microsoft Windows.

PC Binary Installation

  

Installing a prepared distribution.

PC Compiling

  

Compiling gawk for Windows32.

PC Using

  

Running gawk on Windows32.

Cygwin

  

Building and running gawk for Cygwin.

MSYS

  

Using gawk In The MSYS Environment.

VMS Installation

  

Installing gawk on VMS.

VMS Compilation

  

How to compile gawk under VMS.

VMS Dynamic Extensions

  

Compiling gawk dynamic extensions on VMS.

VMS Installation Details

  

How to install gawk under VMS.

VMS Running

  

How to run gawk under VMS.

VMS GNV

  

The VMS GNV Project.

VMS Old Gawk

  

An old version comes with some VMS systems.

Bugs

  

Reporting Problems and Bugs.

Bug address

  

Where to send reports.

Usenet

  

Where not to send reports.

Maintainers

  

Maintainers of non-*nix ports.

Other Versions

  

Other freely available awk implementations.

Installation summary

  

Summary of installation.

Compatibility Mode

  

How to disable certain gawk extensions.

Additions

  

Making Additions To gawk.

Accessing The Source

  

Accessing the Git repository.

Adding Code

  

Adding code to the main body of gawk.

New Ports

  

Porting gawk to a new operating system.

Derived Files

  

Why derived files are kept in the Git repository.

Future Extensions

  

New features that may be implemented one day.

Implementation Limitations

  

Some limitations of the implementation.

Extension Design

  

Design notes about the extension API.

Old Extension Problems

  

Problems with the old mechanism.

Extension New Mechanism Goals

  

Goals for the new mechanism.

Extension Other Design Decisions

  

Some other design decisions.

Extension Future Growth

  

Some room for future growth.

Notes summary

  

Summary of implementation notes.

Basic High Level

  

The high level view.

Basic Data Typing

  

A very quick intro to data types.
