Gawk/Input-Summary
Next: Input Exercises, Previous: Command-line directories, Up: Reading Files [Contents][Index]
4.14 Summary
- Input is split into records based on the value of RS. The possibilities are as follows:

  Value of RS              Records are split on …          awk/gawk
  Any single character     That character                  awk
  The empty string ("")    Runs of two or more newlines    awk
  A regexp                 Text that matches the regexp    gawk

- FNR indicates how many records have been read from the current input file; NR indicates how many records have been read in total.
- gawk sets RT to the text matched by RS.
- After splitting the input into records,
awk further splits the records into individual fields, named $1, $2, and so on. $0 is the whole record, and NF indicates how many fields there are. The default way to split fields is between whitespace characters.
- Fields may be referenced using a variable, as in $NF. Fields may also be assigned values, which causes the value of $0 to be recomputed when it is later referenced. Assigning to a field with a number greater than NF creates the field and rebuilds the record, using OFS to separate the fields. Incrementing NF does the same thing. Decrementing NF throws away fields and rebuilds the record.
- Field splitting is more complicated than record splitting:

  Field separator value            Fields are split …                                        awk/gawk
  FS == " "                        On runs of whitespace                                     awk
  FS == any single character       On that character                                         awk
  FS == regexp                     On text matching the regexp                               awk
  FS == ""                         Such that each individual character is a separate field   gawk
  FIELDWIDTHS == list of columns   Based on character position                               gawk
  FPAT == regexp                   On the text surrounding text matching the regexp          gawk

- Using ‘FS = "\n"’ causes the entire record to be a single field (assuming that newlines separate records).
- FS may be set from the command line using the -F option. This can also be done using command-line variable assignment.
- Use PROCINFO["FS"] to see how fields are being split.
- Use getline in its various forms to read additional records from the default input stream, from a file, or from a pipe or coprocess.
- Use PROCINFO[file, "READ_TIMEOUT"] to cause reads to time out for file.
- Directories on the command line are fatal for standard awk; gawk ignores them if not in POSIX mode.
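
Several of the record and field rules summarized above can be tried with short one-liners. The following is a minimal sketch rather than part of the original summary: it relies only on features marked "awk" in the tables (RS, FS, OFS, NF, and the -F option), and the shell function names and sample data are made up for illustration.

```shell
#!/bin/sh

# RS = "" splits records on blank lines ("paragraph mode");
# FS = "\n" then makes each line of a record one field.
# (Hypothetical helper name and sample data.)
paragraph_demo() {
    printf 'Jane Doe\n123 Main St\n\nJohn Smith\n456 Oak Ave\n' |
    awk 'BEGIN { RS = ""; FS = "\n" }
         { print NR ": " $1 " (" NF " fields)" }'
}

# Assigning to a field beyond NF creates it (and the empty field in
# between) and rebuilds $0 with OFS as the separator.
# (Hypothetical helper name.)
append_field_demo() {
    echo 'a b c' |
    awk 'BEGIN { OFS = "-" } { $5 = "e"; print; print NF }'
}

# -F sets FS from the command line; $NF refers to the last field.
# (Hypothetical helper name and sample data.)
last_field_demo() {
    echo 'root:x:0:0:root:/root:/bin/sh' |
    awk -F: '{ print $1, $NF }'
}

paragraph_demo      # 1: Jane Doe (2 fields)
                    # 2: John Smith (2 fields)
append_field_demo   # a-b-c--e
                    # 5
last_field_demo     # root /bin/sh
```

The gawk-only features (RT, FPAT, FIELDWIDTHS, and PROCINFO) are left out on purpose so that the sketch should behave the same under any POSIX-compatible awk.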