Input Summary (The GNU Awk User’s Guide)



4.14 Summary

  • Input is split into records based on the value of RS. The possibilities are as follows:

      Value of RS              Records are split on …          awk / gawk
      -----------------------  ------------------------------  ----------
      Any single character     That character                  awk
      The empty string ("")    Runs of two or more newlines    awk
      A regexp                 Text that matches the regexp    gawk

  • FNR indicates how many records have been read from the current input file; NR indicates how many records have been read in total.
  • gawk sets RT to the text matched by RS.
  • After splitting the input into records, awk further splits the records into individual fields, named $1, $2, and so on. $0 is the whole record, and NF indicates how many fields there are. By default, fields are separated by runs of whitespace.
  • Fields may be referenced using a variable, as in $NF. Fields may also be assigned values, which causes the value of $0 to be recomputed when it is later referenced. Assigning to a field with a number greater than NF creates the field and rebuilds the record, using OFS to separate the fields. Incrementing NF does the same thing. Decrementing NF throws away fields and rebuilds the record.
  • Field splitting is more complicated than record splitting:

      Field separator value             Fields are split …                                         awk / gawk
      --------------------------------  ---------------------------------------------------------  ----------
      FS == " "                         On runs of whitespace                                      awk
      FS == any single character        On that character                                          awk
      FS == regexp                      On text matching the regexp                                awk
      FS == ""                          Such that each individual character is a separate field    gawk
      FIELDWIDTHS == list of columns    Based on character position                                gawk
      FPAT == regexp                    On the text surrounding text matching the regexp           gawk

  • Using ‘FS = "\n"’ causes the entire record to be a single field (assuming that newlines separate records).
  • FS may be set from the command line using the -F option. This can also be done using command-line variable assignment.
  • Use PROCINFO["FS"] to see how fields are being split.
  • Use getline in its various forms to read additional records from the default input stream, from a file, or from a pipe or coprocess.
  • Use PROCINFO[file, "READ_TIMEOUT"] to cause reads to time out for file.
  • Naming a directory on the command line is a fatal error in standard awk; gawk skips directories with a warning unless it is running in POSIX mode.
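
The record-splitting points in this summary can be tried from a shell. The following is a minimal sketch that uses only POSIX awk features; the sample input strings and the temporary file names (first.txt, second.txt) are made up for illustration.

```shell
# Single-character RS: records are split on that character.
recs=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
echo "$recs"

# RS = "" (paragraph mode): runs of blank lines separate records, and
# newlines inside a record also act as field separators.
paras=$(printf 'one\ntwo\n\n\nthree\n' | awk 'BEGIN { RS = "" } { print NR, NF }')
echo "$paras"

# FNR restarts at 1 for each input file; NR keeps counting across files.
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/first.txt"
printf 'z\n'    > "$dir/second.txt"
counts=$(awk '{ print FNR, NR }' "$dir/first.txt" "$dir/second.txt")
rm -r "$dir"
echo "$counts"
```

The first command uses printf without a trailing newline so that the last record is exactly "c"; with a trailing newline, that newline would be part of the final record.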
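
The field-related points (referencing $NF, assigning past NF, and setting FS) can be sketched in the same way. This is an illustrative example only, restricted to POSIX awk; the input strings are invented.

```shell
# NF is the field count, so $NF is the last field.
last=$(echo 'alpha beta gamma' | awk '{ print NF, $NF }')
echo "$last"

# Assigning to a field beyond NF creates it (and the empty fields in
# between) and rebuilds $0 using OFS as the separator.
rebuilt=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $5 = "e"; print; print NF }')
echo "$rebuilt"

# FS can be set with -F or with a command-line variable assignment.
opt=$(echo 'root:x:0' | awk -F: '{ print $3 }')
var=$(echo 'root:x:0' | awk -v FS=: '{ print $3 }')
echo "$opt $var"

# FS = "\n" makes the whole (newline-terminated) record a single field.
whole=$(echo 'a b c' | awk 'BEGIN { FS = "\n" } { print NF }')
echo "$whole"
```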
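
Three of the getline forms mentioned in this summary (from the main input, from a file, and from a pipe) are sketched below. The example assumes a POSIX awk; the variable names (line, file, cmd) and the sample data are hypothetical, and the coprocess form is omitted because it is specific to gawk.

```shell
# Plain getline reads the next record from the main input into $0.
next_rec=$(printf 'h1\nv1\nh2\nv2\n' | awk '/^h/ { if ((getline) > 0) print }')
echo "$next_rec"

# "getline var < file" reads one record at a time from a named file.
tmp=$(mktemp)
printf 'x\ny\nz\n' > "$tmp"
nlines=$(awk -v file="$tmp" 'BEGIN {
    while ((getline line < file) > 0)
        n++
    close(file)
    print n
}')
rm -f "$tmp"
echo "$nlines"

# "command | getline var" reads one record from the output of a command.
piped=$(awk 'BEGIN { cmd = "echo hello"; cmd | getline line; close(cmd); print line }')
echo "$piped"
```

Comparing the return value of getline with 0 matters in the loop: getline returns -1 on error, so a bare `while (getline line < file)` would loop forever on an unreadable file.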
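
The gawk-only features named in this summary (a regexp RS with RT, FIELDWIDTHS, FPAT, and PROCINFO["FS"]) can be explored with a sketch like the one below. It assumes gawk is installed under the name gawk and skips everything otherwise; the input strings are invented for illustration.

```shell
if command -v gawk >/dev/null 2>&1; then
    # Regexp RS: RT holds the text that actually terminated each record.
    rt=$(printf 'a1b22c' | gawk 'BEGIN { RS = "[0-9]+" } { print $0 ":" RT }')
    echo "$rt"

    # FIELDWIDTHS splits by character position: 2 columns, then 4.
    fw=$(echo 'abcdef' | gawk 'BEGIN { FIELDWIDTHS = "2 4" } { print $1, $2 }')
    echo "$fw"

    # FPAT describes what a field looks like rather than what separates fields.
    fp=$(echo 'x=1, y=22' | gawk 'BEGIN { FPAT = "[0-9]+" } { print NF, $1, $2 }')
    echo "$fp"

    # PROCINFO["FS"] reports which splitting mechanism is in effect.
    mode=$(gawk 'BEGIN { print PROCINFO["FS"]; FPAT = "[0-9]+"; print PROCINFO["FS"] }')
    echo "$mode"
fi
```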