Gawk/Input-Summary

4.14 Summary

  • Input is split into records based on the value of RS. The possibilities are as follows:
    Value of RS             Records are split on …         awk / gawk
    Any single character    That character                 awk
    The empty string ("")   Runs of two or more newlines   awk
    A regexp                Text that matches the regexp   gawk
  • FNR indicates how many records have been read from the current input file; NR indicates how many records have been read in total.
  • gawk sets RT to the text matched by RS.
  • After splitting the input into records, awk further splits each record into individual fields, named $1, $2, and so on. $0 is the whole record, and NF indicates how many fields there are. By default, fields are separated by runs of whitespace.
  • Fields may be referenced using a variable, as in $NF. Fields may also be assigned values, which causes the value of $0 to be recomputed when it is later referenced. Assigning to a field with a number greater than NF creates the field and rebuilds the record, using OFS to separate the fields. Incrementing NF does the same thing. Decrementing NF throws away fields and rebuilds the record.
  • Field splitting is more complicated than record splitting:
    Field separator value            Fields are split …                                        awk / gawk
    FS == " "                        On runs of whitespace                                     awk
    FS == any single character       On that character                                         awk
    FS == regexp                     On text matching the regexp                               awk
    FS == ""                         Such that each individual character is a separate field   gawk
    FIELDWIDTHS == list of columns   Based on character position                               gawk
    FPAT == regexp                   Such that each match of the regexp is a field             gawk
  • Using ‘FS = "\n"’ causes the entire record to be a single field (assuming that newlines separate records).
  • FS may be set from the command line using the -F option. This can also be done using command-line variable assignment.
  • Use PROCINFO["FS"] to see how fields are being split.
  • Use getline in its various forms to read additional records from the default input stream, from a file, or from a pipe or coprocess.
  • Use PROCINFO[file, "READ_TIMEOUT"] to cause reads to time out for file.
  • Naming a directory on the command line is a fatal error in standard awk; gawk ignores directories unless it is in POSIX mode.
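
The record-splitting points above (RS, NR, and FNR) can be tried out with a few one-liners. This is only a minimal sketch: the sample input, the temporary file names, and the shell variable names (by_char, by_para, counts) are made up for illustration, and only features marked "awk" in the first table are used.

```shell
# Single-character RS: every ";" ends a record.
by_char=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$by_char"

# RS = "": blank lines separate records, and newlines also separate fields.
by_para=$(printf 'line1\nline2\n\n\nline3\n' |
    awk 'BEGIN { RS = "" } { print NR ": NF=" NF }')
printf '%s\n' "$by_para"

# FNR restarts at 1 for each input file; NR keeps counting across files.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"
counts=$(awk '{ print FNR, NR, $0 }' "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$counts"
rm -r "$dir"
```

The last command should print "1 1 a", "2 2 b", and "1 3 c": FNR returns to 1 at the second file while NR continues to 3.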
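
The field-assignment behavior described above (referencing $NF, assigning past NF, and decrementing NF) is easier to see on a concrete record. The following is a sketch under assumed input; the "-" value for OFS and the variable names (last, grown, rebuilt) are chosen only to make the rebuilt record visible.

```shell
rebuilt=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" }
{
    last = $NF      # field reference through a variable: the last field, "c"
    $5 = "e"        # assigning past NF creates an empty $4 plus $5
    grown = $0      # $0 is rebuilt with OFS between fields
    NF = 2          # decrementing NF discards $3 through $5
    print last
    print grown
    print $0
}')
printf '%s\n' "$rebuilt"
```

The three printed lines should be "c", "a-b-c--e", and "a-b".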
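
The portable rows of the field-splitting table, together with the notes on FS = "\n" and the -F option, might be exercised as follows. This is an illustrative sketch with invented sample input and variable names; -v is used here as one form of command-line variable assignment.

```shell
# FS == " " (the default): runs of whitespace; leading and trailing blanks are ignored.
ws=$(printf '  a   b\tc  \n' | awk '{ print NF }')

# FS == a single character, set with -F: an empty field sits between the two colons.
colon=$(printf 'a:b::c\n' | awk -F: '{ print NF }')

# FS == a regexp: any run of digits separates fields.
regex=$(printf 'a1b22c\n' | awk -F'[0-9]+' '{ print $1 "|" $2 "|" $3 }')

# FS set by variable assignment on the command line.
assigned=$(printf 'x:y:z\n' | awk -v FS=: '{ print $2 }')

# FS = "\n" with newline-separated records: the whole record is one field.
whole=$(printf 'a b c\n' | awk 'BEGIN { FS = "\n" } { print NF }')

printf '%s\n' "$ws" "$colon" "$regex" "$assigned" "$whole"
```

The expected values are 3, 4, "a|b|c", "y", and 1, in that order.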
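
The gawk-only items above (RT, FS == "", FIELDWIDTHS, FPAT, and PROCINFO["FS"]) can be sketched in the same way. The example assumes a gawk binary on the PATH and does nothing otherwise; the input strings and variable names are hypothetical.

```shell
if command -v gawk >/dev/null 2>&1; then
    # RT holds the text that matched RS for each record (empty for the final one here).
    rt=$(printf 'a1b22c' | gawk 'BEGIN { RS = "[0-9]+" } { print $0 " [" RT "]" }')

    # FS == "": each character becomes its own field.
    chars=$(printf 'abc\n' | gawk 'BEGIN { FS = "" } { print NF }')

    # FIELDWIDTHS: split by column widths; PROCINFO["FS"] reports the active method.
    widths=$(printf 'AAABBCCCC\n' |
        gawk 'BEGIN { FIELDWIDTHS = "3 2 4" } { print $1 "|" $2 "|" $3, PROCINFO["FS"] }')

    # FPAT: the regexp describes the fields themselves, not the separators.
    fpat=$(printf 'x=1, y=22\n' | gawk 'BEGIN { FPAT = "[0-9]+" } { print NF, $1, $2 }')

    printf '%s\n' "$rt" "$chars" "$widths" "$fpat"
fi
```

With gawk installed, this should print "a [1]", "b [22]", "c []", then 3, "AAA|BB|CCCC FIELDWIDTHS", and "2 1 22".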
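
Three of the getline forms mentioned above (plain getline, getline from a file, and getline from a pipe) are shown in the sketch below. The file contents, the awk variable names (extra, line, cmd), and the echo command are placeholders; the coprocess form is gawk-specific and is left out.

```shell
extra_file=$(mktemp)
printf 'extra1\nextra2\n' > "$extra_file"

reads=$(printf 'main1\nmain2\nmain3\n' | awk -v extra="$extra_file" '
NR == 1 {
    # Plain getline: replace $0 with the next record from the main input.
    if ((getline) > 0) print "next main record: " $0

    # getline var < file: read one record from a named file into a variable.
    if ((getline line < extra) > 0) print "from file: " line
    close(extra)

    # cmd | getline var: read one record from the output of a command.
    cmd = "echo piped"
    if ((cmd | getline line) > 0) print "from pipe: " line
    close(cmd)
    next
}
{ print "main loop: " $0 }')
printf '%s\n' "$reads"
rm -f "$extra_file"
```

Because the plain getline consumes "main2" inside the first rule, the main loop only sees "main3" afterwards.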