Field Separators (The GNU Awk User’s Guide)

From Get docs
Gawk/docs/latest/Field-Separators


4.5 Specifying How Fields Are Separated

Default Field Splitting    How fields are normally separated.
Regexp Field Splitting    Using regexps as the field separator.
Single Character Fields    Making each character a separate field.
Command Line Field Separator    Setting FS from the command line.
Full Line Fields    Making the full line be a single field.
Field Splitting Summary    Some final points and a summary table.

The field separator, which is either a single character or a regular expression, controls the way awk splits an input record into fields. awk scans the input record for character sequences that match the separator; the fields themselves are the text between the matches.

In the examples that follow, we use the bullet symbol (•) to represent spaces in the output. If the field separator is ‘oo’, then the following line:

moo goo gai pan

is split into three fields: ‘m’, ‘•g’, and ‘•gai•pan’. Note the leading spaces in the values of the second and third fields.

The field separator is represented by the predefined variable FS. Shell programmers take note: awk does not use the name IFS that is used by the POSIX-compliant shells (such as the Unix Bourne shell, sh, or Bash).

The value of FS can be changed in the awk program with the assignment operator, ‘=’ (see section Assignment Expressions). Often, the right time to do this is at the beginning of execution before any input has been processed, so that the very first record is read with the proper separator. To do this, use the special BEGIN pattern (see section The BEGIN and END Special Patterns). For example, here we set the value of FS to the string ",":

awk 'BEGIN { FS = "," } ; { print $2 }'

Given the input line:

John Q. Smith, 29 Oak St., Walamazoo, MI 42139

this awk program extracts and prints the string ‘•29•Oak•St.’.

Sometimes the input data contains separator characters that don’t separate fields the way you thought they would. For instance, the person’s name in the example we just used might have a title or suffix attached, such as:

John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139

The same program would extract ‘•LXIX’ instead of ‘•29•Oak•St.’. If you were expecting the program to print the address, you would be surprised. The moral is to choose your data layout and separator characters carefully to prevent such problems. (If the data is not in a form that is easy to process, perhaps you can massage it first with a separate awk program.)