split invocation (GNU Coreutils 9.0)

From Get docs
Coreutils/docs/latest/split-invocation


5.3 split: Split a file into pieces.

split creates output files containing consecutive or interleaved sections of input (standard input if none is given or input is ‘-’). Synopsis:

split [option] [input [prefix]]

By default, split puts 1000 lines of input (or whatever is left over for the last section), into each output file.

The output files’ names consist of prefix (‘x’ by default) followed by a group of characters (‘aa’, ‘ab’, … by default), such that concatenating the output files in traditional sorted order by file name produces the original input file (except -nr/n). By default split will initially create files with two generated suffix characters, and will increase this width by two when the next most significant position reaches the last character. (‘yz’, ‘zaaa’, ‘zaab’, …). In this way an arbitrary number of output files are supported, which sort as described above, even in the presence of an --additional-suffix option. If the -a option is specified and the output file names are exhausted, split reports an error without deleting the output files that it did create.

The program accepts the following options. Also see Common options.

-l lines
--lines=lines

Put lines lines of input into each output file. If --separator is specified, then lines determines the number of records.

For compatibility split also supports an obsolete option syntax -lines. New scripts should use -l lines instead.

-b size
--bytes=size

Put size bytes of input into each output file. size may be, or may be an integer optionally followed by, one of the following multiplicative suffixes:

‘b’  =>            512 ("blocks")
‘KB’ =>           1000 (KiloBytes)
‘K’  =>           1024 (KibiBytes)
‘MB’ =>      1000*1000 (MegaBytes)
‘M’  =>      1024*1024 (MebiBytes)
‘GB’ => 1000*1000*1000 (GigaBytes)
‘G’  => 1024*1024*1024 (GibiBytes)

and so on for ‘T’, ‘P’, ‘E’, ‘Z’, and ‘Y’. Binary prefixes can be used, too: ‘KiB’=‘K’, ‘MiB’=‘M’, and so on.

-C size
--line-bytes=size

Put into each output file as many complete lines of input as possible without exceeding size bytes. Individual lines or records longer than size bytes are broken into multiple files. size has the same format as for the --bytes option. If --separator is specified, then lines determines the number of records.

--filter=command

With this option, rather than simply writing to each output file, write through a pipe to the specified shell command for each output file. command should use the $FILE environment variable, which is set to a different output file name for each invocation of the command. For example, imagine that you have a 1TiB compressed file that, if uncompressed, would be too large to reside on secondary storage, yet you must split it into individually-compressed pieces of a more manageable size. To do that, you might run this command:

xz -dc BIG.xz | split -b200G --filter='xz > $FILE.xz' - big-

Assuming a 10:1 compression ratio, that would create about fifty 20GiB files with names big-aa.xz, big-ab.xz, big-ac.xz, etc.

-n chunks
--number=chunks

Split input to chunks output files where chunks may be:

n      generate n files based on current size of input
k/n    output only kth of n to standard output
l/n    generate n files without splitting lines or records
l/k/n  likewise but output only kth of n to stdout
r/n    like ‘l’ but use round robin distribution
r/k/n  likewise but output only kth of n to stdout

Any excess bytes remaining after dividing the input into n chunks, are assigned to the last chunk. Any excess bytes appearing after the initial calculation are discarded (except when using ‘r’ mode).

All n files are created even if there are fewer than n lines, or the input is truncated.

For ‘l’ mode, chunks are approximately input size / n. The input is partitioned into n equal sized portions, with the last assigned any excess. If a line starts within a partition it is written completely to the corresponding file. Since lines or records are not split even if they overlap a partition, the files written can be larger or smaller than the partition size, and even empty if a line/record is so long as to completely overlap the partition.

For ‘r’ mode, the size of input is irrelevant, and so can be a pipe for example.

-a length
--suffix-length=length

Use suffixes of length length. If a length of 0 is specified, this is the same as if (any previous) -a was not specified, and thus enables the default behavior, which starts the suffix length at 2, and unless -n or --numeric-suffixes=from is specified, will auto increase the length by 2 as required.

-d
--numeric-suffixes[=from]

Use digits in suffixes rather than lower-case letters. The numerical suffix counts from from if specified, 0 otherwise.

from is supported with the long form option, and is used to either set the initial suffix for a single run, or to set the suffix offset for independently split inputs, and consequently the auto suffix length expansion described above is disabled. Therefore you may also want to use option -a to allow suffixes beyond ‘99’. Note if option --number is specified and the number of files is less than from, a single run is assumed and the minimum suffix length required is automatically determined.

-x
--hex-suffixes[=from]

Like --numeric-suffixes, but use hexadecimal numbers (in lower case).

--additional-suffix=suffix

Append an additional suffix to output file names. suffix must not contain slash.

-e
--elide-empty-files

Suppress the generation of zero-length output files. This can happen with the --number option if a file is (truncated to be) shorter than the number requested, or if a line is so long as to completely span a chunk. The output file sequence numbers, always run consecutively even when this option is specified.

-t separator
--separator=separator

Use character separator as the record separator instead of the default newline character (ASCII LF). To specify ASCII NUL as the separator, use the two-character string ‘\0’, e.g., ‘split -t '\0'’.

-u
--unbuffered

Immediately copy input to output in --number r/… mode, which is a much slower mode of operation.

--verbose

Write a diagnostic just before each output file is opened.

An exit status of zero indicates success, and a nonzero value indicates failure.

Here are a few examples to illustrate how the --number (-n) option works:

Notice how, by default, one line may be split onto two or more:

$ seq -w 6 10 > k; split -n3 k; head xa?
==> xaa <==
06
07
==> xab <==

08
0
==> xac <==
9
10

Use the "l/" modifier to suppress that:

$ seq -w 6 10 > k; split -nl/3 k; head xa?
==> xaa <==
06
07

==> xab <==
08
09

==> xac <==
10

Use the "r/" modifier to distribute lines in a round-robin fashion:

$ seq -w 6 10 > k; split -nr/3 k; head xa?
==> xaa <==
06
09

==> xab <==
07
10

==> xac <==
08

You can also extract just the Kth chunk. This extracts and prints just the 7th "chunk" of 33:

$ seq 100 > k; split -nl/7/33 k
20
21
22