split: Split a file into pieces.
split creates output files containing consecutive or interleaved
sections of input (standard input if none is given or input is ‘-’).
Synopsis:
split [option] [input [prefix]]
By default, split puts 1000 lines of input (or whatever is left over
for the last section) into each output file.
The output files’ names consist of prefix (‘x’ by default) followed by a
group of characters (‘aa’, ‘ab’, … by default), such that concatenating
the output files in traditional sorted order by file name produces the
original input file (except with -nr/n). By default split will initially
create files with two generated suffix characters, and will increase
this width by two when the next most significant position reaches the
last character (‘yz’, ‘zaaa’, ‘zaab’, …). In this way an arbitrary
number of output files are supported, which sort as described above,
even in the presence of an --additional-suffix option.
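The default behavior described above can be sketched as follows. This is only an illustration: the temporary directory and the file name input.txt are assumptions made for the example.

```shell
# Sketch: split a 2500-line file with the defaults
# (1000 lines per output file, prefix 'x', two-letter suffixes).
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

seq 2500 > input.txt      # hypothetical sample input
split input.txt           # creates xaa, xab and xac

wc -l xaa xab xac         # 1000, 1000 and 500 lines respectively

# Concatenating the pieces in sorted name order reproduces the input.
cat x?? | cmp - input.txt && echo "round trip OK"
```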
If the -a option is specified and the output file names are exhausted,
split reports an error without deleting the output files that it did
create.
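A small sketch of name exhaustion, assuming a scratch directory: with -a 1 only the 26 names ‘xa’ through ‘xz’ are available, so a 27th piece cannot be named.

```shell
# Sketch: force suffix exhaustion with a one-character suffix.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

# 27 one-line pieces do not fit into the names xa..xz.
seq 27 | split -a 1 -l 1
status=$?                         # exit status of split
echo "exit status: $status"       # nonzero, because split reported an error

created=$(ls x? | wc -l)
echo "files left behind: $created"
```

The files written before the error remain in place, so a script that cares about partial output has to remove them itself.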
The program accepts the following options. Also see Common options.
‘-l lines’
‘--lines=lines’
Put lines lines of input into each output file.
If --separator is specified, then lines determines the number of
records.

For compatibility split also supports an obsolete option syntax -lines.
New scripts should use -l lines instead.
‘-b size’
‘--bytes=size’
Put size bytes of input into each output file.
size may be, or may be an integer optionally followed by, one of the
following multiplicative suffixes:
‘b’  =>            512 ("blocks")
‘KB’ =>           1000 (KiloBytes)
‘K’  =>           1024 (KibiBytes)
‘MB’ =>      1000*1000 (MegaBytes)
‘M’  =>      1024*1024 (MebiBytes)
‘GB’ => 1000*1000*1000 (GigaBytes)
‘G’  => 1024*1024*1024 (GibiBytes)
and so on for ‘T’, ‘P’, ‘E’, ‘Z’, and ‘Y’.
Binary prefixes can be used, too: ‘KiB’=‘K’, ‘MiB’=‘M’, and so on.
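The difference between the power-of-1024 and power-of-1000 suffixes can be seen with a small sketch. The file name data.bin and the prefixes ‘bin-’ and ‘dec-’ are made up for the example.

```shell
# Sketch: compare the 'K' (1024) and 'KB' (1000) size suffixes.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

head -c 3000 /dev/zero > data.bin   # 3000 bytes of sample input

split -b 1K  data.bin bin-          # pieces of 1024, 1024 and 952 bytes
split -b 1KB data.bin dec-          # three pieces of 1000 bytes each

wc -c bin-* dec-*
```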
‘-C size’
‘--line-bytes=size’
Put into each output file as many complete lines of input as possible
without exceeding size bytes. Individual lines or records longer than
size bytes are broken into multiple files.
size has the same format as for the --bytes option.
If --separator is specified, then complete records, rather than complete
lines, are packed into each output file.
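To contrast --line-bytes with --bytes, here is a minimal sketch on three 5-byte lines. The file name lines.txt and the prefixes ‘lb-’ and ‘by-’ are illustrative only.

```shell
# Sketch: -C keeps lines whole where it can, -b cuts at exact byte counts.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

printf 'aaaa\nbbbb\ncccc\n' > lines.txt   # three lines of 5 bytes each

split -C 12 lines.txt lb-   # lb-aa holds two whole lines (10 bytes), lb-ab one
split -b 12 lines.txt by-   # by-aa holds exactly 12 bytes, ending mid-line

wc -c lb-* by-*
```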
‘--filter=command’
With this option, rather than simply writing to each output file,
write through a pipe to the specified shell command for each output
file. command should use the $FILE environment variable, which is set
to a different output file name for each invocation of the command.
For example, imagine that you have a 1TiB compressed file that, if
uncompressed, would be too large to reside on disk, yet you must split
it into individually-compressed pieces of a more manageable size.
To do that, you might run this command:
xz -dc BIG.xz | split -b200G --filter='xz > $FILE.xz' - big-
Assuming a 10:1 compression ratio, that would create about fifty 20GiB
files with names big-aa.xz, big-ab.xz, big-ac.xz, etc.
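The same pattern can be tried at a small scale. The sketch below assumes gzip is installed and uses a made-up prefix ‘part-’. The command is single-quoted so that $FILE is expanded by the shell that split starts for each piece, not by the invoking shell.

```shell
# Sketch: compress each piece on the fly with --filter.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

seq 10 > expected.txt     # reference copy of the input

# Each group of 4 lines is piped to its own gzip process.
seq 10 | split -l 4 --filter='gzip > "$FILE.gz"' - part-

ls part-*                 # part-aa.gz part-ab.gz part-ac.gz

# Decompressing the pieces in name order restores the input.
gzip -dc part-*.gz | cmp - expected.txt && echo "round trip OK"
```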
‘-n chunks’
‘--number=chunks’
Split input to chunks output files where chunks may be:
n      generate n files based on current size of input
k/n    only output kth of n to stdout
l/n    generate n files without splitting lines or records
l/k/n  likewise but only output kth of n to stdout
r/n    like ‘l’ but use round robin distribution
r/k/n  likewise but only output kth of n to stdout
Any excess bytes remaining after dividing the input into n chunks are
assigned to the last chunk.
Any excess bytes appearing after the initial calculation are discarded
(except when using ‘r’ mode).

All n files are created even if there are fewer than n lines, or the
input is truncated.

For ‘l’ mode, chunks are approximately input size / n.
The input is partitioned into n equal sized portions, with the last
assigned any excess. If a line starts within a partition it is written
completely to the corresponding file. Since lines or records are not
split even if they overlap a partition, the files written can be larger
or smaller than the partition size, and even empty if a line/record is
so long as to completely overlap the partition.

For ‘r’ mode, the size of input is irrelevant, so input can be a pipe,
for example.
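As a rough illustration of the ‘k/n’ form, the loop below extracts each of four byte chunks to standard output in turn and checks that together they cover the input. The names input.txt and rebuilt.txt are assumptions for the example.

```shell
# Sketch: extract chunks 1..4 one at a time with -n k/n and reassemble them.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

seq 100 > input.txt
n=4
k=1
: > rebuilt.txt
while [ "$k" -le "$n" ]; do
    # k/n writes only the kth chunk, and only to standard output.
    split -n "$k/$n" input.txt >> rebuilt.txt
    k=$((k + 1))
done

cmp input.txt rebuilt.txt && echo "chunks 1..$n cover the input exactly"
ls                        # no x?? files were created
```

Because each invocation is independent, this form is convenient for handing one chunk to each of several parallel workers.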
‘-a length’
‘--suffix-length=length’
Use suffixes of length length. If a length of 0 is specified, this is
the same as if (any previous) -a was not specified, and thus enables
the default behavior, which starts the suffix length at 2 and, unless
-n or --numeric-suffixes=from is specified, automatically increases the
length by 2 as required.
‘-d’
‘--numeric-suffixes[=from]’
Use digits in suffixes rather than lower-case letters. The numerical
suffix counts from the value from if specified, 0 otherwise.

from is supported with the long form option, and is used to either set
the initial suffix for a single run, or to set the suffix offset for
independently split inputs, and consequently the auto suffix length
expansion described above is disabled. Therefore you may also want to
use option -a to allow suffixes beyond ‘99’. Note if option --number is
specified and the number of files is less than from, a single run is
assumed and the minimum suffix length required is automatically
determined.
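A short sketch of numeric suffixes, with and without a starting value. The prefixes ‘d-’, ‘off-’ and ‘wide-’ are invented for the example.

```shell
# Sketch: numeric suffixes, a start offset, and a wider suffix via -a.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

seq 3 | split -l 1 -d - d-                       # d-00 d-01 d-02
seq 3 | split -l 1 --numeric-suffixes=5 - off-   # off-05 off-06 off-07

# With a start value the suffix does not grow automatically,
# so -a 3 is used here to count past 99.
seq 3 | split -l 1 -a 3 --numeric-suffixes=98 - wide-   # wide-098 wide-099 wide-100

ls
```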
‘-x’
‘--hex-suffixes[=from]’
Like --numeric-suffixes, but use hexadecimal numbers (in lower case).
‘--additional-suffix=suffix’
Append an additional suffix to output file names. suffix must not
contain a slash.
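For instance, a file extension can be appended to each generated name. In this sketch the prefix ‘chunk-’ and the extension ‘.txt’ are arbitrary choices.

```shell
# Sketch: give every output file a '.txt' extension.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

seq 5 | split -l 2 --additional-suffix=.txt - chunk-

ls chunk-*        # chunk-aa.txt chunk-ab.txt chunk-ac.txt
```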
‘-e’
‘--elide-empty-files’
Suppress the generation of zero-length output files. This can happen
with the --number option if a file is (truncated to be) shorter than
the number requested, or if a line is so long as to completely span a
chunk. The output file sequence numbers always run consecutively, even
when this option is specified.
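The long-line case can be reproduced with a sketch like the one below, in which a single 31-byte line covers the whole of the second of three chunks. The file name long.txt and the prefixes ‘all-’ and ‘kept-’ are assumptions.

```shell
# Sketch: a long line leaves one chunk empty; -e suppresses that file.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

# One line of 30 'a' characters, then a short line: 33 bytes in total.
{ head -c 30 /dev/zero | tr '\0' 'a'; printf '\nb\n'; } > long.txt

split    -n l/3 long.txt all-    # three files, one of them empty
split -e -n l/3 long.txt kept-   # the empty file is not created

all_count=$(ls all-* | wc -l)
kept_count=$(ls kept-* | wc -l)
echo "without -e: $all_count files, with -e: $kept_count files"
```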
‘-t separator’
‘--separator=separator’
Use character separator as the record separator instead of the default
newline character (ASCII LF).
To specify ASCII NUL as the separator, use the two-character string
‘\0’, e.g., ‘split -t '\0'’.
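A minimal sketch with a visible separator: records here end in ‘:’ instead of a newline, and -l then counts records. The prefix ‘rec-’ is made up for the example.

```shell
# Sketch: treat ':' as the record separator and put two records in each file.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
trap 'rm -rf "$workdir"' EXIT

printf 'a:b:c:' | split -t : -l 2 - rec-

cat rec-aa; echo      # a:b:
cat rec-ab; echo      # c:
```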
‘-u’
‘--unbuffered’
Immediately copy input to output in --number r/… mode, which is a much
slower mode of operation.
‘--verbose’
Write a diagnostic just before each output file is opened.
An exit status of zero indicates success, and a nonzero value indicates failure.
Here are a few examples to illustrate how the --number (-n) option
works:
Notice how, by default, one line may be split onto two or more:
$ seq -w 6 10 > k; split -n3 k; head xa?
==> xaa <==
06
07
==> xab <==

08
0
==> xac <==
9
10
Use the "l/" modifier to suppress that:
$ seq -w 6 10 > k; split -nl/3 k; head xa?
==> xaa <==
06
07

==> xab <==
08
09

==> xac <==
10
Use the "r/" modifier to distribute lines in a round-robin fashion:
$ seq -w 6 10 > k; split -nr/3 k; head xa?
==> xaa <==
06
09

==> xab <==
07
10

==> xac <==
08
You can also extract just the Kth chunk. This extracts and prints just the 7th "chunk" of 33:
$ seq 100 > k; split -nl/7/33 k
20
21
22