8.1.2 Archiving Sparse Files

Files in the file system occasionally have holes. A hole in a file is a section of the file's contents which was never written. The contents of a hole read as all zeros. On many operating systems, actual disk storage is not allocated for holes, but they are counted in the length of the file. If you archive such a file, tar could create an archive longer than the original. To have tar attempt to recognize the holes in a file, use `--sparse' (`-S'). When you use this option, tar searches for holes in any file that uses less disk space than its length would suggest. It then records in the archive where the holes (consecutive stretches of zeros) are, and archives only the "real contents" of the file. On extraction (where `--sparse' is not needed), holes are recreated wherever they were found. Thus, if you use `--sparse', tar archives won't take more space than the original.
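The "less disk space than its length would suggest" test can be sketched in Python as follows. This is a minimal illustration, not tar's own code: it assumes a POSIX system where `os.stat` reports `st_blocks` in 512-byte units, and the function names `looks_sparse` and `file_looks_sparse` are hypothetical.

```python
import os

BLOCK = 512  # st_blocks is counted in 512-byte units on POSIX systems


def looks_sparse(st_size: int, st_blocks: int) -> bool:
    """Return True if fewer blocks are allocated than the length requires.

    A fully allocated file of st_size bytes needs ceil(st_size / 512)
    blocks; fewer allocated blocks suggests that the file has holes.
    """
    needed = st_size // BLOCK + (1 if st_size % BLOCK else 0)
    return st_blocks < needed


def file_looks_sparse(path: str) -> bool:
    """Apply the size-versus-allocation check to a file on disk."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

For example, a 1 MiB file needs 2048 blocks, so `looks_sparse(1 << 20, 8)` is true. The result is only a hint: on some file systems (for example, ones with transparent compression) a file can occupy fewer blocks than its length suggests without having holes, which is why a candidate file is then searched for actual holes.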

GNU tar uses two methods for detecting holes in sparse files. These methods are described later in this subsection.

`-S'

`--sparse'

This option instructs tar to test each file for sparseness before attempting to archive it. If the file is found to be sparse, it is treated specially, which reduces the space its image occupies in the archive.

This option is meaningful only when creating or updating archives. It has no effect on extraction.
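Although `--sparse' is not needed on extraction, the extracted file still gets its holes back. The sketch below shows the general idea in Python: given the data regions as `(offset, bytes)` pairs and the original length, it seeks over the gaps instead of writing zeros. The function `write_sparse` and its segment format are hypothetical illustrations and do not reflect tar's archive layout.

```python
import os


def write_sparse(path: str, segments: list[tuple[int, bytes]], size: int) -> None:
    """Recreate a file from (offset, data) segments, leaving gaps as holes.

    Regions not covered by any segment are never written; seeking past
    them lets a file system that supports sparse files leave them
    unallocated. They still read back as zeros either way.
    """
    with open(path, "wb") as f:
        for offset, data in segments:
            f.seek(offset)  # jump over the hole instead of writing zeros
            f.write(data)
        f.flush()
        # Set the final length; this also covers a file that ends in a hole.
        f.truncate(size)
```

Whether the skipped regions actually become unallocated depends on the operating system and file system; their contents read as zeros in either case.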

Consider using `--sparse' when performing file system backups, to avoid archiving the expanded forms of files stored sparsely in the system.

Even if your system has no sparse files currently, some may be created in the future. If you use `--sparse' while making file system backups as a matter of course, you can be assured the archive will never take more space on the media than the files take on disk (otherwise, archiving a disk filled with sparse files might take hundreds of tapes). See section Using tar to Perform Incremental Dumps.

However, be aware that the `--sparse' option may present a serious drawback. To determine the positions of holes in a file, tar may have to read the file before archiving it, so in total the file may be read twice. This happens when your operating system or file system does not support the SEEK_HOLE/SEEK_DATA feature of lseek (see `--hole-detection', below).

When using the `POSIX' archive format, GNU tar can store sparse files in three distinct ways, called sparse formats. A sparse format is identified by its number, consisting, as usual, of two decimal numbers delimited by a dot. By default, format `1.0' is used. If, for some reason, you wish to use an earlier format, you can select it using the `--sparse-version' option.

`--sparse-version=version'

Select the format to store sparse files in. Valid version values are: `0.0', `0.1' and `1.0'. See section Storing Sparse Files, for a detailed description of each format.

Using the `--sparse-version' option implies `--sparse'.

`--hole-detection=method'

Force a specific hole detection method. Before the real contents of a sparse file are stored, tar needs to gather knowledge about the file's sparseness, because it must store the file's map of holes in the tar header before it starts archiving the file contents. Currently, two methods of hole detection are implemented:

  • `--hole-detection=seek' Seek the file for data and holes. This method uses an extension of the lseek system call (SEEK_HOLE and SEEK_DATA) that reuses the file system's knowledge about sparse file contents, so detection is usually very fast. To use this feature, your file system and operating system must support it. At the time of this writing (2015), this feature, although not accepted by POSIX, is fairly widely supported by different operating systems.
  • `--hole-detection=raw' Read the whole sparse file byte by byte before archiving it. This method detects holes as consecutive stretches of zeros. Compared to the previous method, it is usually much slower, although more portable.
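The `raw' method can be sketched in Python as a single pass over the file that treats every all-zero chunk as part of a hole. This is an illustration under stated assumptions, not tar's implementation: the 512-byte granularity (`BLOCK`) and the hypothetical function `scan_raw`, which returns the data regions as `(offset, length)` pairs, are choices made for the sketch.

```python
from typing import BinaryIO

BLOCK = 512  # chunk size used as hole granularity (assumption for this sketch)


def scan_raw(f: BinaryIO) -> list[tuple[int, int]]:
    """Return (offset, length) data segments found by reading every byte.

    Each chunk that is entirely zero is treated as part of a hole;
    adjacent non-zero chunks are merged into one data segment.
    """
    segments: list[tuple[int, int]] = []
    offset = 0
    start = None  # start of the data segment being built, if any
    while True:
        chunk = f.read(BLOCK)
        if not chunk:
            break
        if chunk == bytes(len(chunk)):  # all zeros: part of a hole
            if start is not None:
                segments.append((start, offset - start))
                start = None
        elif start is None:
            start = offset  # first non-zero chunk after a hole
        offset += len(chunk)
    if start is not None:
        segments.append((start, offset - start))
    return segments
```

Because every byte has to be read, the cost grows with the apparent length of the file rather than with the amount of real data, which is why this method is usually much slower than `seek'.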

When no `--hole-detection' option is given, tar uses the `seek' method if the operating system supports it, and falls back to `raw' otherwise.
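The `seek' method and the default choice between methods can be sketched in Python with `os.lseek` and the `os.SEEK_DATA`/`os.SEEK_HOLE` constants, which Python exposes only on platforms that define them. The functions `scan_seek` and `scan_default` are hypothetical; `fallback` stands for any caller-supplied byte-by-byte scanner that takes the file descriptor and does not rely on the current file offset.

```python
import errno
import os
from typing import Callable

Segments = list[tuple[int, int]]


def scan_seek(fd: int) -> Segments:
    """Return (offset, length) data segments using SEEK_DATA/SEEK_HOLE.

    The file system reports where data starts and where the next hole
    begins, so no file contents are read.
    """
    size = os.fstat(fd).st_size
    segments: Segments = []
    pos = 0
    while pos < size:
        try:
            data = os.lseek(fd, pos, os.SEEK_DATA)  # next data at or after pos
        except OSError as e:
            if e.errno == errno.ENXIO:  # only a trailing hole remains
                break
            raise
        hole = os.lseek(fd, data, os.SEEK_HOLE)  # end of this data run
        segments.append((data, hole - data))
        pos = hole
    return segments


def scan_default(fd: int, fallback: Callable[[int], Segments]) -> Segments:
    """Prefer seek-based detection; use the fallback if it is unavailable."""
    if hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE"):
        try:
            return scan_seek(fd)
        except OSError:
            pass  # e.g. a file system that rejects SEEK_DATA/SEEK_HOLE
    return fallback(fd)
```

On a file system without native support, Linux may report the whole file as a single data region instead of failing, so the result is still correct, just without any holes found.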

Using `--hole-detection' option implies `--sparse'.


This document was generated on February 23, 2019 using texi2html 1.76.