Recursive Accept/Reject Options (GNU Wget 1.21.1-dirty Manual)
Next: Exit Status, Previous: Recursive Retrieval Options, Up: Invoking [Contents][Index]
2.12 Recursive Accept/Reject Options
- ‘-A acclist’
  ‘--accept acclist’
  ‘-R rejlist’
  ‘--reject rejlist’
  Specify comma-separated lists of file name suffixes or patterns to accept or reject (see Types of Files). Note that if any of the wildcard characters ‘*’, ‘?’, ‘[’ or ‘]’ appear in an element of acclist or rejlist, it will be treated as a pattern rather than a suffix. In this case, you have to enclose the pattern in quotes to prevent your shell from expanding it, as in ‘-A "*.mp3"’ or ‘-A '*.mp3'’.
- ‘--accept-regex urlregex’
  ‘--reject-regex urlregex’
  Specify a regular expression to accept or reject the complete URL.
- ‘--regex-type regextype’
  Specify the regular expression type. Possible types are ‘posix’ or ‘pcre’. Note that to be able to use the ‘pcre’ type, Wget has to be compiled with libpcre support.
- ‘-D domain-list’
  ‘--domains=domain-list’
  Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on ‘-H’.
- ‘--exclude-domains domain-list’
  Specify the domains that are not to be followed (see Spanning Hosts).
- ‘--follow-ftp’
  Follow FTP links from HTML documents. Without this option, Wget will ignore all the FTP links.
- ‘--follow-tags=list’
  Wget has an internal table of HTML tag/attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, he or she should specify such tags in a comma-separated list with this option.
- ‘--ignore-tags=list’
  This is the opposite of the ‘--follow-tags’ option. To skip certain HTML tags when recursively looking for documents to download, specify them in a comma-separated list.
  In the past, this option was the best bet for downloading a single page and its requisites, using a command line like:
      wget --ignore-tags=a,area -H -k -K -r http://site/document
  However, the author of this option came across a page with tags like <LINK REL="home" HREF="/"> and realized that specifying tags to ignore was not enough. One can’t just tell Wget to ignore <LINK>, because then stylesheets will not be downloaded. Now the best bet for downloading a single page and its requisites is the dedicated ‘--page-requisites’ option.
- ‘--ignore-case’
  Ignore case when matching files and directories. This influences the behavior of the ‘-R’, ‘-A’, ‘-I’ and ‘-X’ options, as well as the globbing implemented when downloading from FTP sites. For example, with this option, ‘-A "*.txt"’ will match ‘file1.txt’, but also ‘file2.TXT’, ‘file3.TxT’, and so on. The quotes in the example are to prevent the shell from expanding the pattern.
- ‘-H’
  ‘--span-hosts’
  Enable spanning across hosts when doing recursive retrieval (see Spanning Hosts).
- ‘-L’
  ‘--relative’
  Follow relative links only. Useful for retrieving a specific home page without any distractions, not even those from the same hosts (see Relative Links).
- ‘-I list’
  ‘--include-directories=list’
  Specify a comma-separated list of directories you wish to follow when downloading (see Directory-Based Limits). Elements of list may contain wildcards.
- ‘-X list’
  ‘--exclude-directories=list’
  Specify a comma-separated list of directories you wish to exclude from download (see Directory-Based Limits). Elements of list may contain wildcards.
- ‘-np’
  ‘--no-parent’
  Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits for more details.
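The quoting advice in the ‘-A’/‘-R’ entry above can be checked without running Wget at all. This POSIX shell sketch uses a hypothetical helper, show_args, that only prints the arguments it receives, and runs it in a scratch directory holding two files that match *.mp3.

```sh
# Hypothetical helper: print each argument in <...> so word boundaries show.
show_args() { printf '<%s>' "$@"; printf '\n'; }

demo_dir=$(mktemp -d)            # scratch directory for the demonstration
cd "$demo_dir" || exit 1
touch a.mp3 b.mp3                # two local files that match *.mp3

unquoted=$(show_args -A *.mp3)   # the shell expands the pattern first
quoted=$(show_args -A "*.mp3")   # the pattern is passed through intact

echo "$unquoted"                 # <-A><a.mp3><b.mp3>
echo "$quoted"                   # <-A><*.mp3>

cd - >/dev/null && rm -r "$demo_dir"
```

Unquoted, the command would receive ‘a.mp3’ as the accept list and ‘b.mp3’ as a separate argument, where Wget expects a URL; quoted, it receives the pattern itself.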
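The suffix-versus-pattern rule in the ‘-A’/‘-R’ entry above can be sketched as a small shell function. list_matches is hypothetical and written only for illustration: it is not Wget's code, and it uses the shell's case globbing as a stand-in for the matching Wget performs. An element containing ‘*’, ‘?’, ‘[’ or ‘]’ has to match the whole file name, while any other element only has to match its end.

```sh
# Hypothetical helper (not part of Wget): print "yes" if FILE matches any
# element of the comma-separated LIST, "no" otherwise.
# The body runs in a subshell, so the IFS and set -f changes stay local.
list_matches() (
    file=$1
    IFS=,                        # split the list on commas...
    set -f                       # ...without expanding patterns against local files
    for elem in $2; do
        case $elem in
            *\** | *\?* | *\[* | *\]*)
                pattern=$elem ;;     # wildcard present: match the whole name
            *)
                pattern="*$elem" ;;  # no wildcard: match as a suffix
        esac
        case $file in
            $pattern) echo yes; exit 0 ;;
        esac
    done
    echo no
)

list_matches song.mp3 '*.mp3'           # yes
list_matches song.mp3.bak '*.mp3'       # no
list_matches image.gif 'mp3,gif'        # yes
list_matches file7.txt 'file[0-9].txt'  # yes
```

The pattern ‘*.mp3’ does not match song.mp3.bak because a pattern must cover the whole name, whereas a plain element such as ‘mp3’ matches any name that merely ends in those characters.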
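The ‘--ignore-case’ example above can be mimicked in the same way. The hypothetical function matches_nocase lowercases both the name and the pattern with tr before matching; this is just one way to get case-insensitive globbing in a shell and is not meant as a description of how Wget implements the option.

```sh
# Hypothetical helper (not part of Wget): case-insensitive glob match,
# done by lowercasing both the name and the pattern before comparing.
matches_nocase() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    pat=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case $name in
        $pat) echo yes ;;
        *) echo no ;;
    esac
}

for f in file1.txt file2.TXT file3.TxT notes.md; do
    printf '%s: %s\n' "$f" "$(matches_nocase "$f" '*.txt')"
done
# file1.txt: yes
# file2.TXT: yes
# file3.TxT: yes
# notes.md: no
```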
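Because ‘--accept-regex’ and ‘--reject-regex’ look at the complete URL rather than only the file name, they can tell apart files that ‘-A’ and ‘-R’ cannot. The sketch below filters a hypothetical list of example.com URLs with grep -E; the helpers accept_regex and reject_regex are illustrative only, and treating POSIX extended regular expressions as equivalent to Wget's ‘posix’ regex type is an assumption.

```sh
# Hypothetical URLs that a recursive crawl might encounter.
urls='https://example.com/docs/manual.pdf
https://example.com/blog/manual.pdf
https://example.com/docs/index.html'

# Keep the URLs whose full text matches the regex (cf. --accept-regex).
accept_regex() { printf '%s\n' "$urls" | grep -E -- "$1"; }
# Drop the URLs whose full text matches the regex (cf. --reject-regex).
reject_regex() { printf '%s\n' "$urls" | grep -E -v -- "$1"; }

accept_regex '/docs/.*\.pdf$'
# https://example.com/docs/manual.pdf
reject_regex '/blog/'
# https://example.com/docs/manual.pdf
# https://example.com/docs/index.html
```

Both PDFs are named manual.pdf, so a file name rule such as ‘-A pdf’ would accept either of them; only an expression that sees the directory part of the URL can select one.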