Next: Exit Status, Previous: Recursive Retrieval Options, Up: Invoking [Contents][Index]
‘-A acclist --accept acclist’
‘-R rejlist --reject rejlist’
Specify comma-separated lists of file name suffixes or patterns to
accept or reject (see Types of Files). Note that if any of the
wildcard characters ‘*’, ‘?’, ‘[’ or ‘]’ appears in an element of
acclist or rejlist, that element is treated as a pattern rather than
a suffix. In this case, you have to enclose the pattern in quotes to
prevent your shell from expanding it, as in ‘-A "*.mp3"’ or
‘-A '*.mp3'’.
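The effect of quoting can be checked without running Wget at all. The following is a minimal sketch; the temporary directory and the file names are made up for illustration, and the positional parameters merely stand in for the argument list a program would receive.

```shell
# Work in a scratch directory that contains two matching files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.mp3 b.mp3

# Unquoted: the shell expands the glob before any program sees it,
# so the argument list becomes: -A a.mp3 b.mp3
set -- -A *.mp3
unquoted_count=$#

# Quoted: the pattern is passed through as one literal argument.
set -- -A "*.mp3"
quoted_count=$#
quoted_arg=$2

echo "unquoted: $unquoted_count args, quoted: $quoted_count args ($quoted_arg)"
```

With the unquoted form, the option would receive only the first expanded file name and the rest would be read as separate arguments, which is rarely what is intended.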
‘--accept-regex urlregex’
‘--reject-regex urlregex’
Specify a regular expression to accept or reject the complete URL.
‘--regex-type regextype’
Specify the regular expression type. Possible types are ‘posix’ or
‘pcre’. Note that to be able to use the ‘pcre’ type, wget has to be
compiled with libpcre support.
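Because the expression is applied to the complete URL rather than to the file name alone, it can constrain the directory part as well as the extension. The sketch below only illustrates that idea: the URLs are hypothetical, and ‘grep -E’ stands in for a POSIX extended regular expression engine, which is an assumption about how closely it mirrors Wget's ‘posix’ type.

```shell
# Hypothetical candidate URLs, one per line.
urls='http://example.com/docs/index.html
http://example.com/docs/img/logo.png
http://example.com/blog/post.html'

# Keep only HTML pages under /docs/. The pattern is tested against
# the whole URL, not just the trailing file name.
accepted=$(printf '%s\n' "$urls" | grep -E '/docs/.*\.html$')
echo "$accepted"
```

A suffix list such as ‘-A html’ could not express the ‘/docs/’ restriction, since it looks only at the file name.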
‘-D domain-list’
‘--domains=domain-list’
Set domains to be followed. domain-list is a comma-separated list of
domains. Note that it does not turn on ‘-H’.
‘--exclude-domains domain-list’
Specify the domains that are not to be followed (see Spanning Hosts).
‘--follow-ftp’
Follow FTP links from HTML documents. Without this option, Wget will
ignore all the FTP links.
‘--follow-tags=list’
Wget has an internal table of HTML tag / attribute pairs that it
considers when looking for linked documents during a recursive
retrieval. If a user wants only a subset of those tags to be
considered, however, he or she should specify such tags in a
comma-separated list with this option.
‘--ignore-tags=list’
This is the opposite of the ‘--follow-tags’ option. To skip certain
HTML tags when recursively looking for documents to download, specify
them in a comma-separated list.
In the past, this option was the best bet for downloading a single page and its requisites, using a command-line like:
wget --ignore-tags=a,area -H -k -K -r http://site/document
However, the author of this option came across a page with tags like
<LINK REL="home" HREF="/"> and came to the realization that
specifying tags to ignore was not enough. One can’t just tell Wget to
ignore <LINK>, because then stylesheets will not be downloaded.
Now the best bet for downloading a single page and its requisites is
the dedicated ‘--page-requisites’ option.
‘--ignore-case’
Ignore case when matching files and directories. This influences the
behavior of the ‘-R’, ‘-A’, ‘-I’, and ‘-X’ options, as well as
globbing implemented when downloading from FTP sites. For example,
with this option, ‘-A "*.txt"’ will match ‘file1.txt’, but also
‘file2.TXT’, ‘file3.TxT’, and so on.
The quotes in the example are to prevent the shell from expanding the
pattern.
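The matching behavior described for ‘--ignore-case’ can be imitated in plain shell. This is only a sketch of the idea, not Wget's implementation: the helper name matches_ignoring_case is hypothetical, and it assumes the pattern itself is written in lowercase.

```shell
matches_ignoring_case() {
  # $1 = file name, $2 = glob pattern written in lowercase.
  # Lowercase the name, then apply ordinary shell pattern matching.
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lowered" in
    $2) return 0 ;;
    *) return 1 ;;
  esac
}

matched=""
for name in file1.txt file2.TXT file3.TxT notes.md; do
  if matches_ignoring_case "$name" "*.txt"; then
    matched="${matched:+$matched }$name"
  fi
done
echo "$matched"
```

Without the lowercasing step, only ‘file1.txt’ would match the pattern.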
‘-H’
‘--span-hosts’
Enable spanning across hosts when doing recursive retrieving (see
Spanning Hosts).
‘-L’
‘--relative’
Follow relative links only. Useful for retrieving a specific home
page without any distractions, not even those from the same hosts
(see Relative Links).
‘-I list’
‘--include-directories=list’
Specify a comma-separated list of directories you wish to follow when
downloading (see Directory-Based Limits). Elements of list may
contain wildcards.
‘-X list’
‘--exclude-directories=list’
Specify a comma-separated list of directories you wish to exclude
from download (see Directory-Based Limits). Elements of list may
contain wildcards.
‘-np’
‘--no-parent’
Do not ever ascend to the parent directory when retrieving
recursively. This is a useful option, since it guarantees that only
the files below a certain hierarchy will be downloaded. See
Directory-Based Limits, for more details.
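The notion of "below a certain hierarchy" can be pictured as a prefix test on the path. The sketch below is an illustration only, not how Wget decides: the starting directory, the candidate paths, and the helper name is_below_start are all hypothetical, and the paths are assumed to be already normalized (no ‘..’ components).

```shell
# Hypothetical directory where the recursive retrieval starts.
start_dir=/docs/manual/

is_below_start() {
  # Succeeds only when the path begins with the starting directory.
  case "$1" in
    "$start_dir"*) return 0 ;;
    *) return 1 ;;
  esac
}

kept=""
for path in /docs/manual/ch1.html /docs/manual/img/fig.png /docs/index.html /index.html; do
  if is_below_start "$path"; then
    kept="${kept:+$kept }$path"
  fi
done
echo "$kept"
```

The two paths outside ‘/docs/manual/’ are dropped, which is the guarantee the option description refers to.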