Recursive Accept/Reject Options (GNU Wget 1.21.1-dirty Manual)


2.12 Recursive Accept/Reject Options

-A acclist --accept acclist
-R rejlist --reject rejlist

Specify comma-separated lists of file name suffixes or patterns to accept or reject (see Types of Files). Note that if any of the wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern in quotes to prevent your shell from expanding it, as in ‘-A "*.mp3"’ or ‘-A '*.mp3'’.
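As a rough illustration of the quoting rule, the sketch below wraps a Wget call in a helper and then shows locally how the shell treats an unquoted and a quoted pattern. The function name fetch_matching, the depth limit of 2, and the temporary demo directory are assumptions made for illustration; they do not come from the manual.

```shell
# Hypothetical helper: recursively fetch only files matching a pattern.
# The depth limit (-l 2) is an arbitrary choice for this sketch.
fetch_matching() {
    pattern=$1
    url=$2
    # "$pattern" is quoted so the shell passes it to wget unexpanded.
    wget -r -l 2 -A "$pattern" "$url"
}

# Local demonstration of why the quotes matter: in a directory that
# contains a.mp3 and b.mp3, the shell expands the unquoted pattern
# before any command sees it.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.mp3" "$demo_dir/b.mp3"
unquoted=$(cd "$demo_dir" && echo *.mp3)
quoted=$(cd "$demo_dir" && echo '*.mp3')
echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -r "$demo_dir"
```

A call might look like fetch_matching '*.mp3' http://example.com/music/, where the URL is a placeholder.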

--accept-regex urlregex
--reject-regex urlregex

Specify a regular expression to accept or reject the complete URL.

--regex-type regextype

Specify the regular expression type. Possible types are ‘posix’ or ‘pcre’. Note that to be able to use ‘pcre’ type, wget has to be compiled with libpcre support.
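The regex options above could be combined along the following lines. This is only a sketch: the pattern, the function name fetch_documents, and the example.com URLs are assumptions, and grep -E is used merely as a local stand-in for a POSIX extended regular expression matcher, so its results may not agree with Wget in every case.

```shell
# Assumed example pattern: select URLs ending in .pdf or .ps.
url_regex='\.(pdf|ps)$'

# Hypothetical wrapper; the start URL is supplied by the caller.
fetch_documents() {
    wget -r --regex-type=posix --accept-regex "$url_regex" "$1"
}

# Rough local preview of which complete URLs the pattern selects.
preview=$(printf '%s\n' \
    'http://example.com/papers/a.pdf' \
    'http://example.com/papers/index.html' \
    'http://example.com/papers/b.ps' | grep -E "$url_regex")
echo "$preview"
```

Because the expression is applied to the complete URL rather than to the file name alone, previewing it against a few sample URLs before starting a long retrieval can catch an overly strict pattern early.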

-D domain-list

Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on ‘-H’.

--exclude-domains domain-list

Specify the domains that are not to be followed (see Spanning Hosts).
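Since ‘-D’ does not turn on ‘-H’ by itself, a command that restricts a cross-host retrieval to certain domains needs both. The following is a minimal sketch; the function name crawl_related_domains and all domain names are placeholders, not values from the manual.

```shell
# Hypothetical wrapper; every domain name here is a placeholder.
crawl_related_domains() {
    # -H enables host spanning; -D only limits which domains are followed.
    wget -r -H \
        -D example.com,example.org \
        --exclude-domains ads.example.com \
        "$1"
}
```

It could be invoked as crawl_related_domains http://www.example.com/.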


--follow-ftp

Follow FTP links from HTML documents. Without this option, Wget will ignore all the FTP links.


--follow-tags=list

Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, he or she should specify such tags in a comma-separated list with this option.


--ignore-tags=list

This is the opposite of the ‘--follow-tags’ option. To skip certain HTML tags when recursively looking for documents to download, specify them in a comma-separated list.

In the past, this option was the best bet for downloading a single page and its requisites, using a command-line like:

wget --ignore-tags=a,area -H -k -K -r http://site/document

However, the author of this option came across a page with tags like <LINK REL="home" HREF="/"> and came to the realization that specifying tags to ignore was not enough. One can’t just tell Wget to ignore <LINK>, because then stylesheets will not be downloaded. Now the best bet for downloading a single page and its requisites is the dedicated ‘--page-requisites’ option.
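For comparison with the older command line, a single-page download based on ‘--page-requisites’ (‘-p’) might be sketched as follows. The function name save_page_with_requisites is hypothetical, and the particular combination of flags is one reasonable choice rather than the only one.

```shell
# Hypothetical wrapper around a single-page download.
save_page_with_requisites() {
    # -p  fetch the images, stylesheets, etc. the page needs
    # -H  allow requisites hosted on other hosts
    # -k  convert links for local viewing
    # -K  keep a backup of each file before conversion
    # -E  adjust file extensions (e.g. add .html)
    wget -E -H -k -K -p "$1"
}
```

No ‘-r’ is given here, so only the named document and its requisites are requested.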


--ignore-case

Ignore case when matching files and directories. This influences the behavior of -R, -A, -I, and -X options, as well as globbing implemented when downloading from FTP sites. For example, with this option, ‘-A "*.txt"’ will match ‘file1.txt’, but also ‘file2.TXT’, ‘file3.TxT’, and so on. The quotes in the example are to prevent the shell from expanding the pattern.
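A case-insensitive accept list could be used roughly like this. The function name fetch_text_files is an assumption for the sketch, and the start URL is left to the caller.

```shell
# Hypothetical wrapper: accept *.txt regardless of letter case.
fetch_text_files() {
    # The pattern is single-quoted so the shell does not expand it.
    wget -r --ignore-case -A '*.txt' "$1"
}
```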


-H --span-hosts

Enable spanning across hosts when doing recursive retrieving (see Spanning Hosts).


-L --relative

Follow relative links only. Useful for retrieving a specific home page without any distractions, not even those from the same hosts (see Relative Links).

-I list

Specify a comma-separated list of directories you wish to follow when downloading (see Directory-Based Limits). Elements of list may contain wildcards.

-X list

Specify a comma-separated list of directories you wish to exclude from download (see Directory-Based Limits). Elements of list may contain wildcards.


-np --no-parent

Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.
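The directory-based options might be combined as in the sketch below. Both function names and all directory names are placeholders chosen for illustration.

```shell
# Hypothetical wrapper: stay below the starting directory and skip
# some subdirectories. The list is quoted because it contains a wildcard.
mirror_subtree() {
    wget -r -np -X '/docs/old,/docs/*/drafts' "$1"
}

# Hypothetical wrapper: follow only the listed directories.
fetch_selected_dirs() {
    wget -r -I /people,/cgi-bin "$1"
}
```

For example, mirror_subtree http://www.example.com/docs/ would request a recursive retrieval that does not ascend above /docs/ and leaves out the excluded directories.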