Wget’s recursive retrieval normally refuses to visit hosts other than the one you specified on the command line. This is a reasonable default; without it, every retrieval would have the potential to turn your Wget into a small version of Google.
However, visiting different hosts, or host spanning, is sometimes a useful option. Maybe the images are served from a different server. Maybe you’re mirroring a site that consists of pages interlinked between three servers. Maybe the server has two equivalent names, and the HTML pages refer to both interchangeably.
The ‘-H’ option turns on host spanning, thus allowing Wget’s
recursive run to visit any host referenced by a link. Unless sufficient
recursion-limiting criteria (such as a maximum depth) are applied, these
foreign hosts will typically link to yet more hosts, and so on until Wget
ends up downloading far more data than you intended.
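One way to keep a spanning run bounded is to combine ‘-H’ with a depth
limit. The following is a minimal sketch rather than a recommended
setting: ‘-l’ sets the maximum recursion depth, and the depth of 2 and
the start URL are assumptions chosen only for illustration.

```shell
# Assumed values for illustration: a depth limit of 2 and an example.com start URL.
depth=2
url="http://www.example.com/"

# Print the command first so it can be reviewed before any download starts.
echo "wget -r -H -l $depth $url"

# To run it for real, uncomment the next line:
# wget -r -H -l "$depth" "$url"
```

Even with a depth limit, a run that spans hosts can still reach many
servers, so it is worth starting with a small depth.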
The ‘-D’ option allows you to specify the domains that will be
followed, thus limiting the recursion only to the hosts that belong to
these domains. Obviously, this makes sense only in conjunction with
‘-H’. A typical example would be downloading the contents of
‘www.example.com’, but allowing downloads from other hosts in the
‘example.com’ domain:
wget -rH -Dexample.com http://www.example.com/
You can specify more than one domain by separating them with commas.
If there are domains you want to exclude specifically, you can do it
with ‘--exclude-domains’, which accepts the same type of arguments
as ‘-D’, but will exclude all the listed domains. For
example, if you want to download all the hosts from the ‘foo.edu’
domain, with the exception of ‘sunsite.foo.edu’, you can do it like
this:
wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
    http://www.foo.edu/
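To make the interaction between the accept list and the exclude list
concrete, the decision can be sketched as a small shell function. This is
an illustration of the idea only, not Wget’s actual implementation: the
helper names host_in_domain and should_follow are hypothetical, and the
sketch assumes that a host belongs to a domain when it equals the domain
or ends in a dot followed by the domain, that exclusions take precedence,
and that an empty accept list places no restriction. Wget’s real matching
rules may differ in detail.

```shell
# Hypothetical helper: succeeds if host ($1) equals domain ($2)
# or ends with ".domain" (an assumed dot-boundary suffix match).
host_in_domain() {
    case "$1" in
        "$2" | *".$2") return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeeds if the host would be followed.
# Arguments: host, comma-separated accept list, comma-separated exclude list.
should_follow() {
    host=$1
    accept=$2
    exclude=$3
    old_ifs=$IFS
    IFS=,   # split the lists on commas

    # Excluded domains are checked first (assumed precedence).
    for d in $exclude; do
        if host_in_domain "$host" "$d"; then
            IFS=$old_ifs
            return 1
        fi
    done

    # An empty accept list is treated as "no restriction" in this sketch.
    if [ -z "$accept" ]; then
        IFS=$old_ifs
        return 0
    fi

    for d in $accept; do
        if host_in_domain "$host" "$d"; then
            IFS=$old_ifs
            return 0
        fi
    done

    IFS=$old_ifs
    return 1
}

# Usage, mirroring the foo.edu example:
should_follow www.foo.edu foo.edu sunsite.foo.edu && echo "follow www.foo.edu"
should_follow sunsite.foo.edu foo.edu sunsite.foo.edu || echo "skip sunsite.foo.edu"
```

With the foo.edu command, a link to ‘www.foo.edu’ passes both checks,
while a link to ‘sunsite.foo.edu’ is rejected by the exclude list before
the accept list is consulted.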