Advanced Usage (GNU Wget 1.21.1-dirty Manual)
7.2 Advanced Usage
- You have a file that contains the URLs you want to download? Use the ‘-i’ switch:
wget -i file
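The input file is just a plain-text list, one URL per line. A minimal sketch (the file name and URLs below are illustrative, not taken from the manual):

```shell
# Create a plain-text URL list, one URL per line.
# The file name and URLs are hypothetical examples.
cat > urls.txt <<'EOF'
https://www.gnu.org/software/wget/
https://www.gnu.org/licenses/gpl-3.0.txt
EOF

# Hand the whole list to Wget (shown but not run here, to avoid
# network access):
# wget -i urls.txt

# Show how many URLs the list contains:
wc -l < urls.txt
```

If the file contains HTML rather than a bare list of URLs, add ‘--force-html’ so Wget extracts the links from the markup.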
- Create a five levels deep mirror image of the GNU web site, with the same directory structure the original has, with only one try per document, saving the log of the activities to ‘gnulog’:
wget -r https://www.gnu.org/ -o gnulog
- The same as the above, but convert the links in the downloaded files to point to local files, so you can view the documents off-line:
wget --convert-links -r https://www.gnu.org/ -o gnulog
- Retrieve only one HTML page, but make sure that all the elements needed for the page to be displayed, such as inline images and external style sheets, are also downloaded. Also make sure the downloaded page references the downloaded links.
wget -p --convert-links http://www.example.com/dir/page.html
- The same as the above, but without the ‘www.example.com/’ directory. In fact, I don’t want to have all those random server directories anyway—just save all those files under a ‘download/’ subdirectory of the current directory.
wget -p --convert-links -nH -nd -Pdownload \
     http://www.example.com/dir/page.html
- Retrieve the index.html of ‘
www.lycos.com’, showing the original server headers:
wget -S http://www.lycos.com/
- Save the server headers with the file, perhaps for post-processing.
wget --save-headers http://www.lycos.com/
more index.html
- Retrieve the first two levels of ‘wuarchive.wustl.edu’, saving them to /tmp:
wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
- You want to download all the GIFs from a directory on an HTTP server. You tried ‘wget http://www.example.com/dir/*.gif’, but that didn’t work because HTTP retrieval does not support globbing. In that case, use:
wget -r -l1 --no-parent -A.gif http://www.example.com/dir/
- Suppose you were in the middle of downloading, when Wget was interrupted. Now you do not want to clobber the files already present. It would be:
wget -nc -r https://www.gnu.org/
- If you want to encode your own username and password to HTTP or FTP, use the appropriate URL syntax (see URL Format).
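As a sketch of that URL syntax (the user name, password, and host below are hypothetical placeholders, not real credentials):

```shell
# Credentials are embedded directly in the URL:
#   scheme://user:password@host/path
# user, password, and host here are placeholders.
user='myname'
password='mypassword'
url="ftp://${user}:${password}@ftp.example.com/.emacs"

# The download itself would be:
#   wget "$url"
# (not run here; the host is a placeholder.)
echo "$url"
```

Keep in mind that a password passed on the command line this way is visible to other users of the system, for example in the output of ‘ps’.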
- You would like the output documents to go to standard output instead of to files?
wget -O - http://jagor.srce.hr/ http://www.srce.hr/
You can also combine the two options and make pipelines to retrieve the documents from remote hotlists:
wget -O - http://cool.list.com/ | wget --force-html -i -