Wget/Very-Advanced-Usage
7.3 Very Advanced Usage
- If you wish Wget to keep a mirror of a page (or FTP subdirectories), use ‘--mirror’ (‘-m’), which is the shorthand for ‘-r -l inf -N’. You can put Wget in the crontab file asking it to recheck a site each Sunday:
crontab
0 0 * * 0 wget --mirror https://www.gnu.org/ -o /home/me/weeklog
- In addition to the above, you want the links to be converted for local
viewing. But, after having read this manual, you know that link
conversion doesn’t play well with timestamping, so you also want Wget to
back up the original HTML files before the conversion. Wget invocation
would look like this:
wget --mirror --convert-links --backup-converted \
     https://www.gnu.org/ -o /home/me/weeklog
But you’ve also noticed that local viewing doesn’t work all that well when HTML files are saved under extensions other than ‘.html’, perhaps because they were served as index.cgi. So you’d like Wget to rename all the files served with content-type ‘text/html’ or ‘application/xhtml+xml’ to name.html.
wget --mirror --convert-links --backup-converted \
     --html-extension -o /home/me/weeklog \
     https://www.gnu.org/
Or, with less typing:
wget -m -k -K -E https://www.gnu.org/ -o /home/me/weeklog
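The weekly crontab entry and the longer invocation above can be combined by moving the options into a small wrapper script, so the crontab line stays short. The following is only a sketch under stated assumptions: the function name weekly_mirror, the WGET override variable, and the script name are hypothetical and not part of Wget. It uses ‘--html-extension’ as in the text; recent Wget releases document the same ‘-E’ option as ‘--adjust-extension’.

```shell
#!/bin/sh
# Hypothetical cron wrapper (the name weekly-mirror.sh is an assumption).
# Set WGET=echo to print the arguments instead of downloading anything.

weekly_mirror() {
    url=$1
    log=${2:-/home/me/weeklog}   # default log path taken from the examples
    # Long forms of -m -k -K -E.
    "${WGET:-wget}" --mirror --convert-links --backup-converted \
        --html-extension -o "$log" "$url"
}

# Mirror only when a URL is passed on the command line.
if [ "$#" -gt 0 ]; then
    weekly_mirror "$@"
fi
```

Assuming the script is saved as /home/me/bin/weekly-mirror.sh and made executable, the crontab entry would become ‘0 0 * * 0 /home/me/bin/weekly-mirror.sh https://www.gnu.org/’. Running it once by hand with WGET=echo is a cheap way to check the argument list before cron runs it.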