Wget/Very-Advanced-Usage

7.3 Very Advanced Usage

  • If you wish Wget to keep a mirror of a page (or FTP subdirectories), use ‘--mirror’ (‘-m’), which is shorthand for ‘-r -N -l inf --no-remove-listing’. You can put Wget in the crontab file, asking it to recheck a site each Sunday:
    # crontab entry: runs at 00:00 every Sunday
    0 0 * * 0 wget --mirror https://www.gnu.org/ -o /home/me/weeklog
  • In addition to the above, suppose you want the links to be converted for local viewing. Having read this manual, you know that link conversion doesn’t play well with timestamping, so you also want Wget to back up the original HTML files before the conversion. The Wget invocation would look like this:
    wget --mirror --convert-links --backup-converted  \
         https://www.gnu.org/ -o /home/me/weeklog
  • But you’ve also noticed that local viewing doesn’t work all that well when HTML files are saved under extensions other than ‘.html’, perhaps because they were served as index.cgi. So you’d like Wget to rename all the files served with content-type ‘text/html’ or ‘application/xhtml+xml’ to name.html.

    wget --mirror --convert-links --backup-converted \
         --adjust-extension -o /home/me/weeklog      \
         https://www.gnu.org/

    Or, with less typing:

    wget -m -k -K -E https://www.gnu.org/ -o /home/me/weeklog
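
The short options in the last command correspond one-to-one to the long options used in the earlier invocations. As a rough sketch of that mapping, the helper below prints the long form of each short option. The name `expand_wget_opt` is hypothetical and not part of Wget. Note that ‘-E’ has been spelled ‘--adjust-extension’ since Wget 1.12; ‘--html-extension’ is the older name for the same option.

```shell
#!/bin/sh
# expand_wget_opt is a hypothetical helper for illustration; it is not part of Wget.
# It prints the long option corresponding to one of the short options used above.
expand_wget_opt() {
    case "$1" in
        -m) echo "--mirror" ;;            # shorthand for -r -N -l inf --no-remove-listing
        -k) echo "--convert-links" ;;
        -K) echo "--backup-converted" ;;  # keeps each original file as FILE.orig
        -E) echo "--adjust-extension" ;;  # older name: --html-extension
        *)  echo "unknown option: $1" >&2; return 1 ;;
    esac
}

# Print the long form of every short option in "wget -m -k -K -E".
for opt in -m -k -K -E; do
    expand_wget_opt "$opt"
done
```

Spelling the options out in a crontab entry or script tends to make the command easier to read later, while the short forms are convenient for interactive use.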