[Tfug] mirroring software?

Choprboy tfug@tfug.org
Sun Jan 5 03:41:01 2003


On Sunday 05 January 2003 02:56 am, Gordon C. Zaft wrote:

> I'm setting up an internal webserver and I'd like to grab copies of some
> of my favorite websites so that I'll always have 'em around in case they go
> under.  Also I'd like to mirror some that I have externally hosted.  Can
> anyone recommend software to do this?  The server is running Apache 1.3.27
> on FreeBSD 4.7.
>

Well, neither will pull CGI code, etc. (the raw server-side source, that is);
you only get the output the server sends. But I regularly use both "wget" and
"pavuk" to crawl websites and download entire layouts: say, a section of news
articles, or all the text and pictures of a big description/instruction piece
on some obscure hardware, for later offline reading. Both can also rewrite the
URLs in the downloaded pages so the local copy works just like the remote.
Wget is very robust and mostly command-line driven. Pavuk has a nice interface
for setting lots of match/reject parameters for crawling, though it is hard to
script and cumbersome when you have lots of individual starting URLs to enter.
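
For the wget route, something along these lines might be a reasonable
starting point. It is only a sketch: the URL and the destination directory
are made-up placeholders, and the small Python wrapper (with its function
name) is just an illustrative way to keep the option list readable and
scriptable. The wget options themselves (--mirror, --convert-links,
--page-requisites, --no-parent, --wait, --directory-prefix) are standard,
but check them against the man page for your installed version.

```python
import shlex


def build_wget_mirror_command(url, dest_dir):
    """Return the argument list for a wget run that mirrors `url` into `dest_dir`.

    Hypothetical helper for illustration; it only builds the command,
    it does not run wget.
    """
    return [
        "wget",
        "--mirror",             # recursive download with timestamping
        "--convert-links",      # rewrite URLs so the local copy browses offline
        "--page-requisites",    # also fetch images, stylesheets, etc.
        "--no-parent",          # stay below the starting directory
        "--wait=1",             # pause one second between requests
        "--directory-prefix=" + dest_dir,
        url,
    ]


# Placeholder URL and directory (assumptions, not from the original post).
cmd = build_wget_mirror_command("http://www.example.org/docs/",
                                "/usr/local/www/mirrors")

# Print a copy-and-paste shell command line.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed command can be pasted into a shell or dropped into a cron job;
with several sites to mirror, calling the helper once per starting URL is
probably easier than entering them one at a time in a GUI.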

Adrian