[Tfug] finding linked pages
Glen Pfeiffer
glen at thepfeiffers.net
Thu May 1 09:14:15 MST 2008
On 04/30/2008 11:08 PM, christopher wrote:
> Hi everyone. Is there a way to search for linked pages
> on a website?
>
A web crawler will do it. On Debian, you can search the package
descriptions for "crawl" and narrow the results to web tools:
# aptitude search '~dcrawl' | grep web
p harvestman - a very flexible web crawler application
p webcheck - website link and structure checker
# aptitude show harvestman
...
Description: a very flexible web crawler application
HarvestMan can be used to download files from websites,
according to a number of user-specified rules. The latest
version of HarvestMan supports as much as 60 plus customization
options. HarvestMan is a console (command-line) application.
Homepage: http://harvestman.freezope.org/
# aptitude show webcheck
...
Description: website link and structure checker
webcheck is a website checking tool for webmasters. It crawls a
given website and generates a number of reports in the form of
html pages. It is easy to use and generates simple, clear and
readable reports.
Features of webcheck include:
* support for http, https, ftp and file schemes
* view the structure of a site
* track down broken links
* find potentially outdated and new pages
* list links pointing to external sites
* can run without user intervention
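
If you only need the links from a single page rather than a full crawl,
a few lines of Python can do it. This is a rough sketch using only the
standard library, not how harvestman or webcheck work internally. The
names LinkCollector and find_links and the example.com URLs are made up
for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Keep first-seen order, skip duplicates.
                if absolute not in self.links:
                    self.links.append(absolute)


def find_links(html, base_url):
    """Return the de-duplicated absolute URLs linked from one HTML page."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical sample page; real input would be the fetched HTML.
    sample = '<a href="about.html">About</a> <a href="/contact">Contact</a>'
    for link in find_links(sample, "http://example.com/dir/index.html"):
        print(link)
```

To run it against a live site you would fetch the HTML first (for
example with urllib.request.urlopen) and pass the page's own URL as
base_url. Following the returned links recursively is what turns this
into a crawler, at which point one of the packages above is probably
less work.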
--
Glen