httrack allows you to download a World Wide Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server to your computer. HTTrack preserves the original site's relative link structure: simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site and resume interrupted downloads.
httrack www.someweb.com/bob/bobby.html --spider -P proxy:port
runs the spider on www.someweb.com/bob/bobby.html using a proxy
httrack --update
updates a mirror in the current folder
httrack
will bring you to the interactive mode
httrack --continue
continues a mirror in the current folder
OPTIONS
General options:
-O
path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)
-%O
chroot to this path, must be run as root (-%O root_path) (--chroot <param>)
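The -O syntax packs two directories into a single argument. The following dry-run sketch shows one way to keep the browsable mirror apart from the cache and log files; the host comes from the examples above, while both directory paths are hypothetical choices. It only prints the argument vector httrack would receive.

```shell
#!/bin/sh
# Sketch: keep the browsable mirror and the cache/log files in separate places.
# Both directories are hypothetical; choose your own.
mirror_dir="$HOME/websites/bob"
cache_dir="$HOME/.cache/httrack-bob"

# -O takes "path_mirror[,path_cache_and_logfiles]" as ONE argument,
# so the two paths are joined with a comma inside a single quoted word.
set -- "www.someweb.com/bob/" -O "$mirror_dir,$cache_dir"

# Dry run: print the argument vector; run httrack "$@" to mirror for real.
printf '%s\n' httrack "$@"
```

Quoting the combined argument keeps directory names containing spaces intact.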
Action options:
-w
*mirror web sites (--mirror)
-W
mirror web sites, semi-automatic (asks questions) (--mirror-wizard)
-g
just get files (saved in the current directory) (--get-files)
-i
continue an interrupted mirror using the cache (--continue)
-Y
mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)
Proxy options:
-P
proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
-%f
*use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])
-%b
use this local hostname to make/send requests (-%b hostname) (--bind <param>)
Limits options:
-rN
set the mirror depth to N (* r9999) (--depth[=N])
-%eN
set the external links depth to N (* %e0) (--ext-depth[=N])
-mN
maximum file length for a non-html file (--max-files[=N])
-mN,N2
maximum file length for non html (N) and html (N2)
-MN
maximum overall size that can be downloaded/scanned (--max-size[=N])
-EN
maximum mirror time in seconds (60=1 minute, 3600=1 hour) (--max-time[=N])
-AN
maximum transfer rate in bytes/second (1000=1KB/s max) (--max-rate[=N])
-%cN
maximum number of connections per second (*%c10) (--connection-per-second[=N])
-GN
pause transfer if N bytes reached, and wait until lock file is deleted (--max-pause[=N])
-%mN
maximum mms stream download time in seconds (60=1 minute, 3600=1 hour) (--max-mms-time[=N])
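The units for -A and -E are given above (1000 = 1KB/s, 3600 = 1 hour). As a minimal dry-run sketch, the snippet below converts human-friendly limits into those flag values; the 50 KB/s and 2-hour limits are hypothetical, and the script only prints the resulting arguments.

```shell
#!/bin/sh
# Sketch: turn human-friendly limits into -A and -E values.
# As the option list states, -A is in bytes/second (1000 = 1KB/s)
# and -E is in seconds (3600 = 1 hour). The limits chosen are hypothetical.
rate_kb=50    # cap the transfer rate at 50 KB/s
hours=2       # stop the mirror after 2 hours

max_rate=$((rate_kb * 1000))   # 50000 bytes/second
max_time=$((hours * 3600))     # 7200 seconds

# The value is appended directly to the flag, as in -AN and -EN above.
set -- "www.someweb.com/bob/" "-A$max_rate" "-E$max_time"

# Dry run: print the argument vector; run httrack "$@" to mirror for real.
printf '%s\n' httrack "$@"
```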
Flow control:
-cN
number of multiple connections (*c8) (--sockets[=N])
-TN
timeout, number of seconds after which a non-responding link is shut down (--timeout)
-RN
number of retries, in case of timeout or non-fatal errors (*R1) (--retries[=N])
-JN
traffic jam control, minimum transfer rate (bytes/second) tolerated for a link (--min-rate[=N])
-HN
host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow (--host-control[=N])
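The flow-control flags combine naturally into a gentler crawl profile for slow or fragile servers. The dry-run sketch below uses only flags listed above; every numeric value is an illustrative assumption rather than a recommendation, and the script only prints the argument vector.

```shell
#!/bin/sh
# Sketch of a gentler crawl profile built only from the flow-control flags above.
# The values are illustrative assumptions, not recommendations.
set -- "www.someweb.com/bob/"
set -- "$@" -c2      # 2 simultaneous connections instead of the default 8
set -- "$@" -T30     # shut down a link that stays silent for 30 seconds
set -- "$@" -R3      # retry up to 3 times on timeouts or non-fatal errors
set -- "$@" -J1000   # minimum tolerated transfer rate per link: 1000 bytes/second
set -- "$@" -H3      # abandon the host on timeout or slow transfer

# Dry run: print the argument vector; run httrack "$@" to mirror for real.
printf '%s\n' httrack "$@"
```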
Links options:
-%P
*extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
-n
get non-html files near an html file (ex: an image located outside) (--near)
-t
test all URLs (even forbidden ones) (--test)
-%L
<file> add all URLs located in this text file (one URL per line) (--list <param>)
-%S
<file> add all scan rules located in this text file (one scan rule per line) (--urllist <param>)
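Both -%L and -%S read plain text files with one entry per line. As a sketch, the script below prepares such files in a temporary directory and prints the resulting command; the URLs are taken from this page's examples, the scan rules (+*.jpg, -*cgi-bin*) follow the filter style used elsewhere on this page, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: prepare the text files read by -%L (URLs) and -%S (scan rules).
# File names are hypothetical; mktemp -d gives a scratch directory.
workdir=$(mktemp -d)

# One URL per line.
printf '%s\n' \
  "www.someweb.com/bob/" \
  "www.anothertest.com/mike/" > "$workdir/urls.txt"

# One scan rule per line: accept .jpg files, skip cgi-bin links.
printf '%s\n' \
  "+*.jpg" \
  "-*cgi-bin*" > "$workdir/rules.txt"

set -- -%L "$workdir/urls.txt" -%S "$workdir/rules.txt" -O "$workdir/mirror"

# Dry run: print the argument vector; run httrack "$@" to mirror for real.
printf '%s\n' httrack "$@"
```

Keeping rules in a file avoids shell quoting problems with the * characters they usually contain.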
Build options:
-NN
structure type (0 *original structure, 1+: see below) (--structure[=N])
or user-defined structure (-N "%h%p/%n%q.%t")
-%N
delayed type check: don't make any link test, but wait for file downloads to start instead (experimental) (%N0 don't use, %N1 use for unknown extensions, * %N2 always use)
-%D
cached delayed type check: don't wait for the remote type during updates, to speed them up (%D0 wait, * %D1 don't wait) (--cached-delayed-type-check)
-%M
generate an RFC MIME-encapsulated full-archive (.mht) (--mime-html)
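To illustrate the user-defined structure pattern for -N, the sketch below approximates how "%h%p/%n.%t" could map a simple URL to a local path. It is not httrack's code: it assumes %h is the host name, %p the directory path without ending "/", %n the file name without its type, and %t the file type, and it omits %q (the query-string part) entirely. URLs without a file extension are not handled.

```shell
#!/bin/sh
# Rough approximation (NOT httrack's implementation) of how "%h%p/%n.%t"
# maps a simple URL to a local path. Assumed meanings: %h = host,
# %p = directory path without ending "/", %n = name without type,
# %t = file type. %q is omitted.
url="http://www.someweb.com/bob/bobby.html"

rest=${url#*://}      # strip the scheme: www.someweb.com/bob/bobby.html
h=${rest%%/*}         # %h -> www.someweb.com
file=${rest##*/}      # last path segment: bobby.html
dir=${rest#"$h"}      # /bob/bobby.html
p=${dir%/*}           # %p -> /bob
n=${file%.*}          # %n -> bobby
t=${file##*.}         # %t -> html

local_path="$h$p/$n.$t"
printf '%s\n' "$local_path"   # prints www.someweb.com/bob/bobby.html
```

With this pattern the local tree reproduces the site's original layout, which matches the "original structure" description of -N0.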
Shortcuts:
--get
<URLs> get the files indicated, do not seek other URLs (-qg)
--list
<text file> add all URLs located in this text file (-%L)
--mirrorlinks
<URLs> mirror all links in 1st level pages (-Y)
--testlinks
<URLs> test links in pages (-r1p0C0I0t)
--spider
<URLs> spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite
<URLs> identical to --spider
--skeleton
<URLs> make a mirror, but get only html files (-p1)
--update
update a mirror, without confirmation (-iC2)
--continue
continue a mirror, without confirmation (-iC1)
--catchurl
create a temporary proxy to capture a URL or a form post URL
--clean
erase cache & log files
--http10
force http/1.0 requests (-%h)
Details: Option %W: External callbacks prototypes
see htsdefines.h
FILES
/etc/httrack.conf
The system wide configuration file.
ENVIRONMENT
HOME
Is used if /etc/httrack.conf contains the line
path ~/websites/#
DIAGNOSTICS
Errors/Warnings are reported to hts-log.txt by default, or to stderr if the -v option was specified.
LIMITS
These are the principal limits of HTTrack at the moment. Note that we have not heard of any other utility that solves them.
- Files whose names are generated by complex scripts may not be found (ex: img.src='image'+a+Mobj.dst+'.gif')
- Some files referenced by Java classes may not be found (the class itself is included)
- Cgi-bin links may not work properly in some cases (parameters needed). To avoid them, use filters such as -*cgi-bin*
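Filters such as -*cgi-bin* contain glob characters, so they should be quoted on the command line; unquoted, the shell may expand the pattern if a matching file name exists in the current directory. The dry-run sketch below passes the filter safely; the starting URL comes from this page's examples.

```shell
#!/bin/sh
# Sketch: pass the cgi-bin exclusion filter to httrack safely.
# Single or double quotes stop the shell from treating -*cgi-bin* as a glob.
set -- "www.someweb.com/bob/" "-*cgi-bin*"

# Dry run: print the argument vector; run httrack "$@" to mirror for real.
printf '%s\n' httrack "$@"
```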
BUGS
Please report bugs to
<bugs@httrack.com>.
Include a complete, self-contained example that will allow the bug to be reproduced, and say which version of httrack you are using. Do not forget to detail the options used, the OS version, and any other information you deem necessary.
COPYRIGHT
Copyright (C) Xavier Roche and other contributors
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.