=============================================================================
gHTTrack v0.4 / v0.5 - Frontend for HTTrack v3.40 (or later)
=============================================================================
Installation
============
The configure script has been omitted from this package due to its size
and the fact that it doesn't work on all systems. Instead, you run an
autogen script which creates the configure script. If you need to pass
any arguments to the configure script, you can do so on the autogen.sh
command line. Other than that, installation is pretty much the same as
usual:

    sh autogen.sh
    make
    make install

Using Anjuta
============
This package includes an Anjuta 1.0.2 project file. Because of the missing
configure script, "Build|Build" won't work initially; you'll need to select
"Build|Auto generate..." first.
General
=======
This is a frontend for Xavier Roche's 'HTTrack Website Copier'. Being a
frontend, it won't work unless you have the httrack backend installed on your
system. If you have httrack installed, there should be some documentation
somewhere in your install path. This intro includes a few notes for the
gHTTrack frontend only. For details concerning the HTTrack engine, consult the
httrack documentation.
Although the number of options may seem a bit overwhelming at first, the only
two that you really need to specify are the URLs (Web addresses) to mirror and
a destination path (the directory where fetched files should be stored). All
others are optional and the default settings should suffice for most purposes.
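
As a rough illustration of how little is needed, the following Python sketch
assembles a minimal httrack argument list from those two settings. It is not
gHTTrack's own code: `build_command` and the `/tmp/mirror` path are made up
for this example, and only the `-O` switch and the sample URL come from the
switch list further down.

```python
def build_command(urls, dest_path):
    """Return an httrack argument list, or None if a required value is missing."""
    # Drop blank lines, as a URL textbox may contain empty rows.
    urls = [u.strip() for u in urls if u.strip()]
    if not urls or not dest_path:
        return None
    # -O sets the path for the mirror, log files and cache.
    return ["httrack", *urls, "-O", dest_path]


print(build_command(["www.someweb.com/bob/"], "/tmp/mirror"))
print(build_command([], "/tmp/mirror"))
```

The function returns None when either value is missing, since there is
nothing meaningful to run without both the URLs and a destination path.
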
The User Interface
==================
Previous versions of this frontend included a menu, toolbar, and various
dialogs. This version uses a Notebook widget with four tabbed pages
(Introduction, Options, Actions, and HTTrack), as described below.
The Options Page
----------------
This page replaces the old Options dialog. This is where you set various
options that will be used as arguments/parameters to the HTTrack main engine.
Note that all options marked with an asterisk (*) are httrack defaults and do
not need to be specified explicitly. Once you've set all the desired options,
press the [OK] button to move on to the next step.
The Actions Page
----------------
This page replaces the menu and toolbar used in previous versions of gHTTrack.
This is where you select an action to perform. Note: If the 'Project Info' and
'HTTrack Command Line' boxes are empty, it means you haven't yet pressed the
[OK] button on the Options page. Or, if only the 'HTTrack Command Line' box
is empty, it means that either there are no URLs specified or no destination
path has been set.
The HTTrack Page
----------------
This is basically the same as in previous versions of gHTTrack. This is where
you read (or re-read) the HTTrack log file. You can check the end of the log
to find out if HTTrack has finished processing the mirror.
Programming Notes
=================
- For callbacks that use FileDlg, the sequence of calls is:
  - The initial callback should call AppRunFileDialog().
  - AppRunFileDialog() displays the FileDlg dialog.
  - The actual handler code should be placed in on_filedlg_btn_ok_clicked().
- When adding new options, modify the following:
  - AppShowSettings()
  - Global Options array in main.c
  - OPTIONCOUNT definition in main.h
- For options that use a GtkComboBoxEntry widget, set the maximum length and
  tooltip in AppDoWidgetSetup().
=============================================================================
gHTTrack v0.3 - Frontend for HTTrack v3.23 (or later)
=============================================================================
gHTTrack is only a _frontend_ for 'HTTrack Web Copier' and therefore won't
work unless you have httrack installed on your system. If you have httrack
installed, there should be some documentation somewhere in your install path.
This readme file contains a few notes for the gHTTrack frontend only. For
details concerning HTTrack itself, consult the httrack documentation.
The Main Window
===============
- Experienced httrack users will notice that the Command Line doesn't include
  an 'action' switch. You don't need to add it; it's appended automatically
  whenever you choose an Action item from the menu or toolbar.
- The textboxes on the tabbed pages will contain various bits of info, but
  are initially empty. The first one is filled after setting options in the
  Options Dialog. The other two are filled only after running httrack.
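
The automatic 'action' switch can be pictured with a small Python sketch. The
action names and `with_action` are hypothetical and do not reflect how
gHTTrack stores its menu items; the switches themselves (w, W, g, i, Y) are
the ones listed under "Action options" in the switch list below.

```python
# Hypothetical action names mapped to HTTrack's action switches.
ACTION_SWITCHES = {
    "mirror": "-w",
    "mirror-wizard": "-W",
    "get-files": "-g",
    "continue": "-i",
    "mirror-links": "-Y",
}


def with_action(command_line, action):
    """Return a copy of the argument list with the action switch appended."""
    return list(command_line) + [ACTION_SWITCHES[action]]


base = ["httrack", "www.someweb.com/bob/", "-O", "/tmp/mirror"]
print(with_action(base, "continue"))
```

The stored command line stays free of any action switch, so the same options
can be reused for a mirror, an update or a plain file fetch.
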
The Options Dialog
==================
- Note that all options marked with '*' are httrack defaults and do not
  need to be specified explicitly.
- In the Project section, the list of URLs should be kept as short and as
  specific as possible. 'Short' refers to the _number_ of lines, not the
  length of each line. 'Specific' means that you should *avoid* URLs like
  http://www.somewhere.net and should specify something more specific like
  http://www.somewhere.net/programming/c/apps. Note that if the URL textbox
  contains more than one or two lines, you may be better off splitting them
  up into separate projects, or saving the URLs to a file.
- Options are saved automatically when the OK button is pressed. The Save
  button need only be used when editing several projects at a time.
- As of version 0.2, the behavior of the 'Set To Defaults' button has changed.
  The "Default Settings" project (the first item in the Project Name list)
  can serve as a sort of 'template' in which you set your most-often-used
  options. Pressing the 'Set To Defaults' button has one of two effects,
  depending on what the current project is:
  1. If the current project is "Default Settings", pressing the 'Set To
     Defaults' button will use HTTrack's default settings for the "Default
     Settings" project. This is the same behavior as in previous versions
     of gHTTrack.
  2. If the current project is any of the "Custom Settings n", pressing the
     'Set To Defaults' button will use the settings from the "Default
     Settings" project to set the options in the current project.
  When you press the 'Set To Defaults' button, gHTTrack no longer asks if
  you really want to do this; it just does it. To re-enable confirmation,
  you can uncomment the line in the on_opdlg_btn_reset_defaults_clicked()
  function in the callbacks.c file.
  Note: the on_opdlg_btn_reset_defaults_clicked() function uses the _saved_
  "Default Settings" project when setting the current Custom project. You
  will get unexpected results if you press the 'Set To Defaults' button
  after changing anything in the "Default Settings" project without
  saving it first.
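
The two cases of the 'Set To Defaults' button, and the caveat about unsaved
changes, can be summarized in a short Python sketch. This is only a model of
the behavior described above, not the code in callbacks.c; the dictionary
keys and `set_to_defaults` are invented for the example, while the default
values are the ones marked with '*' in the switch list (c8, R1, r9999).

```python
# HTTrack defaults taken from the switch list (*c8, *R1, *r9999).
HTTRACK_DEFAULTS = {"sockets": 8, "retries": 1, "depth": 9999}


def set_to_defaults(current_project, saved_projects):
    """Return the options the current project receives."""
    if current_project == "Default Settings":
        # Case 1: the template itself is reset to HTTrack's defaults.
        return dict(HTTRACK_DEFAULTS)
    # Case 2: a custom project copies the *saved* template.
    return dict(saved_projects["Default Settings"])


saved = {"Default Settings": {"sockets": 4, "retries": 2, "depth": 5}}
# An edit made to the template but never saved plays no part in the result.
unsaved_template_edit = {"sockets": 2, "retries": 2, "depth": 5}

print(set_to_defaults("Default Settings", saved))
print(set_to_defaults("Custom Settings 1", saved))
```
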
=============================================================================
Alphabetical List of HTTrack Switches
=============================================================================
AN maximum transfer rate in bytes/seconds (1000=1KB/s max) (--max-rate[=N])
%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>)
%B tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)
bN accept cookies in cookies.txt (0=do not accept,* 1=accept) (--cookies[=N])
C create/use a cache for updates and retries (C0 no cache,C1 cache is prioritary,* C2 test update before) (--cache[=N])
cN number of multiple connections (*c8) (--sockets[=N])
%cN maximum number of connections/seconds (*%c10) (--connection-per-second[=N])
EN maximum mirror time in seconds (60=1 minute, 3600=1 hour) (--max-time[=N])
%eN set the external links depth to N (* %e0) (--ext-depth[=N])
F user-agent field (-F "user-agent name") (--user-agent <param>)
%F footer string in HTML code (-%F "Mirrored [from host %s [file %s [at %s]]]") (--footer <param>)
f *log in files (--file-log)
f2 one single log file (--single-log)
%f *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])
GN pause transfer if N bytes reached, and wait until lock file is deleted (--max-pause[=N])
g just get files (saved in the current directory) (--get-files)
HN host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow (--host-control[=N])
%h force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
I *make an index (I0 don't make) (--index)
%I make a searchable index for this mirror (* %I0 don't make) (--search-index)
i continue an interrupted mirror using the cache (--continue)
@iN internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only) (--protocol[=N])
JN traffic jam control, minimum transfer rate (bytes/seconds) tolerated for a link (--min-rate[=N])
j *parse Java Classes (j0 don't parse) (--parse-java[=N])
KN keep original links (e.g. http://www.adr/link) (K0 *relative link, K absolute links, K4 original links, K3 absolute URI links) (--keep-links[=N])
k store all files in cache (not useful if files on disk) (--store-all-in-cache)
%k use keep-alive if possible, greatly reducing latency for small files and test requests (%k0 don't use) (--keep-alive)
LN long names (L1 *long names / L0 8-3 conversion / L2 ISO9660 compatible) (--long-names[=N])
%L add all URLs located in this text file (one URL per line) (--list <param>)
%l preferred language (-%l "fr, en, jp, *") (--language <param>)
MN maximum overall size that can be uploaded/scanned (--max-size[=N])
mN maximum file length for a non-html file (--max-files[=N])
mN,N2 maximum file length for non html (N) and html (N2)
NN structure type (0 *original structure, 1+: see below) (--structure[=N]) or user defined structure (-N "%h%p/%n%q.%t")
n get non-html files 'near' an html file (ex: an image located outside) (--near)
%n do not re-download locally erased files (--do-not-recatch)
O path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)
o *generate output html file in case of error (404..) (o0 don't generate) (--generate-errors)
P proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
%P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
%p preserve html files 'as is' (identical to '-K4 -%F ""') (--preserve)
Q no log - quiet mode (--do-not-log)
q no questions - quiet mode (--quiet)
%q *include query string for local files (useless, for information purpose only) (%q0 don't include) (--include-query-string)
RN number of retries, in case of timeout or non-fatal errors (*R1) (--retries[=N])
rN set the mirror depth to N (* r9999) (--depth[=N])
%S add all scan rules located in this text file (one scan rule per line) (--urllist <param>)
sN follow robots.txt and meta robots tags (0=never,1=sometimes,* 2=always) (--robots[=N])
%s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
TN timeout, number of seconds after a non-responding link is shutdown (--timeout)
t test all URLs (even forbidden ones) (--test)
%U run the engine with another id when called as root (-%U smith) (--user <param>)
u check document type if unknown (cgi,asp..) (u0 don't check, * u1 check but /, u2 check always) (--check-type[=N])
V execute system command after each file ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
v log on screen (--verbose)
%v display on screen filenames downloaded (in realtime) - * %v1 short version (--display)
w *mirror web sites (--mirror)
W mirror web sites, semi-automatic (asks questions) (--mirror-wizard)
X *purge old files after update (X0 keep delete) (--purge-old[=N])
x replace external html links by error pages (--replace-external)
%x do not include any password for external password protected websites (%x0 include) (--no-passwords)
Y mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)
z log - extra infos (--extra-log)
Z log - debug (--debug-log)
Expert options:
a *stay on the same address (--stay-on-same-address)
B can both go up&down into the directory structure (--can-go-up-and-down)
D *can only go down into subdirs (--can-go-down)
d stay on the same principal domain (--stay-on-same-domain)
e go everywhere on the web (--go-everywhere)
%H debug HTTP headers in logfile (--debug-headers)
l stay on the same TLD (eg: .com) (--stay-on-same-tld)
pN priority mode: (* p3) (--priority[=N])
p0 just scan, don't save anything (for checking links)
p1 save only html files
p2 save only non html files
*p3 save all files
p7 get html files before, then treat other files
S stay on the same directory (--stay-on-same-dir)
U can only go to upper directories (--can-go-up)
Guru options: (do NOT use if possible)
#! execute a shell command (-#! "echo hello") (--exec <param>)
#0 filter test (-#0 '*.gif' 'www.bar.com/foo.gif') (--debug-testfilters <param>)
#C cache list (-#C '*.com/spider*.gif') (--debug-cache <param>)
#X *use optimized engine (limited memory boundary checks) (--fast-engine)
#FN maximum number of filters (--advanced-maxfilters[=N])
#f always flush log files (--advanced-flushlogs)
#h version info (--version)
#K scan stdin (debug) (--debug-scanstdin)
#L maximum number of links (-#L1000000) (--advanced-maxlinks)
#p display ugly progress information (--advanced-progressinfo)
#P catch URL (--catch-url)
#R old FTP routines (debug) (--debug-oldftp)
#T generate transfer ops. log every minute (--debug-xfrstats)
#u wait time (--advanced-wait)
#Z generate transfer rate statistics every minute (--debug-ratestats)
Details: Option N
N0 Site-structure (default)
N1 HTML in web/, images/other files in web/images/
N2 HTML in web/HTML, images/other in web/images
N3 HTML in web/, images/other in web/
N4 HTML in web/, images/other in web/xxx, where xxx is the file extension (all gif will be placed onto web/gif, for example)
N5 Images/other in web/xxx and HTML in web/HTML
N99 All files in web/, with random names (gadget !)
N100 Site-structure, without www.domain.xxx/
N101 Identical to N1 except that "web" is replaced by the site's name
N102 Identical to N2 except that "web" is replaced by the site's name
N103 Identical to N3 except that "web" is replaced by the site's name
N104 Identical to N4 except that "web" is replaced by the site's name
N105 Identical to N5 except that "web" is replaced by the site's name
N199 Identical to N99 except that "web" is replaced by the site's name
N1001 Identical to N1 except that there is no "web" directory
N1002 Identical to N2 except that there is no "web" directory
N1003 Identical to N3 except that there is no "web" directory (option set for g option)
N1004 Identical to N4 except that there is no "web" directory
N1005 Identical to N5 except that there is no "web" directory
N1099 Identical to N99 except that there is no "web" directory
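
To make the table above easier to read, here is a Python sketch that works
out the target directory for the N1-N5 layouts and their N10x and N100x
variants. It only restates the table; it is not taken from HTTrack.
`target_dir`, the choice of "html"/"htm" as the HTML extensions and the
sample site name are assumptions, and N0, N99 and N100 are left out.

```python
import os.path

HTML_EXTENSIONS = {"html", "htm"}  # assumption: what counts as "HTML" here

# layout number -> (subdirectory for HTML, subdirectory for everything else)
LAYOUTS = {
    1: ("", "images"),
    2: ("HTML", "images"),
    3: ("", ""),
    4: ("", "{ext}"),
    5: ("HTML", "{ext}"),
}


def target_dir(n, filename, site="www.someweb.com"):
    """Return the directory a file would land in for structure type N."""
    layout, variant = n % 100, n // 100
    if layout not in LAYOUTS or variant not in (0, 1, 10):
        raise ValueError(f"N{n} is not covered by this sketch")
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    html_dir, other_dir = LAYOUTS[layout]
    sub = html_dir if ext in HTML_EXTENSIONS else other_dir.format(ext=ext)
    # N1-N5 use "web", N101-N105 the site's name, N1001-N1005 no top directory.
    root = {0: "web", 1: site, 10: ""}[variant]
    return "/".join(part for part in (root, sub) if part) or "."


for n in (1, 4, 105, 1001):
    print(n, target_dir(n, "index.html"), target_dir(n, "logo.gif"))
```
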
Details: User-defined option N
'%n' Name of file without file type (ex: image)
'%N' Name of file, including file type (ex: image.gif)
'%t' File type (ex: gif)
'%p' Path [without ending /] (ex: /someimages)
'%h' Host name (ex: www.someweb.com)
'%M' URL MD5 (128 bits, 32 ascii bytes)
'%Q' query string MD5 (128 bits, 32 ascii bytes)
'%q' small query string MD5 (16 bits, 4 ascii bytes)
'%s?' Short name version (ex: %sN)
'%[param]' param variable in query string
'%[param:before:after:notfound:empty]' advanced variable extraction
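
The simpler placeholders can be tried out with the following Python sketch,
which expands %n, %N, %t, %p and %h for a given URL. It follows the
descriptions above rather than HTTrack's source, `expand_structure` is a name
chosen for this example, and the MD5, short-name and %[param] placeholders
are deliberately not handled (they are copied through unchanged).

```python
import posixpath
from urllib.parse import urlsplit


def expand_structure(pattern, url):
    """Expand %n, %N, %t, %p and %h in a user-defined structure pattern."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    name, dot_ext = posixpath.splitext(filename)
    values = {
        "%n": name,                    # file name without file type
        "%N": filename,                # file name including file type
        "%t": dot_ext.lstrip("."),     # file type
        "%p": directory.rstrip("/"),   # path without ending /
        "%h": parts.netloc,            # host name
    }
    out, i = [], 0
    while i < len(pattern):
        token = pattern[i:i + 2]
        if token in values:
            out.append(values[token])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


url = "http://www.someweb.com/someimages/image.gif"
print(expand_structure("%h%p/%n.%t", url))
print(expand_structure("%t/%N", url))
```
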
Details: User-defined option N and advanced variable extraction
%[param:before:after:notfound:empty]
param : parameter name
before : string to prepend if the parameter was found
after : string to append if the parameter was found
notfound : string replacement if the parameter could not be found
empty : string replacement if the parameter was empty
all fields, except the first one (the parameter name), can be empty
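
The field rules above can be sketched in Python as follows. This is an
interpretation of the descriptions, not HTTrack's implementation:
`expand_params` is a hypothetical helper, and the sketch assumes that an
empty parameter is replaced by the 'empty' string alone, without the
'before' and 'after' strings.

```python
import re
from urllib.parse import parse_qs

_VARIABLE = re.compile(r"%\[([^\]]*)\]")


def expand_params(pattern, query_string):
    """Expand %[param:before:after:notfound:empty] using a query string."""
    params = parse_qs(query_string, keep_blank_values=True)

    def replace(match):
        # Missing trailing fields default to the empty string.
        fields = (match.group(1).split(":") + [""] * 4)[:5]
        name, before, after, notfound, empty = fields
        if name not in params:
            return notfound
        value = params[name][0]
        if value == "":
            return empty  # assumption: before/after are not applied here
        return before + value + after

    return _VARIABLE.sub(replace, pattern)


print(expand_params("page%[id:-:::none]", "id=45"))
print(expand_params("page%[id:-:::none]", "id="))
print(expand_params("page%[id:-:::none]", "x=1"))
```
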
Details: Option K
K0 foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)
K -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL) (--keep-links[=N])
K4 -> foo.cgi?q=45 (original URL)
K3 -> /folder/foo.cgi?q=45 (absolute URI)
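
The three non-default forms can be reproduced with Python's urllib.parse, as
in the sketch below. `rewrite_link` and the page address
http://www.foobar.com/folder/index.html are assumptions made for the example.
K0 is left out because the local file name (foo4B54.html above) depends on
HTTrack's own naming.

```python
from urllib.parse import urljoin, urlsplit


def rewrite_link(mode, page_url, link):
    """Return the link as it would be written for K, K3 or K4."""
    absolute = urljoin(page_url, link)
    if mode == "K":    # absolute URL
        return absolute
    if mode == "K3":   # absolute URI: path and query, no scheme or host
        parts = urlsplit(absolute)
        return parts.path + ("?" + parts.query if parts.query else "")
    if mode == "K4":   # original link, untouched
        return link
    raise ValueError(f"{mode} is not covered by this sketch")


page = "http://www.foobar.com/folder/index.html"
for mode in ("K", "K3", "K4"):
    print(mode, rewrite_link(mode, page, "foo.cgi?q=45"))
```
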
Shortcuts:
--assume standard equivalent to -%A php2,php3,php4,php,cgi,asp,jsp,pl,cfm=text/html
--catchurl create a temporary proxy to capture a URL or a form post URL
--clean erase cache & log files
--continue continue a mirror, without confirmation (-iC1)
--get get the files indicated, do not seek other URLs (-qg)
--http10 force http/1.0 requests (-%h)
--list add all URLs located in this text file (-%L)
--mirror *make a mirror of site(s) (default)
--mirrorlinks mirror all links in 1st level pages (-Y)
--skeleton make a mirror, but gets only html files (-p1)
--spider spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testlinks test links in pages (-r1p0C0I0t)
--testsite identical to --spider
--update update a mirror, without confirmation (-iC2)
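
For readers who prefer to see the equivalences as data, this Python sketch
replaces the shortcuts that have a listed short form with that form. The
table is copied from the list above; `expand_shortcuts` is a hypothetical
helper and says nothing about how HTTrack parses its own command line.

```python
# Shortcut -> equivalent switches, as given in the list above.
SHORTCUTS = {
    "--continue": "-iC1",
    "--get": "-qg",
    "--http10": "-%h",
    "--list": "-%L",
    "--mirrorlinks": "-Y",
    "--skeleton": "-p1",
    "--spider": "-p0C0I0t",
    "--testlinks": "-r1p0C0I0t",
    "--testsite": "-p0C0I0t",  # identical to --spider
    "--update": "-iC2",
}


def expand_shortcuts(args):
    """Replace known shortcuts in an argument list; leave the rest alone."""
    return [SHORTCUTS.get(arg, arg) for arg in args]


print(expand_shortcuts(["httrack", "www.someweb.com/bob/bobby.html", "--spider"]))
```
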
=============================================================================
Categorical List of HTTrack Switches
=============================================================================
HTTrack version 3.23 (compiled May 19 2003)
usage: httrack <URLs> [-option] [+<FILTERs>] [-<FILTERs>]
with options listed below: (* is the default value)
General options:
O path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)
Action options:
w *mirror web sites (--mirror)
W mirror web sites, semi-automatic (asks questions) (--mirror-wizard)
g just get files (saved in the current directory) (--get-files)
i continue an interrupted mirror using the cache (--continue)
Y mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)
Proxy options:
P proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
%f *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])
Limits options:
rN set the mirror depth to N (* r9999) (--depth[=N])
%eN set the external links depth to N (* %e0) (--ext-depth[=N])
mN maximum file length for a non-html file (--max-files[=N])
mN,N2 maximum file length for non html (N) and html (N2)
MN maximum overall size that can be uploaded/scanned (--max-size[=N])
EN maximum mirror time in seconds (60=1 minute, 3600=1 hour) (--max-time[=N])
AN maximum transfer rate in bytes/seconds (1000=1KB/s max) (--max-rate[=N])
%cN maximum number of connections/seconds (*%c10) (--connection-per-second[=N])
GN pause transfer if N bytes reached, and wait until lock file is deleted (--max-pause[=N])
Flow control:
cN number of multiple connections (*c8) (--sockets[=N])
TN timeout, number of seconds after a non-responding link is shutdown (--timeout)
RN number of retries, in case of timeout or non-fatal errors (*R1) (--retries[=N])
JN traffic jam control, minimum transfer rate (bytes/seconds) tolerated for a link (--min-rate[=N])
HN host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow (--host-control[=N])
Links options:
%P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
n get non-html files 'near' an html file (ex: an image located outside) (--near)
t test all URLs (even forbidden ones) (--test)
%L add all URLs located in this text file (one URL per line) (--list <param>)
%S add all scan rules located in this text file (one scan rule per line) (--urllist <param>)
Build options:
NN structure type (0 *original structure, 1+: see below) (--structure[=N])
or user defined structure (-N "%h%p/%n%q.%t")
LN long names (L1 *long names / L0 8-3 conversion / L2 ISO9660 compatible) (--long-names[=N])
KN keep original links (e.g. http://www.adr/link) (K0 *relative link, K absolute links, K4 original links, K3 absolute URI links) (--keep-links[=N])
x replace external html links by error pages (--replace-external)
%x do not include any password for external password protected websites (%x0 include) (--no-passwords)
%q *include query string for local files (useless, for information purpose only) (%q0 don't include) (--include-query-string)
o *generate output html file in case of error (404..) (o0 don't generate) (--generate-errors)
X *purge old files after update (X0 keep delete) (--purge-old[=N])
%p preserve html files 'as is' (identical to '-K4 -%F ""') (--preserve)
Spider options:
bN accept cookies in cookies.txt (0=do not accept,* 1=accept) (--cookies[=N])
u check document type if unknown (cgi,asp..) (u0 don't check, * u1 check but /, u2 check always) (--check-type[=N])
j *parse Java Classes (j0 don't parse) (--parse-java[=N])
sN follow robots.txt and meta robots tags (0=never,1=sometimes,* 2=always) (--robots[=N])
%h force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
%k use keep-alive if possible, greatly reducing latency for small files and test requests (%k0 don't use) (--keep-alive)
%B tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)
%s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>)
shortcut: '--assume standard' is equivalent to -%A php2,php3,php4,php,cgi,asp,jsp,pl,cfm=text/html
@iN internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only) (--protocol[=N])
Browser ID:
F user-agent field (-F "user-agent name") (--user-agent <param>)
%F footer string in HTML code (-%F "Mirrored [from host %s [file %s [at %s]]]") (--footer <param>)
%l preferred language (-%l "fr, en, jp, *") (--language <param>)
Log, index, cache
C create/use a cache for updates and retries (C0 no cache,C1 cache is prioritary,* C2 test update before) (--cache[=N])
k store all files in cache (not useful if files on disk) (--store-all-in-cache)
%n do not re-download locally erased files (--do-not-recatch)
%v display on screen filenames downloaded (in realtime) - * %v1 short version (--display)
Q no log - quiet mode (--do-not-log)
q no questions - quiet mode (--quiet)
z log - extra infos (--extra-log)
Z log - debug (--debug-log)
v log on screen (--verbose)
f *log in files (--file-log)
f2 one single log file (--single-log)
I *make an index (I0 don't make) (--index)
%I make a searchable index for this mirror (* %I0 don't make) (--search-index)
Expert options:
pN priority mode: (* p3) (--priority[=N])
p0 just scan, don't save anything (for checking links)
p1 save only html files
p2 save only non html files
*p3 save all files
p7 get html files before, then treat other files
S stay on the same directory (--stay-on-same-dir)
D *can only go down into subdirs (--can-go-down)
U can only go to upper directories (--can-go-up)
B can both go up&down into the directory structure (--can-go-up-and-down)
a *stay on the same address (--stay-on-same-address)
d stay on the same principal domain (--stay-on-same-domain)
l stay on the same TLD (eg: .com) (--stay-on-same-tld)
e go everywhere on the web (--go-everywhere)
%H debug HTTP headers in logfile (--debug-headers)
Guru options: (do NOT use if possible)
#X *use optimized engine (limited memory boundary checks) (--fast-engine)
#0 filter test (-#0 '*.gif' 'www.bar.com/foo.gif') (--debug-testfilters <param>)
#C cache list (-#C '*.com/spider*.gif') (--debug-cache <param>)
#f always flush log files (--advanced-flushlogs)
#FN maximum number of filters (--advanced-maxfilters[=N])
#h version info (--version)
#K scan stdin (debug) (--debug-scanstdin)
#L maximum number of links (-#L1000000) (--advanced-maxlinks)
#p display ugly progress information (--advanced-progressinfo)
#P catch URL (--catch-url)
#R old FTP routines (debug) (--debug-oldftp)
#T generate transfer ops. log every minute (--debug-xfrstats)
#u wait time (--advanced-wait)
#Z generate transfer rate statistics every minute (--debug-ratestats)
#! execute a shell command (-#! "echo hello") (--exec <param>)
Command-line specific options:
V execute system command after each file ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
%U run the engine with another id when called as root (-%U smith) (--user <param>)
Details: Option N
N0 Site-structure (default)
N1 HTML in web/, images/other files in web/images/
N2 HTML in web/HTML, images/other in web/images
N3 HTML in web/, images/other in web/
N4 HTML in web/, images/other in web/xxx, where xxx is the file extension (all gif will be placed onto web/gif, for example)
N5 Images/other in web/xxx and HTML in web/HTML
N99 All files in web/, with random names (gadget !)
N100 Site-structure, without www.domain.xxx/
N101 Identical to N1 except that "web" is replaced by the site's name
N102 Identical to N2 except that "web" is replaced by the site's name
N103 Identical to N3 except that "web" is replaced by the site's name
N104 Identical to N4 except that "web" is replaced by the site's name
N105 Identical to N5 except that "web" is replaced by the site's name
N199 Identical to N99 except that "web" is replaced by the site's name
N1001 Identical to N1 except that there is no "web" directory
N1002 Identical to N2 except that there is no "web" directory
N1003 Identical to N3 except that there is no "web" directory (option set for g option)
N1004 Identical to N4 except that there is no "web" directory
N1005 Identical to N5 except that there is no "web" directory
N1099 Identical to N99 except that there is no "web" directory
Details: User-defined option N
'%n' Name of file without file type (ex: image)
'%N' Name of file, including file type (ex: image.gif)
'%t' File type (ex: gif)
'%p' Path [without ending /] (ex: /someimages)
'%h' Host name (ex: www.someweb.com)
'%M' URL MD5 (128 bits, 32 ascii bytes)
'%Q' query string MD5 (128 bits, 32 ascii bytes)
'%q' small query string MD5 (16 bits, 4 ascii bytes)
'%s?' Short name version (ex: %sN)
'%[param]' param variable in query string
'%[param:before:after:notfound:empty]' advanced variable extraction
Details: User-defined option N and advanced variable extraction
%[param:before:after:notfound:empty]
param : parameter name
before : string to prepend if the parameter was found
after : string to append if the parameter was found
notfound : string replacement if the parameter could not be found
empty : string replacement if the parameter was empty
all fields, except the first one (the parameter name), can be empty
Details: Option K
K0 foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)
K -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL) (--keep-links[=N])
K4 -> foo.cgi?q=45 (original URL)
K3 -> /folder/foo.cgi?q=45 (absolute URI)
Shortcuts:
--mirror *make a mirror of site(s) (default)
--get get the files indicated, do not seek other URLs (-qg)
--list add all URLs located in this text file (-%L)
--mirrorlinks mirror all links in 1st level pages (-Y)
--testlinks test links in pages (-r1p0C0I0t)
--spider spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite identical to --spider
--skeleton make a mirror, but gets only html files (-p1)
--update update a mirror, without confirmation (-iC2)
--continue continue a mirror, without confirmation (-iC1)
--catchurl create a temporary proxy to capture a URL or a form post URL
--clean erase cache & log files
--http10 force http/1.0 requests (-%h)
example: httrack www.someweb.com/bob/
means: mirror site www.someweb.com/bob/ and only this site
example: httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
means: mirror the two sites together (with shared links) and accept any .jpg files on .com sites
example: httrack www.someweb.com/bob/bobby.html +* -r6
means get all files starting from bobby.html, with 6 link-depth, and possibility of going everywhere on the web
example: httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
runs the spider on www.someweb.com/bob/bobby.html using a proxy
example: httrack --update
updates a mirror in the current folder
example: httrack
will bring you to the interactive mode
example: httrack --continue
continues a mirror in the current folder
HTTrack version 3.23 (compiled May 19 2003)
Copyright (C) Xavier Roche and other contributors