HTMLflow - HTML flow charter and link checker
ver 01.00 10/23/95
HTMLflow is a product and copyright of On-the-Net, LLC.
See http://www.on-the-net.com for current versions and more
developments.
Mail questions and comments to tech@on-the-net.com
The HTMLflow program runs on the server, directly against the
real web documents. This allows more extensive testing than
is possible by fetching pages as http links.
Unix Versions:
Command line and Web form
Dos Versions:
Command line with parameter file.
Installation
UNIX versions:
Download the version for your platform. HTMLflow includes
a platform-specific executable program and the web pages and
cgi scripts that control the HTMLflow program.
To install:
1) Expand the provided Unix archive (html.tar.gz or
html.tar.Z) with a command such as "tar xvf html.tar.Z"
in the directory where htmlflow is going to run.
2) Edit the initial variables in runflow:
Location of Perl:
#!/usr/bin/perl
$config_path = "./"; # Real path for config file
$url_path = "/a-flow/"; # Url path for control form
$cgi_path = "/cgi-bin/a-flow/"; # Url path for cgi script
$pgm_path = ""; # Path to HTMLflow
3) Make sure that the directory is a legal cgi-bin directory
and that the directory is a+rw for the web server.
4) Initialize the runflow.html page by typing 'runflow' in
the directory.
5) Run HTMLflow by calling the runflow.html page with a URL
like:
http://localhost/a-flow/runflow.html
DOS version:
Installation:
Place HTMLflow.exe in a directory in the PATH
Use:
The DOS version of HTMLflow uses the same command line
options as the UNIX version. Because of the limits of the DOS
command line, the command line values can instead be supplied
in a parameter file.
The parameter file is specified by using the command:
htmlflow @param-file files-to-test.htm
The param-file is a text file with one value on each line.
A flag (such as -x) is one value; its matching argument is a
second value on the following line.
Ex:
htmlflow -l \mypages\ \mypages\index.htm
or
htmlflow @parm \mypages\index.htm
with the file parm containing:
-l
\mypages\
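The expansion of a parameter file into command line values can be sketched as follows. This is only an illustration of the format described above, not HTMLflow's own code: the function names are hypothetical, and skipping blank lines is an assumption.

```python
def expand_param_file(argv, read_lines):
    """Replace each '@name' argument with the values listed in file 'name'."""
    expanded = []
    for arg in argv:
        if arg.startswith("@") and len(arg) > 1:
            # One value per line; blank lines are skipped (an assumption).
            for line in read_lines(arg[1:]):
                value = line.strip()
                if value:
                    expanded.append(value)
        else:
            expanded.append(arg)
    return expanded


def read_lines_from_disk(path):
    """Read a parameter file from disk as a list of lines."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read().splitlines()


# The parm file from the example above, held in memory for illustration.
sample_files = {"parm": ["-l", "\\mypages\\"]}
args = expand_param_file(["@parm", "\\mypages\\index.htm"],
                         sample_files.__getitem__)
print(args)
```

With a real file, read_lines_from_disk would be passed in place of the in-memory lookup. The result is the same list of values as the first form of the example, "htmlflow -l \mypages\ \mypages\index.htm".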
Flag definitions:
(The latest set of flags can be determined by running
htmlflow with no parameters)
-a file: Outfile of all URL references
-b file: Block definition control file
-d dir: Real directory (must end in / or \)
-l dir: Real local directory for unspecified paths (must
end in / or \)
-e file: Outfile of all errors
-f file: Outfile of all files referenced
-i file: Default index name
-m file: Outfile of missing files
-r file: Outfile of structure report
-v path: Virtual home
-x[#]: Debug level
-X file: Outfile of external references
Operations:
HTMLflow traces all local (and soon external)
links in all html documents connected to the documents
specified. It checks that every file referenced exists.
In the process it creates a structure chart, as text or HTML,
and a number of other analysis files.
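The general idea of tracing local links and checking that the files exist can be sketched in a few lines. This is a simplified illustration and not HTMLflow's actual algorithm: the names are hypothetical, only relative links are followed, and anything containing a scheme such as http: or mailto: is treated as external.

```python
import os
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def is_relative_local(link):
    """True for links with no scheme that do not start at the server root."""
    if link.startswith("/"):
        return False  # root-relative links need a virtual home mapping
    return ":" not in link.split("/", 1)[0]


def trace(start_file):
    """Return (existing files, missing files) reachable from start_file."""
    visited, missing = set(), set()
    pending = [os.path.normpath(start_file)]
    while pending:
        path = pending.pop()
        if path in visited or path in missing:
            continue
        if not os.path.isfile(path):
            missing.add(path)
            continue
        visited.add(path)
        if not path.lower().endswith((".htm", ".html")):
            continue  # only HTML documents are parsed for further links
        parser = LinkCollector()
        with open(path, encoding="latin-1") as handle:
            parser.feed(handle.read())
        for link in parser.links:
            # Drop any fragment or query string before resolving the file.
            target = link.split("#", 1)[0].split("?", 1)[0]
            if target and is_relative_local(target):
                pending.append(os.path.normpath(
                    os.path.join(os.path.dirname(path), target)))
    return visited, missing
```

Each file is visited once, so pages that link back to each other do not cause a loop.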
Input files:
HTMLflow starts with an initial file, or a directory
with an implied default index file. It traces all links,
checks that the linked files exist, and in turn traces those
files. To be able to map anchor URLs to physical files,
HTMLflow must be able to relate each real file to a virtual
URL. This is done by providing a real directory value and a
matching virtual home path.
For example, http://www.host.com/index.html might be
/www/mypages/index.html on a UNIX system, or
c:\pages\index.htm on a dos system.
To check the UNIX example, HTMLflow must be run with the
command line:
htmlflow -dl /www/mypages/ /www/mypages/index.html
If the virtual home dir is not specified with the '-v'
switch, it is assumed to be '/'. This example uses '-dl'
because the same directory is both the real directory for
the home directory and the local directory for unspecified
paths. This is the normal situation.
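The relation between a virtual URL path and a real file can be illustrated with a small sketch. The function name is hypothetical, and the rule shown (strip the virtual home, prepend the real directory) is an assumption based on the description above rather than HTMLflow's actual code.

```python
def url_to_real_path(url_path, real_dir, virtual_home="/"):
    """Map a URL path under virtual_home to a file under real_dir.

    Returns None when the URL path lies outside the virtual home.
    """
    if not virtual_home.endswith("/"):
        virtual_home += "/"
    if not url_path.startswith(virtual_home):
        return None
    # real_dir is expected to end in / or \, as the -d flag requires.
    return real_dir + url_path[len(virtual_home):]


# The UNIX example above: virtual home '/' and real dir /www/mypages/
print(url_to_real_path("/index.html", "/www/mypages/"))
# A hypothetical site served under a virtual home of /a-flow/
print(url_to_real_path("/a-flow/runflow.html", "/www/flow/", "/a-flow/"))
```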
Block definitions:
For many web sites, all pages start and end with a
common block of URL links. This allows easy navigation
and gives a consistent look to each page. These common
'blocks' of links cause a structure chart to be very
cluttered and confusing. HTMLflow allows defining these
blocks in a 'Block definition control file'. The control
file has a very simple format: the name of each block is on
the left margin and the URL for each entry in the block is
indented. Blank lines are ignored and comments are allowed
on lines starting with '#'.
A blocks file is specified by (-b) on the htmlflow command
line:
-b block-file-name
For example: A blocks file:
HEAD1
    left.htm
    right.htm
    home.htm
FOOT1
    left.htm
    next.htm
    home.htm
    mailto:webmaster@mysys.com
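A reader for this control file format might look like the sketch below. It follows the description above (block names at the left margin, entries indented, blank lines and '#' comment lines ignored). The function name is hypothetical, and allowing a comment's '#' to be indented is an assumption.

```python
def parse_blocks(text):
    """Return {block name: [entry URLs]} from block definition text."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # blank lines and comment lines are ignored
        if raw[0] in " \t":
            # An indented line is an entry of the current block.
            if current is None:
                raise ValueError("indented entry before any block name")
            blocks[current].append(raw.strip())
        else:
            # A line at the left margin starts a new block.
            current = raw.strip()
            blocks.setdefault(current, [])
    return blocks


sample = """\
# navigation blocks
HEAD1
    left.htm
    right.htm
    home.htm

FOOT1
    left.htm
    next.htm
    home.htm
    mailto:webmaster@mysys.com
"""
blocks = parse_blocks(sample)
print(blocks["HEAD1"])
print(blocks["FOOT1"])
```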
Default index.html file:
When a local url specifies just a directory, the
'default index file' is used as the name of the HTML file to
be processed.
The built-in value is 'index.html'. It can be changed using
(-i), for example:
-i home.htm
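As a small illustration of the rule, the sketch below appends the default index name to a URL that names only a directory. The function name is hypothetical, and recognizing a directory by its trailing '/' is an assumption; a URL that names a directory without the trailing slash is not handled here.

```python
def resolve_index(url_path, default_index="index.html"):
    """Append the default index name when a URL names only a directory."""
    if url_path.endswith("/"):
        return url_path + default_index
    return url_path


print(resolve_index("/docs/"))                # built-in default
print(resolve_index("/docs/", "home.htm"))    # as if run with -i home.htm
print(resolve_index("/docs/page.htm"))        # already names a file
```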
Output files:
Structure tree report: -r[h] report-filename
(-r for text or -rh for html)
The structure tree report consists of three sections: the
structure chart, the module index and, if any blocks are
defined, the block definitions. Each module is listed with
all of its hyper references, with nesting shown on the first
reference to each module. When the report is generated as
html, the module references are linked to each module's
details, and the links that make up the contents of each
module are active links to the real contents.
The module index is an alphabetical listing of all modules
and their line numbers in the structure tree. In the html
version, these are hot links.
List of all URL references: -a list-filename
Specify a real directory: -d dir
Specify a local directory for unspecified files: -l dir
List of all errors: -e list-filename
List of all files referenced: -f list-filename
List of all missing files: -m list-filename
The missing file list has no directory paths. This
makes it useful for locating URLs that have the correct
filename but the wrong path.
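One way to use the missing file list for this purpose is to search the site's real directory for files with the same names. The sketch below assumes the list has already been read into bare filenames; the function name and the sample names are hypothetical.

```python
import os


def find_by_name(missing_names, root_dir):
    """Return {filename: [paths]} for missing names found under root_dir."""
    wanted = set(missing_names)
    found = {}
    for folder, _subdirs, files in os.walk(root_dir):
        for name in files:
            if name in wanted:
                found.setdefault(name, []).append(os.path.join(folder, name))
    return found


# Example: report where each missing file actually lives, if anywhere.
for name, paths in sorted(find_by_name(["logo.gif", "next.htm"], ".").items()):
    print(name, "found at", ", ".join(paths))
```

A name that turns up somewhere under the tree points to a link with the wrong path. A name that is not found at all is a file that is genuinely absent.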
List of all external references: -X list-filename
This list can be used to check for invalid external
URLs more efficiently.
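A separate script could work through the external reference list and test each URL once. The sketch below assumes the list holds one URL per line, which is not confirmed here, and the function names are hypothetical. Only http and https URLs are checked.

```python
import http.client
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return the HTTP status code for url, or an error string."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, e.g. with 404
    except (OSError, http.client.HTTPException) as error:
        return str(error)  # no usable answer: bad host, timeout, ...


def check_external_list(lines, check=check_url):
    """Check each distinct http(s) URL once; return {url: result}."""
    results = {}
    for line in lines:
        url = line.strip()
        if not url.lower().startswith(("http://", "https://")):
            continue  # skip blank lines, mailto:, ftp: and anything else
        if url not in results:
            results[url] = check(url)
    return results
```

Because each distinct URL is fetched only once, a link that appears on every page of a site costs a single request. The check parameter allows a different test to be substituted, for example one that uses GET for servers that reject HEAD requests.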
Report file sizes:
BIG !! - this program is really designed to be used
locally on a server or on a set of local test files. For
example, a structure tree for a web site with 240 files is
250K. This takes only a few seconds to load on a directly
connected system but much longer over a slow link.