Configuring local checks: -index and -localhtml
There are a few things Big Brother needs to know when checking local (file) links.
The first one is the default index file name. Whenever a local link points to a directory, Big Brother looks for an
index file inside it. (The index file is the file that should be displayed instead of the directory listing. As its
name implies, it is supposed to describe what's in the directory, but it does not have to. It is also all right if
there is no index file at all.) If recursion is on and the index file's URL matches the recursion regexp, the file is
opened and checked. If you don't use the -index option, Big Brother behaves as if you had said
-index "index.html"
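The directory rule can be sketched in a few lines of Python. This only illustrates the behavior described above and is not Big Brother's own code; the helper name resolve_local_target and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional

DEFAULT_INDEX = "index.html"  # the documented default for -index


def resolve_local_target(path: Path, index_name: str = DEFAULT_INDEX) -> Optional[Path]:
    """Return the file a local link effectively points at.

    Hypothetical helper: a link to a directory resolves to the index file
    inside it, or to None when the directory has no index file (which is
    not an error). A link to anything else is returned unchanged.
    """
    if path.is_dir():
        candidate = path / index_name
        return candidate if candidate.is_file() else None
    return path
```

Passing a different index_name plays the role of the -index option in this sketch.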
Additionally, Big Brother needs to be able to tell which files are HTML files. (This is necessary when recursion is
on, because it would make no sense to download, say, an image file and to try analyzing it.) For remote files, there
is no problem, because the server supplies the necessary information. Local files are different: no such
information is available for them. So Big Brother needs you to supply a regular expression using the -localhtml
option. When Big Brother finds a link to a local file, it determines whether it is an HTML file by matching its name
against this regular expression. If it matches, the file is assumed to be an HTML document.
If you don't use the -localhtml option, Big Brother behaves as if you had typed
-localhtml "\.s?htm"
which matches any file whose name contains .htm or .shtm. Here is another example. My site contains files in English
and in French, whose names end in .html.en and .html.fr, respectively. I can let Big Brother know about it by typing
-localhtml "\.html\.(en|fr)$"
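To see how such expressions classify file names, here is a minimal Python sketch. It assumes the expression may match anywhere in the name unless it is anchored (which is why the example above ends in $); the function looks_like_html and the constant names are hypothetical, and Big Brother's own regular expression engine may differ in details.

```python
import re

DEFAULT_LOCALHTML = r"\.s?htm"             # documented default for -localhtml
BILINGUAL_LOCALHTML = r"\.html\.(en|fr)$"  # the bilingual example above


def looks_like_html(filename: str, pattern: str = DEFAULT_LOCALHTML) -> bool:
    """Treat the file as HTML if the pattern matches somewhere in its name.

    re.search is used so that an unanchored pattern can match in the
    middle of the name (an assumption about the intended semantics).
    """
    return re.search(pattern, filename) is not None


print(looks_like_html("page.html"))                           # True
print(looks_like_html("menu.shtml"))                          # True
print(looks_like_html("logo.gif"))                            # False
print(looks_like_html("about.html.fr", BILINGUAL_LOCALHTML))  # True
print(looks_like_html("about.html", BILINGUAL_LOCALHTML))     # False
```

Because the default expression is unanchored, a name such as page.html matches it even though the expression only spells out .htm.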