Controlling recursion: -rec
By default, recursion is off. This means that Big Brother reads the documents you explicitly specify (either on the command line or using the -stdin option), makes sure that the links they contain are valid, and stops.
If you want to check a whole site at once, you must turn recursion on. When recursion is on, Big Brother not only checks each link, but also determines whether it points to an HTML document. If it does, Big Brother fetches the document (which can be local or remote) and checks the links in it. If one of these links points to an HTML document, it is also fetched, and so on, recursively.
Of course, this process has to stop at some point, otherwise you are likely to check all of the World Wide Web! So, when recursion is on, you must provide a regular expression that defines the boundary of your site. Every time Big Brother finds a link, it matches the link's address against the regular expression you have provided. If the address matches, Big Brother fetches the document and checks it recursively; otherwise, it checks the link but doesn't follow it.
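The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Big Brother's actual implementation: the links_of table is a hypothetical stand-in for fetching and parsing documents, the example.org addresses are made up, and Python's re module is assumed to treat these simple patterns the same way Big Brother's regexps do.

```python
import re
from collections import deque

def check_site(start_urls, links_of, boundary=None):
    """Return (checked, fetched) for a run starting at start_urls.

    links_of is a hypothetical stand-in for the network: it maps the
    address of each HTML document to the links it contains. An address
    that is not a key is treated as a non-HTML resource.
    boundary is the recursion regexp, or None when recursion is off.
    """
    pattern = re.compile(boundary) if boundary is not None else None
    checked, fetched = set(), set()
    queue = deque(start_urls)          # documents given explicitly
    while queue:
        doc = queue.popleft()
        if doc in fetched:
            continue                   # never read a document twice
        fetched.add(doc)
        for link in links_of.get(doc, []):
            checked.add(link)          # every link is checked
            is_html = link in links_of
            # A link is followed only if recursion is on, the target is
            # an HTML document, and its address is inside the boundary.
            if pattern is not None and is_html and pattern.search(link):
                queue.append(link)
    return checked, fetched

# Made-up site used for illustration only.
SITE = {
    "http://example.org/": ["http://example.org/a.html",
                            "http://other.example/"],
    "http://example.org/a.html": ["http://example.org/b.html",
                                  "http://example.org/logo.png"],
    "http://example.org/b.html": [],
    "http://other.example/": ["http://other.example/deep.html"],
    "http://other.example/deep.html": [],
}

off_checked, off_fetched = check_site(["http://example.org/"], SITE)
on_checked, on_fetched = check_site(["http://example.org/"], SITE,
                                    r"^http://example\.org/")
```

With recursion off, only the starting document is read. With the boundary regexp, the pages under example.org are read in turn, while the link to other.example is checked but not followed, so nothing beyond it is ever visited.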
Here is the simplest, and most common, example. The address of my home page is http://pauillac.inria.fr/~fpottier/. If I want to check my whole site in a single run of Big Brother, I invoke it with the following option:
-rec "^http://pauillac\.inria\.fr/~fpottier/"
Because this regexp does not end with a $ sign, it matches all URLs which start with http://pauillac.inria.fr/~fpottier/. Notice that I inserted a backslash character in front of each dot, because otherwise a dot matches any character.
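To see what the anchoring and the backslashes change, the regexp can be tried out in Python. This is a rough check, assuming Python's re module interprets ^, $ and \. the same way Big Brother does for this pattern; the in_boundary helper and the test addresses other than the home page are made up for the illustration.

```python
import re

ESCAPED = r"^http://pauillac\.inria\.fr/~fpottier/"
UNESCAPED = r"^http://pauillac.inria.fr/~fpottier/"   # dots left unescaped
EXACT = ESCAPED + "$"                                 # with a trailing $

def in_boundary(regexp, url):
    """True if url matches the recursion regexp (hypothetical helper)."""
    return re.search(regexp, url) is not None

home = "http://pauillac.inria.fr/~fpottier/"
page = home + "some/page.html"                 # made-up page below the home page
lookalike = "http://pauillacXinria.fr/~fpottier/"

print(in_boundary(ESCAPED, home))         # home page is inside the boundary
print(in_boundary(ESCAPED, page))         # so is anything below it
print(in_boundary(EXACT, page))           # a trailing $ would exclude it
print(in_boundary(ESCAPED, lookalike))    # escaped dots reject the lookalike
print(in_boundary(UNESCAPED, lookalike))  # unescaped dots accept it
```

The last two lines show why the backslashes matter: without them, an address that merely resembles the intended host falls inside the boundary.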
Here is another example: suppose I maintain several Web servers, but all of them are within the domain orange.com. Then, I can use the following regexp:
-rec "^http://.*\.orange\.com/"
This allows recursion on every server whose name ends in .orange.com.
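The reach of the orange.com regexp can be explored the same way. The sketch below again assumes that Python's re module behaves like Big Brother's regexps for this pattern. All host names are made up, and the STRICT variant is not from the manual: it is only a suggestion for the case where .* turns out to match more than intended.

```python
import re

ORANGE = r"^http://.*\.orange\.com/"
# Assumed tighter variant (not from the manual): [^/]* cannot cross a
# slash, so the match has to happen inside the host name.
STRICT = r"^http://[^/]*\.orange\.com/"

def in_boundary(regexp, url):
    """True if url matches the recursion regexp (hypothetical helper)."""
    return re.search(regexp, url) is not None

urls = [
    "http://www.orange.com/index.html",        # server inside the domain
    "http://shop.eu.orange.com/",              # deeper subdomain
    "http://orange.com/",                      # bare domain, no leading dot
    "http://www.orange.org/",                  # different domain
    "http://elsewhere.example/x.orange.com/",  # domain name appears in the path
]

for url in urls:
    print(url, in_boundary(ORANGE, url), in_boundary(STRICT, url))
```

Two details are worth noticing. The bare address http://orange.com/ is outside the boundary, because the regexp requires a dot in front of orange. And since .* can also match slashes, an address on another server whose path happens to contain .orange.com/ is inside the boundary of the first regexp, but not of the stricter one.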
Note that the address is checked against the recursion regexp after the mappings have been applied to the address. So, if you are using mappings, say, to map a remote site to a certain folder of your hard disk, the recursion regexp should describe the local files' addresses, not the remote ones'. For instance, using ^file: as recursion regexp simply allows recursion on all local files.
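The order of operations can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the mapping is modelled as a plain prefix substitution, which may differ from how Big Brother's mappings actually work, and the remote address and local folder are made up.

```python
import re

# Hypothetical mapping: a remote site mirrored in a local folder.
MAPPINGS = [("http://www.example.org/", "file:///home/me/site/")]

def apply_mappings(url):
    """Rewrite url using the first mapping whose remote prefix matches."""
    for remote, local in MAPPINGS:
        if url.startswith(remote):
            return local + url[len(remote):]
    return url

def may_recurse(url, rec_regexp):
    """Mappings are applied first; the regexp then sees the mapped address."""
    return re.search(rec_regexp, apply_mappings(url)) is not None

link = "http://www.example.org/docs/a.html"

print(apply_mappings(link))
print(may_recurse(link, r"^file:"))                      # describes local addresses
print(may_recurse(link, r"^http://www\.example\.org/"))  # describes remote addresses
```

In this sketch the regexp written against the remote address never matches, because by the time it is consulted the address already starts with file:.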