Setting up mappings: -mapfrom and -mapto
Mappings are a mechanism which allows URLs to be rewritten before being checked. The most typical use of mappings is to let Big Brother bypass a Web server and read documents directly from disk, as shown in the example below.
You can specify any number of mappings. All mappings are applied to the URL being checked, after it has been resolved (that is, turned into an absolute URL, if it was relative). A mapping is specified as follows:
-mapfrom regexp -mapto replacement
The replacement string may contain the sequences $1, $2, etc.; these sequences are replaced by the text matched by the corresponding group in the regular expression. $0 stands for the text matched by the whole regular expression.
Here is a simple and realistic example. Suppose I have a Web site available at http://www.users.com/~tom/, whose files are stored on my hard disk in the directory /home/tom/web/. When checking my site, I want Big Brother to read the documents directly off my disk, instead of requesting them from the Web server. So, I set up a mapping:
-mapfrom "^http://www\.users\.com/~tom/"
-mapto "file:///home/tom/web/"
Now suppose Big Brother checks the URL http://www.users.com/~tom/index.html. The mapping applies, so the URL is rewritten and becomes file:///home/tom/web/index.html. Thus, Big Brother reads the file from disk rather than requesting it from the server.
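The rewrite just described can be reproduced outside Big Brother with a few lines of Python. This is only a sketch of the behavior described here, not Big Brother's own code: the helper name apply_mapping is hypothetical, and Python's re module is assumed to treat this particular pattern the same way Big Brother does.

```python
import re

def apply_mapping(url, regexp, replacement):
    """Hypothetical helper: rewrite the first match of regexp in url."""
    def expand(match):
        # Replace $0, $1, ... with the text of the corresponding group.
        return re.sub(r"\$(\d+)",
                      lambda ref: match.group(int(ref.group(1))),
                      replacement)
    # If regexp does not match, the URL is returned unchanged.
    return re.sub(regexp, expand, url, count=1)

mapped = apply_mapping(
    "http://www.users.com/~tom/index.html",
    r"^http://www\.users\.com/~tom/",
    "file:///home/tom/web/",
)
print(mapped)
```

Only the part of the URL matched by the regular expression is replaced, which is why the trailing index.html survives the rewrite.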
Let us explore this example a bit further. Assume the above index file contains a link to ../~amy/. Amy's Web site is not stored on my hard disk. Will Big Brother be smart enough to request it from the server? Yes!
Although Big Brother applies the mapping to a URL when trying to access it, it remembers the original URL and uses it as the base URL when resolving relative URLs. In slightly less technical terms, here is what this means: when it finds the relative link ../~amy/, Big Brother resolves it with respect to the unmapped URL of the current document, which is http://www.users.com/~tom/index.html. So, the resolved URL is http://www.users.com/~amy/. At this point, the mapping is applied to this URL, but it does not match, so the URL remains unchanged. As a result, Big Brother properly sends a request to the Web server to retrieve this document.
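The order of operations in this paragraph (resolve against the unmapped URL first, then try the mapping) can be traced step by step. The sketch below uses Python's standard urljoin for the resolution step; the variable names are illustrative, and the code mimics the described behavior rather than reproducing Big Brother's implementation.

```python
import re
from urllib.parse import urljoin

MAP_FROM = r"^http://www\.users\.com/~tom/"
MAP_TO = "file:///home/tom/web/"

original = "http://www.users.com/~tom/index.html"
# URL actually read: the mapping turns it into a file URL.
accessed = re.sub(MAP_FROM, lambda m: MAP_TO, original, count=1)

# Relative links are resolved against the original (unmapped) URL.
resolved = urljoin(original, "../~amy/")
# The mapping is then tried on the resolved URL; it does not match here.
to_fetch = re.sub(MAP_FROM, lambda m: MAP_TO, resolved, count=1)

print(accessed)
print(resolved)
print(to_fetch)
```

Had the link been resolved against the mapped file URL instead, it would have pointed somewhere under /home/tom/ on the local disk, where Amy's site does not exist.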
So, to sum up, here's how to bypass a Web server using mappings. First, set up a mapping which maps http URLs to file URLs appropriately. (Have a look at the URL syntax rules.) Second, ask Big Brother to check the remote URL, as usual.
Now, here comes a more elaborate example, which shows how powerful mappings can be. I'm still Tom and I still have a Web site, but this time, only the HTML documents are stored on my hard disk - other files, such as images, are available only on the server. So, I want the mapping to apply only to HTML files. Here's the way to do it:
-mapfrom "^http://www\.users\.com/~tom/(.*\.html)$"
-mapto "file:///home/tom/web/$1"
Here, the regular expression matches only URLs that end in .html. Besides, the document name is enclosed by a group using ( and ), which allows referring to it by $1 in the replacement string. So, http://www.users.com/~tom/index.html is still turned into file:///home/tom/web/index.html, but URLs of image files, such as http://www.users.com/~tom/tom.jpg, are unaffected.
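To check a pattern like this one before handing it to Big Brother, it can be tried against a few sample URLs. The following is a rough Python equivalent, given as an assumption-laden sketch: Python writes the group reference as \g<1> where the mapping uses $1, and its regular expression dialect may differ from Big Brother's in other details.

```python
import re

MAP_FROM = r"^http://www\.users\.com/~tom/(.*\.html)$"
# Python spelling of the replacement "file:///home/tom/web/$1".
MAP_TO = r"file:///home/tom/web/\g<1>"

samples = [
    "http://www.users.com/~tom/index.html",
    "http://www.users.com/~tom/tom.jpg",
]
# URLs that do not match the pattern come back unchanged.
results = [re.sub(MAP_FROM, MAP_TO, url) for url in samples]

for before, after in zip(samples, results):
    print(before, "->", after)
```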