8 Generating html constructs

Since the H^EV^EA output language is html, it is natural for users to insert hypertext constructs in their documents, or to control colours.

8.1 High-Level Commands

H^EV^EA provides high-level commands for generating hypertext constructs. Users are advised to use these commands first, because it is easy to write incorrect html and because writing html directly may interfere in nasty ways with H^EV^EA internals.

8.1.1 Commands for Hyperlinks

A few commands for hyperlink management and included images are provided. All these commands have appropriate equivalents defined by the hevea package (see section 5.2). Hence, a document that relies on these high-level commands can still be typeset by L^AT_EX, provided it loads the hevea package.

Macro	H^EV^EA	L^AT_EX

`\ahref{`url`}{`text`}`	make text a hyperlink to url	echo text

`\footahref{`url`}{`text`}`	make text a hyperlink to url	make url a footnote to text, url is shown in typewriter font

`\ahrefurl{`url`}`	make url a hyperlink to url	typeset url in typewriter font

`\ahrefloc{`label`}{`text`}`	make text a hyperlink to label inside the document	echo text

`\aname{`label`}{`text`}`	make text a hyperlink target with label label	echo text

`\mailto{`address`}`	make address a “mailto” link to address	typeset address in typewriter font

`\imgsrc[`attr`]{`url`}`	insert url as an image, attr are attributes in the html sense	do nothing

`\home{`text`}`	produce a home-dir url both for output and links, output aspect is: “~text”

It is important to notice that all arguments are processed. For instance, to insert a link to my home page (http://pauillac.inria.fr/~maranget/index.html), you should do something like this:

\ahref{http://pauillac.inria.fr/\home{maranget}/index.html}{his home page}

Given the frequency of ~, # etc. in urls, this is annoying. Moreover, the immediate solution, using \verb, \ahref{\verb" ... /~maranget/..."}{his home page} does not work, since L^AT_EX forbids verbatim formatting inside command arguments.

Fortunately, the url package provides a very convenient \url command that acts like \verb and can appear in other command arguments (unfortunately, this is not the full story, see section B.17.11). Hence, provided the url package is loaded, a more convenient reformulation of the example above is:

\ahref{\url{http://pauillac.inria.fr/~maranget/index.html}}{his home page}

Or even better:

\urldef{\lucpage}{\url}{http://pauillac.inria.fr/~maranget/index.html}
\ahref{\lucpage}{his home page}

It may seem complicated, but this is a safe way to have a document processed both by L^AT_EX and H^EV^EA. Drawing a line between url typesetting and hyperlinks is correct, because users may sometimes want urls to be processed and at other times not. Moreover, H^EV^EA (optionally) depends on only one third-party package: url, which is as correct as it can be and well-written.

In case the \url command is undefined at the time \begin{document} is processed, the commands \url, \oneurl and \footurl are defined as synonyms for \ahref, \ahrefurl and \footahref, thereby ensuring some compatibility with older versions of H^EV^EA. Note that this usage of \url is deprecated.
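As an illustration, the commands of the table above can be combined in a small document that both L^AT_EX and H^EV^EA should accept. This is only a sketch: the label top and the address someone@example.org are made up for the example, and the hevea package is assumed to be installed.

```latex
\documentclass{article}
\usepackage{hevea}   % provides LaTeX equivalents of the commands below
\begin{document}
% Hyperlink target, with the (hypothetical) label "top".
\aname{top}{Start of the document.}

% External link; the tilde is produced by \home, as explained above.
See \ahref{http://pauillac.inria.fr/\home{maranget}/index.html}{his home page},
or write to \mailto{someone@example.org}.  % placeholder address

% Link back to the target defined above.
\ahrefloc{top}{Back to the start.}
\end{document}
```

In the L^AT_EX output the link texts are simply echoed and the address appears in typewriter font, as listed in the table.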

8.1.2 html style colours

Specifying colours both for L^AT_EX and H^EV^EA should be done using the color package (see section B.14.2). However, one can also specify text color using special type style declarations. The hevea.sty style file defines no equivalent for these declarations, which therefore are for H^EV^EA consumption only.

Those declarations follow html conventions for colours. There are sixteen predefined colours:

\black, \silver, \gray, \white, \maroon, \red, \fuchsia, \purple, \green, \lime, \olive, \yellow, \navy, \blue, \teal, \aqua

Additionally, the current text color can be changed by the declaration \htmlcolor{number}, where number is a six-digit hexadecimal number specifying a color in the RGB space. For instance, the declaration \htmlcolor{404040} changes font color to dark gray.
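A short sketch of these declarations in use follows. Since they have no L^AT_EX equivalent, the example assumes the hevea package is loaded and hides the text from L^AT_EX with the htmlonly environment; the sentences themselves are placeholders.

```latex
\begin{htmlonly}
% Braces limit the scope of each colour declaration.
{\red This sentence is red in the html output.}
{\htmlcolor{404040}This one is dark gray.}
\end{htmlonly}
```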

8.2 More on included images

The \imgsrc command becomes handy when one has images both in Postscript and GIF (or PNG or JPG) format. As explained in section 6.3, Postscript images can be included in L^AT_EX documents by using the \epsfbox command from the epsf package. For instance, if screenshot.ps is an encapsulated Postscript file, then a doc.tex document can include it by:

\epsfbox{screenshot.ps}

We may very well also have a GIF version of the screenshot image (or be able to produce one easily using image converting tools), let us store it in a screenshot.ps.gif file. Then, for H^EV^EA to include a link to the GIF image in its output, it suffices to define the \epsfbox command in the macro.hva file as follows:

\newcommand{\epsfbox}[1]{\imgsrc{#1.gif}}

Then H^EV^EA has to be run as:

# hevea macro.hva doc.tex

Since it has its own definition of \epsfbox, H^EV^EA will silently include a link to the GIF image and not to the Postscript image.

If another naming scheme for image files is preferred, there are alternatives. For instance, assume that Postscript files are of the kind name.ps, while GIF files are of the kind name.gif. Then, images can be included using \includeimage{name}, where \includeimage is a specific user-defined command:

\newcommand{\includeimage}[1]{\ifhevea\imgsrc{#1.gif}\else\epsfbox{#1.ps}\fi}

Note that this method uses the hevea boolean register (see section 5.2.3). If one does not wish to load the hevea.sty file, one can adopt the slightly more verbose definition:

\newcommand{\includeimage}[1]{%
%HEVEA\imgsrc{#1.gif}%
%BEGIN LATEX
\epsfbox{#1.ps}
%END LATEX
}

When the Postscript file has been produced by translating a bitmap file, this simple method of making a bitmap image and using the \imgsrc command is the most adequate. It should be preferred over using the more automated image file mechanism (see section 6), which will translate the image back from Postscript to bitmap format and will thus degrade it.
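For instance, with the \includeimage command defined above, the document source might read as follows. This is a sketch that assumes both screenshot.ps and screenshot.gif exist next to doc.tex; the alt attribute value is only an illustration of the optional argument of \imgsrc.

```latex
% Same source line for both tools: LaTeX includes screenshot.ps,
% while HeVeA emits a reference to screenshot.gif.
\includeimage{screenshot}

% HeVeA only (LaTeX sees a comment): image with an html attribute.
%HEVEA\imgsrc[alt="Screenshot"]{screenshot.gif}
```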

8.3 Internal macros

In this section a few of the H^EV^EA internal macros are described. Internal macros occur at the final expansion stage of H^EV^EA and invoke Objective Caml code.

Normally, user source code should not use them, since their behaviour may change from one version of H^EV^EA to another and because using them incorrectly easily crashes H^EV^EA. However:

Internal macros are almost mandatory for writing supplementary base style files.
Casual usage is a convenient (but dangerous) way to finely control output (cf. the examples in the next section).
Knowing a little about internal macros helps in understanding how H^EV^EA works.

The general principle of H^EV^EA is that L^AT_EX environments \begin{env}… \end{env} get translated into html block-level elements <block attributes>… </block>. More specifically, such block level elements are opened by the internal macro \@open and closed by the internal macro \@close. As a special case, L^AT_EX groups {… } get translated into html groups, which are shadow block-level elements with neither opening nor closing tag.

In the following few paragraphs, we sketch the interaction of \@open…\@close with paragraphs. Doing so, we intend to warn users about the complexity of the task of producing correct html, and to encourage them to use internal macros, which, most of the time, take nasty details into account.

Paragraphs are rendered by p elements, which are opened and closed automatically. More specifically, a first p is opened after \begin{document}, then paragraph breaks close the active p and open a new one. The final \end{document} closes the last p. In all cases, paragraphs consisting only of space characters are discarded silently.

Following html “normative reference [HTML-5a]”, block-level elements cannot occur inside p; more precisely, block-level opening tags implicitly close any active p. As a consequence, H^EV^EA closes the active p element when it processes \@open and opens a new p when it processes the matching \@close. Generally, no p element is opened by default inside block-level elements, that is, H^EV^EA does not immediately open p after having processed \@open. However, if a paragraph break occurs later, then a new p element is opened, and will be closed automatically when the current block is closed. Thus, the first “paragraph” inside block-level elements that include several paragraphs is not a p element. That alone probably prevents the consistent styling of paragraphs with style sheets.

Groups behave differently: opening or closing them neither closes nor opens p elements. However, processing paragraph breaks inside groups involves temporarily closing all groups up to the nearest enclosing p, closing it, opening a new p and finally re-opening all groups. Opening a block-level element inside a group similarly involves closing the active p and opening a new p when the matching \@close is processed.

Finally, display mode (as introduced by $$) is also complicated. Displays basically are table elements with one row (tr), and H^EV^EA manages to introduce table cells (td) where appropriate. Processing \@open inside a display means closing the current cell, starting a new cell, opening the specified block, and then immediately opening a new display. Processing the matching \@close closes the internal display, then the specified block, then the cell and finally opens a new cell. On many occasions (in particular for groups), either the cell break or the internal display may get cancelled.

It is important to notice that primitive arguments are processed (except for the \@print primitive, and for some of the basic style primitives). Thus, some characters cannot be given directly (e.g. # and % must be given as \# and \%).

\@print{text}: Echo text verbatim. As a consequence use only ascii in text.
\@getprint{text}: Process text using a special output mode that strips off html tags. This macro is the one to use for processed attributes of html tags.
\@hr[attr]{width}{height}: Output an html horizontal rule; attr is attributes given directly (e.g. SIZE=3 NOSHADE), while width and height are length arguments given in the L^AT_EX style (e.g. 2pt or .5\linewidth).
\@print@u{n}: Output the (Unicode) character “n”, which can be given either as a decimal number or as a hexadecimal number prefixed by “X”.
\@open{block}{attributes}: Open html block-level element block with attributes attributes. The block name block must be lowercase. As a special case block may be the empty string, in which case an html group is opened.
\@close{block}: Close html block-level element block. Note that \@open and \@close must be properly balanced.
\@out@par{arg}: If occurring inside a p element, that is if a <p> opening tag is active, \@out@par first closes it (by emitting </p>), then formats arg, and then re-opens a p element. Otherwise \@out@par simply formats arg. This command is adequate when formatting arg produces block-level elements.
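The primitives above might be used as follows in a supplementary style file. This is a minimal sketch: the note environment, the note class and the \noterule command are hypothetical names introduced for the example, not part of H^EV^EA.

```latex
% Hypothetical environment: its body ends up inside <div class="note">...</div>.
% \@open and \@close are balanced across the opening and closing commands.
\newenvironment{note}
  {\@open{div}{class="note"}}
  {\@close{div}}

% Hypothetical command: a horizontal rule, half the line width and 2pt high.
\newcommand{\noterule}{\@hr{.5\linewidth}{2pt}}
```

With these definitions, \begin{note}… \end{note} is handled like any other block-level element, so the paragraph management described above still applies.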

Text-level elements are managed differently. They are not seen as blocks that must be closed explicitly. Instead they follow a “declaration” style, similar to the one of L^AT_EX “text-style declarations” — namely, \itshape, \em etc. Block-level elements (and html groups) delimit the effect of such declarations.

\@span{attr}: Declare the text-level element span (with given attributes) as active. The text-level element span will get opened as soon as necessary and closed automatically, when the enclosing block-level elements get closed. Enclosed block-level elements are treated properly by closing span before them, and re-opening span (with given attributes) inside them. The following text-level constructs exhibit similar behaviour with respect to block-level elements.
\@style{shape}: Declare the text shape shape (which must be lowercase) as active. Text shapes are known as font style elements (i, tt, etc.; warning: most font style elements are deprecated in html5, and some of them are no longer valid, prefer CSS in span tags) or phrase elements (em, etc.) in the html terminology.
\@styleattr{name}{attr}: This command generalises both \@span and \@style, as both a text-level element name name and attributes are specified. More specifically, \@span{attr} can be seen as a shorthand for \@styleattr{span}{attr}; while \@style{name} can be seen as a shorthand for \@styleattr{name}{}.
\@fontsize{int}: Declare the text-level element span with attribute style="font-size:font-size" as active. The argument int must be a small integer in the range 1, 2, …, 7. hevea computes font-size, a CSS font-size value, from int. More specifically, font-size ranges from x-small up to 120% nested inside xx-large, 3 being the default size medium. Notice that \@fontsize is deprecated in favour of \@span with proper font-size declarations: \@span{style="font-size:xx-small"}, \@span{style="font-size:x-small"}, \@span{style="font-size:small"}, etc.
\@fontcolor{color}: Declare the text-level element span with attribute style="color:color" as active. The argument color must be a color attribute value in the html style, that is, either one of the sixteen conventional colours black, silver etc., or an RGB hexadecimal color specification of the form #XXXXXX. Note that the argument color is processed; as a consequence, numerical color arguments should be given as \#XXXXXX.
\@nostyle: Close active text-level declarations and ignore further text-level declarations. The effect stops when the enclosing block-level element is closed.
\@clearstyle: Simply close active text-level declarations.
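The declaration style of these macros can be sketched with two small commands. Both \smallcaps and \darkgray are hypothetical names chosen for the example; the inner group is what delimits the effect of each declaration.

```latex
% Hypothetical command: argument in small capitals, via a span with CSS.
\newcommand{\smallcaps}[1]{{\@span{style="font-variant:small-caps"}#1}}

% Hypothetical command: argument in dark gray.
% The # of the colour must be written \# because the argument is processed.
\newcommand{\darkgray}[1]{{\@fontcolor{\#404040}#1}}
```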

Notice on font styling with CSS

The preferred way to style text in new versions of the html “standard” is using style-sheet specifications. Those can be given as argument to the “style” attribute of html elements, most noticeably of span elements. For instance, to get italics in old versions of html one used the text-level “i” element as in <i>…</i>. Now, to get the same italics one may write: <span style="font-style:italic">…</span>. And indeed hevea styles text in that manner, starting from version 2.00. Such (verbose) declarations are then abstracted into style class declarations by the H^EV^EA optimiser esponja, which is invoked by hevea when given option “-O”.

Notice that style attributes can be given to elements other than span. However, combining style attributes requires a little care as only one style attribute is allowed. Namely <cite style="font-weight:bold" style="color:red"> is illegal and should be written <cite style="font-weight:bold;color:red">. For instance: Das Kapital.

The command \@addstyle can be handy for adding style to already styled elements:

\@addstyle{name:val}{attrs}: Echo the space-separated attributes attrs of a tag with the name:val style declaration added to these attributes. The style attribute is added if necessary. Examples: \@addstyle{color:red}{href="#"} will produce href="#" style="color:red", and \@addstyle{color:red}{href="#" style="font-style:italic"} will produce href="#" style="font-style:italic;color:red". Note that an unnecessary extra space can be added in some cases.

As an example, consider the following definition of a command for typesetting citation in bold, written directly in html:

\newcommand{\styledcite}[2][]
{{\@styleattr{cite}{\@addstyle{#1}{style="font-weight:bold"}}#2}}

The purpose of the optional argument is to add style to specific citations, as in:

Two fundamental works: \styledcite{The Holy Bible} and
\styledcite[color:red]{Das Kapital}.

We get: Two fundamental works: The Holy Bible and Das Kapital.

Notice that the example is given for illustrating the usage of the \@addstyle macro, which is intended for package writers. A probably simpler way to proceed would be to use L^AT_EX text-style declarations:

\newcommand{\styledcite}[2][]{{\@style{cite}#1\bf{}#2}}
Two fundamental works: \styledcite{The Holy Bible} and
\styledcite[\color{red}]{Das Kapital}.

We get: Two fundamental works: The Holy Bible and Das Kapital.

8.4 The rawhtml environment

Any text enclosed between \begin{rawhtml} and \end{rawhtml} is echoed verbatim into the html output file. Similarly, \rawhtmlinput{file} echoes the contents of file file. In fact, rawhtml is the environment counterpart of the \@print command, but experience showed it to be much more error prone.

When H^EV^EA was less sophisticated than it is now, rawhtml was quite convenient. But, as time went by, numerous pitfalls around rawhtml showed up. Here are a few:

Verbatim means that no translation of any kind is performed. In particular, be aware that input encoding (see B.17.4) does not apply. Hence one should use ascii only; if needed, non-ascii characters can be given as entity or numerical character references (e.g. &eacute; or &#XE9; for é).
The rawhtml environment should contain only html text that makes sense alone. For instance, writing \begin{rawhtml}<table>\end{rawhtml}… \begin{rawhtml}</table>\end{rawhtml} is dangerous, because H^EV^EA is not informed about opening and closing the block-level element table. In that case, one should use the internal macros \@open and \@close.
\begin{rawhtml}text\end{rawhtml} fragments that contain block-level elements will almost certainly mix poorly with p elements (introduced by paragraph breaks) and with active style declarations (introduced by, for instance, \it). Safe usage will most of the time mean using the internal macros \@nostyle and \@out@par.
When H^EV^EA is given the command-line option -O, checking and optimisation of text-level elements in the whole document takes place. As a consequence, incorrect html introduced by using the rawhtml environment may be detected at a later stage, but this is far from being certain.
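As a sketch of the remedy suggested for the second pitfall above, a table opened and closed with \@open and \@close (rather than with two rawhtml fragments) could look as follows. The cell text is a placeholder, and the empty attribute arguments are an assumption made to keep the example short.

```latex
% H^EV^EA is informed of every block-level element it opens and closes.
\@open{table}{}\@open{tr}{}\@open{td}{}
Cell contents, written as ordinary \LaTeX{} source.
\@close{td}\@close{tr}\@close{table}
```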

As a conclusion, do not use the rawhtml environment! A much safer option is to use the htmlonly environment and to write L^AT_EX code. For instance, in place of writing:

\begin{rawhtml}
A list of links:
<ul>
<li><a href="http://www.apple.com/">Apple</a>.
<li><a href="http://www.sun.com/">Sun</a>.
</ul>
\end{rawhtml}

One can write:

\begin{htmlonly}
A list of links:
\begin{itemize}
\item \ahref{http://www.apple.com/}{Apple}.
\item \ahref{http://www.sun.com/}{Sun}.
\end{itemize}
\end{htmlonly}

A list of links:

Apple.
Sun.

If H^EV^EA is targeted to text or info files (see Section 11), the text inside rawhtml environments is ignored. However there exists a rawtext environment (and a \rawtextinput command) to echo text verbatim in text or info output mode. Additionally, the raw environment and a \rawinput command echo their contents verbatim, regardless of H^EV^EA output mode. Of course, when H^EV^EA produces html, the latter environment and command suffer from the same drawbacks as rawhtml.

8.5 Examples

As a first example of using internal macros, consider the following excerpt from the hevea.hva file that defines the center environment:

\newenvironment{center}{\@open{div}{style="text-align:center"}}{\@close{div}}

Notice that the code above is no longer present and is given here for explanatory purpose only. Now H^EV^EA uses style-sheets and the actual definition of the center environment is as follows:

\newstyle{.center}{text-align:center;margin-left:auto;margin-right:auto;}%
\setenvclass{center}{center}%
\newenvironment{center}
  {\@open{div}{\@getprint{class="\getenvclass{center}"}}}
  {\@close{div}}%

Basically environments \begin{center}…\end{center} will, by default, be translated into blocks <div class="center">…</div>. Additionally, the style class associated to center environments is managed through an indirection, using the commands \setenvclass and \getenvclass. See section 9.3 for more explanations.

Another example is the definition of the \purple color declaration (see section 8.1.2):

\newcommand{\purple}{\@fontcolor{purple}}

H^EV^EA does not feature all text-level elements by default. However one can easily use them with internal macros. For instance this is how you can make all emphasised text blink:

\renewcommand{\em}{\@styleattr{em}{style="text-decoration:blink"}}

Here is an example of this questionable blinking feature:

Hello!

Then, here is the definition of a simplified \imgsrc command (see section 8.1.1), without its optional argument:

\newcommand{\imgsrc}[1]
  {\@print{<img src="}\@getprint{#1}\@print{">}}

Here, \@print and \@getprint are used to output html text, depending upon whether this text requires processing or not. Note that \@open{img}{src="#1"} is not correct, because the element img consists of a single tag, without a closing tag.

Another interesting example is the definition of the command \@doaelement, which H^EV^EA uses internally to output A elements.

\newcommand{\@doaelement}[2]
  {{\@nostyle\@print{<a }\@getprint{#1}\@print{>}}{#2}{\@nostyle\@print{</a>}}}

The command \@doaelement takes two arguments: the first argument contains the opening tag attributes, while the second argument is the textual content of the A element. By contrast with the \imgsrc example above, tags are emitted inside groups where styles are cancelled by using the \@nostyle declaration. Such a complication is needed so as to avoid breaking proper nesting of text-level elements.

Here is another example of direct block opening. The bgcolor environment from the color package locally changes background color (see section B.14.2.1). This environment is defined as follows:

\newenvironment{bgcolor}[2][style="padding:1em"]
{\@open{table}{}\@open{tr}{}%
\@open{td}{\@addstyle{background-color:\@getcolor{#2}}{#1}}}
{\@close{td}\@close{tr}\@close{table}}

The bgcolor environment operates by opening an html table (table) with only one row (tr) and cell (td) in its opening command, and closing all these elements in its closing command. In my opinion, such a style of opening block-level elements in environment opening commands and closing them in environment closing commands is good style. The background color of the single cell is forced with a background-color property in a style attribute. Note that the mandatory argument to \begin{bgcolor} is the background color expressed as a high-level color, which therefore needs to be translated into a low-level color by using the \@getcolor internal macro from the color package. Additionally, \begin{bgcolor} takes html attributes as an optional argument. In the definition above, these attributes are attached to the td element.

If you wish to output a given Unicode character whose value you know, the recommended technique is to define an ad-hoc command that simply calls the \@print@u command. For instance, “blackboard sigma” is Unicode U+02140 (hexadecimal). Hence you can define the command \bbsigma as follows:

\newcommand{\bbsigma}{\@print@u{X2140}}

Then, “\bbsigma” will output “⅀”
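Since \@print@u also accepts a decimal number, the same character could presumably be obtained from its decimal code point (hexadecimal 2140 is 8512 in decimal). The command name \bbsigmadec is hypothetical, chosen only to avoid clashing with the definition above.

```latex
% Same character as \bbsigma, given as a decimal number: 0x2140 = 8512.
\newcommand{\bbsigmadec}{\@print@u{8512}}
```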

8.6 The document charset

According to standards, as far as I understand them, html pages are made of Unicode (ISO 10646) characters. By contrast, a file in any operating system is usually considered as being made of bytes.

To account for that fact, html pages usually specify a document charset that defines a translation from a flow of bytes to a flow of characters. For instance, the byte 0xA4 means Unicode 0x00A4 (¤) in the ISO-8859-1 (or latin1) encoding, and 0x20AC (€) in the ISO-8859-15 (or latin9) encoding. Notice that H^EV^EA has no difficulty outputting both symbols; in fact they are defined as Unicode characters:

\newcommand{\textcurrency}{\@print@u{XA4}}
\newcommand{\texteuro}{\@print@u{X20AC}}

But the \@print@u command may output the specified character as a byte, when possible, by means of the output translator. If this is not possible, \@print@u outputs a numerical character reference (for instance &#X20AC;).

Of course, the document charset and the output translator must be synchronised. The command \@def@charset takes a charset name as argument and performs the operation of specifying the document character set and the output translator. It should occur in the document preamble. Valid charset names are ISO-8859-n where n is a number in 1…15, KOI8-R, US-ASCII (the default), windows-n where n is 1250, 1251, 1252 or 1257, or macintosh, or UTF-8. In case those charsets do not suffice, you may ask the author for other document charsets. Notice however that document charset is not that important, the default US-ASCII works everywhere! Input encoding of source files is another, although related, issue — see Section B.17.4.
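For example, a document meant to be output as UTF-8 might select the charset in its preamble as sketched below. The %HEVEA comment hides the line from L^AT_EX; whether a command name containing @ is accepted at this point of an ordinary source file is an assumption here, and placing the line in a .hva file is a possible alternative.

```latex
\documentclass{article}
% Seen by H^EV^EA only: sets both the document charset and the output translator.
%HEVEA\@def@charset{UTF-8}
\begin{document}
Price: \texteuro{} 10.
\end{document}
```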

If desired, the charset can be extracted from the current locale environment, provided this yields a valid (to H^EV^EA) charset name. This operation is performed by a companion script: xxcharset.exe. It thus suffices to launch H^EV^EA as:

# hevea -exec xxcharset.exe other arguments