How to Use HEVEA with the Thai Character Set

Andrew Seagar and นิตยา ซีการ์
email: dr_andrew_seagar@ieee.org

1  Latin/Thai Character Set

Thai LaTeX source is written in the TIS-620 character encoding. Some people call this encoding ISO-8859-11, but for a long time that name was not officially recognised.

The TIS-620 character encoding is an 8-bit, single-byte character set. It encodes the ASCII Latin characters in the lower half (0-127) and the Thai characters in the upper half (128-255). The Thai characters occupy positions 0xA1 to 0xFB, and the remaining slots of the upper half are left empty. For the official Thai definition, see the document “ISO 8859-11 Latin/Thai Character Set standard” at the website www.nectec.or.th/it-standards/iso8859-11/.
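Because each Thai byte maps to Unicode by a fixed offset (0xA1 becomes U+0E01, THAI CHARACTER KO KAI), the layout is easy to check with Python. The sketch below is illustrative only: the helper name `tis620_to_unicode` is hypothetical, and the reference it is compared against is Python's built-in `tis_620` codec (Python also ships `iso8859_11` and `cp874`).

```python
def tis620_to_unicode(data: bytes) -> str:
    """Hypothetical helper: decode TIS-620 bytes by arithmetic alone."""
    chars = []
    for b in data:
        if b < 0x80:
            chars.append(chr(b))           # ASCII half is unchanged
        elif 0xA1 <= b <= 0xDA or 0xDF <= b <= 0xFB:
            chars.append(chr(b + 0x0D60))  # Thai half: 0xA1 -> U+0E01
        else:
            # 0x80-0xA0, 0xDB-0xDE and 0xFC-0xFF hold no Thai character
            raise ValueError(f"byte 0x{b:02X} is not a TIS-620 character")
    return "".join(chars)


raw = "Thai ไทย".encode("tis_620")   # one byte per character
print(len(raw))
print(tis620_to_unicode(raw) == raw.decode("tis_620"))
```

The arithmetic version and the codec should agree on every official byte; the codec is what you would use in practice.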

Some vendors introduced their own (non-Thai) variations of the official Thai character set. The Windows Thai character set (code page 874) places an unofficial ‘smart quote’ character into one of the empty (illegal) slots of the official Thai set. The DEC (Digital Equipment Corporation) character set places an unofficial ‘no-break space’ character into another of the empty (illegal) slots of the original official Thai set. It is not always clear what is now “official” and what is not, so some care is needed: importing “Thai” documents from Windows into a Linux environment via (for example) OpenOffice does not always produce a faithful copy of the original text.
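Before passing a file that came from Windows to LaTeX or HEVEA, it may help to look for bytes sitting in the empty slots of the official set. The sketch below uses a hypothetical function, `unofficial_bytes`, that simply reports the offset of every byte that is neither ASCII nor an official Thai character; Python's `cp874` codec is used only to show what Windows would display for such a byte (0x93 is one of the quotation marks in code page 874).

```python
def unofficial_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte) for every byte outside the official TIS-620 set."""
    found = []
    for offset, b in enumerate(data):
        official = b < 0x80 or 0xA1 <= b <= 0xDA or 0xDF <= b <= 0xFB
        if not official:
            found.append((offset, b))
    return found


# "ไทย" in TIS-620, followed by byte 0x93 as written by Windows code page 874
sample = "ไทย".encode("tis_620") + b"\x93"
for offset, b in unofficial_bytes(sample):
    # offset, byte value, and the character Windows would show for it
    print(offset, hex(b), repr(bytes([b]).decode("cp874")))
```

An empty result does not prove the file is correct, but a non-empty one is a good hint that something vendor-specific slipped in.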

Figure 1 shows the Thai characters according to the Unicode Standard (version 3.0).
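A plain-text version of such a chart can be generated with Python's `unicodedata` module. That module follows a much newer Unicode version than 3.0, but the Thai assignments in this range should be the same. The sketch lists each assigned code point from U+0E01 to U+0E5B with its TIS-620 byte (code point minus 0x0D60) and its Unicode name; the function name `thai_chart` is hypothetical.

```python
import unicodedata


def thai_chart() -> list[tuple[str, str, str]]:
    """Rows of (TIS-620 byte, character, Unicode name) for the Thai block."""
    rows = []
    for cp in range(0x0E01, 0x0E5C):
        name = unicodedata.name(chr(cp), None)
        if name is None:          # U+0E3B..U+0E3E are unassigned
            continue
        rows.append((f"0x{cp - 0x0D60:02X}", chr(cp), name))
    return rows


for byte, ch, name in thai_chart()[:3]:
    print(byte, ch, name)
```

The gap skipped in the loop corresponds to the empty slots 0xDB to 0xDE in TIS-620.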

2  Thai in LaTeX

For Thai in LaTeX, the package ‘thai’ (file thai.sty) is loaded with \usepackage{thai}.

The source is first run through a preprocessor (cttex), which encloses all Thai text in bracketed pairs {\thai ....} and inserts the thai-break separator ‘\tb’.

Thai text is normally written as a continuous stream with few (if any) blank (space) characters. The preprocessor inserts the ‘\tb’ command at places where the text may be broken if it falls near the end of a line. Without these separators, LaTeX has great difficulty producing a flush right margin without leaving huge gaps in the text.
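To make the preprocessor's output format easier to picture, here is a small Python imitation. It is not cttex: real Thai word segmentation needs a full dictionary and a smarter algorithm, whereas this toy version uses greedy longest matching against a tiny hypothetical word list (`WORDS`) and glues any unrecognised characters into a single chunk. The names `segment` and `preprocess` are likewise hypothetical.

```python
import re

# Hypothetical mini-dictionary; cttex uses its own, much larger word list.
WORDS = {"ศึกษา", "ความ", "หมาย", "สำคัญ", "ของ"}

# A run of characters from the Thai Unicode block.
THAI_RUN = re.compile(r"[\u0E01-\u0E5B]+")


def segment(run: str, words: set[str]) -> list[str]:
    """Split a run of Thai text by greedy longest match against `words`."""
    longest = max((len(w) for w in words), default=0)
    pieces, unknown, i = [], "", 0
    while i < len(run):
        for n in range(min(longest, len(run) - i), 0, -1):
            if run[i:i + n] in words:
                if unknown:               # flush any unrecognised text first
                    pieces.append(unknown)
                    unknown = ""
                pieces.append(run[i:i + n])
                i += n
                break
        else:                             # no dictionary word starts here
            unknown += run[i]
            i += 1
    if unknown:
        pieces.append(unknown)
    return pieces


def preprocess(text: str, words: set[str] = WORDS) -> str:
    """Wrap each Thai run in {\\thai ...} and put \\tb between the pieces."""
    def wrap(match: re.Match) -> str:
        # The space after \tb only ends the control word; TeX discards it.
        return "{\\thai " + "\\tb ".join(segment(match.group(0), words)) + "}"
    return THAI_RUN.sub(wrap, text)


print(preprocess("Test: ศึกษาความหมาย"))
# prints: Test: {\thai ศึกษา\tb ความ\tb หมาย}
```

Latin text passes through untouched; only the Thai runs are wrapped and split.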

The style file ‘thai.sty’ contains the definitions of {\thai ....} and \tb. The {\thai ....} command is used to switch LaTeX to the Thai font.

After passing through the preprocessor, the file is compiled by LaTeX in the normal fashion.

3  Thai in HEVEA

For HEVEA, the style (package) file ‘thai.sty’ is not used. HEVEA does not recognise the {\thai ....} or \tb constructs; if it encounters them, it issues warnings and ignores them.

To use the Thai language with HEVEA, do not run the preprocessor that is normally used before invoking LaTeX. Instead, pass the original (as typed) Thai LaTeX file directly to HEVEA. HEVEA detects the command \usepackage{thai} in the file and uses it to establish a Thai character encoding. (The command-line flag --charset=TIS-620 is no longer necessary; this flag is no longer operational.)
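Since HEVEA needs the unpreprocessed file and relies on \usepackage{thai} to pick the encoding, a quick check before running it may save some confusion. The function `hevea_problems` below is a hypothetical helper, not part of HEVEA; it ignores LaTeX comments and looks only for the two conditions just described.

```python
import re

# \usepackage[options]{pkg1,pkg2,...}
USEPACKAGE = re.compile(r"\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}")
# Constructs inserted by the cttex preprocessor.
CTTEX_MARKS = re.compile(r"\{\\thai(?![A-Za-z])|\\tb(?![A-Za-z])")
# A LaTeX comment: an unescaped % up to the end of the line.
COMMENT = re.compile(r"(?<!\\)%.*")


def hevea_problems(source: str) -> list[str]:
    """Hypothetical check: reasons not to pass `source` to HEVEA unchanged."""
    text = COMMENT.sub("", source)
    packages = {name.strip()
                for m in USEPACKAGE.finditer(text)
                for name in m.group(1).split(",")}
    problems = []
    if "thai" not in packages:
        problems.append("no \\usepackage{thai}: HEVEA will not select Thai")
    if CTTEX_MARKS.search(text):
        problems.append("contains {\\thai or \\tb: looks like cttex output")
    return problems


print(hevea_problems("\\documentclass{article}\n\\usepackage{thai}\n"))  # []
print(hevea_problems("{\\thai ไทย\\tb ภาษา}\n"))
```

To check the real file, read it with `open("thaihevea.ttex", encoding="tis_620")` and pass the text to `hevea_problems`.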

Table 1 lists the commands required to process this file with both Thai LaTeX and HEVEA. The original LaTeX file is assumed to be named ‘thaihevea.ttex’ (ttex = Thai tex).


for LaTeX
  cttex < thaihevea.ttex > thaihevea.tex    run preprocessor
  latex thaihevea.tex                       compile using LaTeX
  dvips thaihevea.dvi -o                    convert using dvips
  gv thaihevea.ps                           view using ghostview
for HEVEA
  cp thaihevea.ttex thaihevea.tex           ‘rename’ file for benefit of HEVEA
  hevea thaihevea.tex                       compile using HEVEA
  imagen thaihevea                          convert image to bitmap
  firefox thaihevea.html                    view using web browser
Table 1: Processing Thai text with LaTeX and HEVEA.
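For repeated runs, the two command sequences in Table 1 can be wrapped in a small script. The sketch below is one way to do it with Python's `subprocess` module. It assumes `cttex`, `latex`, `dvips`, `hevea` and `imagen` are on the PATH, leaves out the viewing steps (gv, firefox), and uses hypothetical function names. Note that both sequences write thaihevea.tex, so running one overwrites the other's copy.

```python
import shutil
import subprocess


def latex_commands(base: str) -> list[list[str]]:
    """Steps after cttex, following the LaTeX half of Table 1."""
    return [["latex", f"{base}.tex"], ["dvips", f"{base}.dvi", "-o"]]


def hevea_commands(base: str) -> list[list[str]]:
    """Steps after the copy, following the HEVEA half of Table 1."""
    return [["hevea", f"{base}.tex"], ["imagen", base]]


def build_latex(base: str) -> None:
    # cttex < base.ttex > base.tex  (cttex reads stdin and writes stdout)
    with open(f"{base}.ttex", "rb") as src, open(f"{base}.tex", "wb") as dst:
        subprocess.run(["cttex"], stdin=src, stdout=dst, check=True)
    for cmd in latex_commands(base):
        subprocess.run(cmd, check=True)   # raises CalledProcessError on failure


def build_hevea(base: str) -> None:
    # HEVEA must see the original text, so copy rather than preprocess.
    shutil.copyfile(f"{base}.ttex", f"{base}.tex")
    for cmd in hevea_commands(base):
        subprocess.run(cmd, check=True)
```

Calling `build_hevea("thaihevea")` in the directory holding thaihevea.ttex should leave thaihevea.html ready for the browser.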

Since the Thai text is not processed to mark where it may be broken, that decision is left to the application displaying the HTML. The browser I currently use (Firefox) does not know how to break continuous Thai text at suitable places without external help. However, the screen is wider than a printed page, so on average each line contains more natural break points, and the browser left-justifies the text, so it does not produce large, ugly gaps. The right margin is ragged rather than flush, but that looks acceptable (to me).

The following paragraph is Thai text. It does not say anything important; it is simply here to serve as a basic test. Even if you cannot compile this document with LaTeX (e.g. because you do not have the file thai.sty or a Thai character set for printing), you can still compile it with HEVEA and make an English/Thai web page.

If you want to remove the Thai so that you can compile an English-only version of this document, simply comment out the \thaistuff command at the top of the file (insert a % character before it) and uncomment the second version of the command on the adjacent line, which eliminates the Thai.

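ศึกษาความหมาย ความสำคัญของสิ่งแวดล้อมศึกษา วิธีการเผยแพร่ประชาสัมพันธ์ ความรู้ทางสิ่งแวดล้อม วิธีการเขียนแผนงานเพื่อเผยแพร่ความรู้ทางสิ่งแวดล้อม นำ สิ่งแวดล้อมศึกษาไปประยุกต์ใช้ในการพัฒนาและเผยแพร่ความรู้ข้อมูล ข่าวสารต่างๆ ในโครงการอื่นๆ ที่มีความสัมพันธ์เกี่ยวข้อง

(In English: Study of the meaning and importance of environmental education; methods of disseminating and publicising environmental knowledge; methods of writing plans for disseminating environmental knowledge; and the application of environmental education to developing and disseminating knowledge, data and news in other related projects.)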


Figure 1: Thai Character Set


This document was translated from LaTeX by HEVEA.