There are several ways in which an applet could a priori present a danger for the client security. In this chapter, we first give a sketch of these dangers, and then describe the solutions adopted for Caml Applets. Each solution is based on solid results, either research results in language design, or in cryptography. The description of a solution starts with a general presentation, accessible to the non-specialist, and continues with more precise details.
First, we must describe what we consider a security hazard for the client:
A program manipulates data. Data has a certain shape, called its type, such as integer, string, record, function, etc... Informally, a program is said to be well-typed when it always manipulate data consistently with its type. For example, a well-typed program would never treat a data structure as a function and attempt to execute it. Or it would never consider the same data sometimes as an integer, and sometimes as an array. A well-typed program never goes wrong, in the sense that it never produce errors due too inconsistent data manipulation.
Some languages (Lisp, Scheme) are dynamically typed, that is, each low-level operation is checked before being executed. An invalid access to memory will raise an error, but at least it will not go unnoticed or crash the program. Other languages are statically typed: types of objects are computed at compile-time, either from declarations in program source (Pascal, C), or using type inference (ML). Among these languages, some are only weakly typed, such as C. C allows the user to force a type coercion in a program, even it this may cause type error at runtime.
According to our terminology, a dynamically typed, interpreted or compiled language does not make differences between well-typed and other programs. The amount of runtime typechecking will decide if the program can make inconsistent data manipulations.
A C compiler will accept programs that are not well-typed.
On the contrary, languages of the ML are said to be strongly typed: the compiler accepts only well-typed programs.
Caml Special Light has a classical ML type system. There are plenty of information sources on ML typing (), so we only mention some relevant aspects of typing.
An ML program can only access values available in its scope. In particular, no computation on pointers is available at the language level. The scope is lexical, meaning that a variable cannot reference at run-time something different than what was determined at compile-time.
Informally, scoping and memory access are relevant to security because they allow strict control on the values that can be accessed and manipulated by a given program, as well as on the functions the program may call.
Types of values can be abstract, meaning that there is no direct access to the value representation other than the exposed manipulation functions. A good example of the relevance of abstraction to security can be found in Unix file descriptors (although these are not available to applets anyway). In C, a file descriptor is simply an integer, so that any part of the program can access any possible file descriptor. In Caml Special Light, file descriptors have an abstract type. If no function creating file descriptors (e.g. open) is made available, a program can only manipulate files descriptors explicitly passed as parameters. This feature does however not give complete control, since the program can keep around abstract values from a previous invocation.
A module system allows the decomposition of a program into modules, that in some cases can be separately compiled. Separate compilation means that, to compile a module, one needs only the interfaces of the imported modules, and not their implementations.
In the case of a typed language, the module system should enforce the coherence of types between different modules. There are very few module systems actually enforcing this coherence. For example, a classical Unix linker doesn't perform any type checking. Different C modules (files actually) can export and import a given global variable with different types.
The module system of Caml Special Light belongs the family of Standard ML module systems. It fullfills our requirements, that is separate compilation, link-time checking of module coherence, and also dynamic linking. More information is found in the litterature, and on the Web.
Here are the few key elements of link-time verifications:
Dynamic linking is similar to static linking, except that the set of exported modules is explicitly controled by the application.
So, in our case, applets are implemented in modules. They are compiled separately, by the applet developper, and may import modules whose interfaces are given in the applet development kit. These interfaces correspond to safe libraries exported by the browser, and are described in chapter A.
After retrieval of the distant bytecode, the browser will, through the dynamic link library of Caml Special Light, perform the link-time type verifications, ensuring that an applet cannot have access to functions not considered as safe.
The Caml Special Light distribution contains a tool, objinfo, that prints a human-readable version of the type information remaining in the compiled bytecode.
All previous security considerations based on the type and module systems are invalidated if the transferred bytecode is not the unmodified output of an intact compiler. A malicious individual could read the bytecode specs, write or generate bytecode, and add linker information without any relation to the actual bytecode.
The approach we propose to solve this problem is to use bytecode authentication, using PGP for example.
The problem we have is to obtain a guarantee that some bytecode is the intact output of the unmodified Caml Special Light compiler. By signing bytecode with PGP, the problem is ``reduced'' to trusting the author of the signature. Of course, this is not a technical problem but a social problem. Knowing that a given individual has signed the bytecode is completely different from trusting this individual.
However, it is possible to install trusted compiler. A trusted compiler would have its own, protected, signature and would act as a public service for bytecode authentication.
The process of bytecode authentication will then be: