The Compcert verified compiler

Web site / Commercial version / Github development

Compcert is a moderately optimizing compiler from most of the ISO C 1999 language to PowerPC, ARM and x86 assembly language. It comes accompanied with a Coq proof of semantic preservation, ensuring that the generated assembly code behaves as prescribed by the source code. The whole Compcert development is available under a non-commercial licence. Commercial versions are available from AbsInt.

OCaml

Web site / Github development

The Caml Language

I am the proud architect and main developer of the OCaml system, an implementation of the Caml dialect of ML that supports equally well functional, imperative and object-oriented programming, all brought together by an ML-style static type system with polymorphism and type inference. OCaml's object system, due to Didier Rémy and Jérôme Vouillon, is the first to offer ML-style type inference within a full class-based OO language. Other notable features include a powerful SML-style module calculus, a bytecode compiler that generates portable intermediate code that runs very efficiently, and a native-code compiler that generates high-performance assembly code for a variety of platforms.

Various Caml libraries and tools that I developed as side projects are collected below.

Caml Light

More info / Github archive

JoeCaml

Caml Light is a predecessor to OCaml. It is a simple bytecoded implementation of the core Caml language. Caml Light was made obsolete by OCaml, but remained in use for teaching during many years, especially in French preparatory classes (classes préparatoires scientifiques). It is now officially abandoned.

LinuxThreads

Penguin

I'm also a happy user of Linux. My contribution to the Linux effort is the first version of LinuxThreads, a multithreading library for Linux implementing the Posix 1003.1c standard for threads. This library was the standard thread library in Linux distributions between 1998 and 2004. Since then, it has been completely reimplemented by Ulrich Drepper and Ingo Molnar at Redhat. Their implementation is much cleaner and more efficient than mine.

Java Card technologies

Smart card

I have also worked on smart cards, more specifically on compact, efficient and secure Java execution environments for Java Card (Java-enabled smartcards). I started this work as a consultant for Bull CP8, then joined the Trusted Logic start-up company between 1999 and 2004. The most visible outcome of this work is the world's first on-card bytecode verifier for JavaCard, marketed by Axalto under the name Codeshield.

SpamOracle: detection of spam by statistical analysis of e-mail contents

Available on Github

SpamOracle, a.k.a. "Saint Peter", is a tool to help detect and filter away "spam" (unsolicited commercial e-mail). It proceeds by statistical analysis of the words that appear in the e-mail, comparing the frequencies of words with those found in a user-provided corpus of known spam and known legitimate e-mail. The classification algorithm is based on Bayes' formula, and is described in Paul Graham's paper, A plan for spam.

This program is designed to work in conjunction with procmail. The result of the analysis is output as an additional message header X-Spam:, followed by yes, no or unknown, plus additional details. A procmail rule can then test this X-Spam: header and deliver the e-mail to the appropriate mailbox.

Cryptokit: cryptographic primitives for OCaml

Available on Github

The Cryptokit library for OCaml provides a variety of cryptographic primitives that can be used to implement cryptographic protocols in security-sensitive applications. The primitives provided include:

  • Symmetric-key cryptography: AES, DES, Triple-DES, ARCfour, in ECB, CBC, CFB and OFB modes.
  • Public-key cryptography: RSA encryption and signature; Diffie-Hellman key agreement.
  • Hash functions and MACs: SHA-1, SHA-256, RIPEMD-160, MD5, and MACs based on AES and DES.
  • Random number generation.
  • Encodings and compression: base 64, hexadecimal, Zlib compression.

Additional ciphers and hashes can easily be used in conjunction with the library. In particular, basic mechanisms such as chaining modes, output buffering, and padding are provided by generic classes that can easily be composed with user-provided ciphers. More generally, the library promotes a "Lego"-like style of constructing and composing transformations over character streams.

CamlIDL: automatic generation of Caml/C interface stubs

Available on Github / More info

CamlIDL is a stub code generator and COM binding for OCaml. It automates the generation of C stub code required for the Caml/C interface, based on an MIDL specification. (MIDL stands for Microsoft's Interface Description Language; it looks like C header files with some extra annotations, plus a notion of object interfaces that look like C++ classes without inheritance.) In addition, it can be used to interface Caml code with software components that follow Microsoft's now semi-defunct COM interface.

OCamlAgrep: string searching with errors

Available on Github

This OCaml library implements the Wu-Manber algorithm for string searching with errors, popularized by the "agrep" Unix command and the "glimpse" file indexing tool. It was developed as part of a search engine for a largish MP3 collection; the "with error" searching comes handy for those who can't spell Liszt or Shostakovitch.

Given a search pattern and a string, this algorithm determines whether the string contains a substring that matches the pattern up to a parameterizable number N of "errors". An "error" is either a substitution (replace a character of the string with another character), a deletion (remove a character) or an insertion (add a character to the string). In more scientific terms, the number of errors is the Levenshtein edit distance between the pattern and the matched substring. The search patterns are roughly those of the Unix shell, including one-character wildcard (?), character classes ([0-9]) and multi-character wildcard (*). In addition, conjunction (&) and alternative (|) are supported. General regular expressions are not supported, however.

CamlJava: an OCaml/Java interface

Available on Github

This is a very preliminary release of CamlJava, an OCaml/Java interface based on the following schema:

         Caml/C interface       JNI (Java Native Interface)
  Caml <------------------> C <-----------------------------> Java

CamlJava provides a low-level, weakly-typed OCaml interface very similar to the JNI. Java object references are mapped to an abstract type, and various JNI-like operations are provided to allow Java method invocation, field access, and more. A basic callback facility (allowing Java code to invoke methods on Caml objects) is also provided, although some stub Java code must be written by hand.

This library has been tested against Sun's JDK under Linux and Windows. On other platforms, some Makefile tweaking is probably necessary.

For higher-level, type-safe interfaces between Caml and Java, see O'Jacare and also OCaml-Java.

CamlZip: reading and writing ZIP, JAR and GZIP files

Available on Github

This OCaml library provides easy access to compressed files in ZIP and GZIP format, as well as to Java JAR files. It provides functions for reading from and writing to compressed files in these formats.

OCamlMPI: interface with the MPI message-passing interface

Available on Github

OCamlMPI provides Caml bindings for a large subset of MPI functions. MPI is a popular library for distributed-memory parallel programming in SPMD (single program, multiple data) style. MPI offers both point-to-point message passing and group communication operations (broadcast, scatter/gather, etc). Several implementations of MPI are available, both for networks of Unix workstations and for supercomputers with specialized communication networks. More info on MPI is available here.

OCaml-callcc: call/cc for OCaml

Download release 1.0

This library implements the call/cc (call-with-current-continuation) control operator for OCaml. This is a very naive implementation: it works only in bytecode, and performance is terrible (call/cc copies the whole stack). It is intended for educational and experimental purposes. Use in production code is not advised. See also delimited continuations in OCaml by O. Kiselyov.

PGF driver for GNU pic

Download patch against groff 1.19.2.

GNU PIC (gpic) is a text preprocessor that translates high-level geometric descriptions of pictures to Troff and LaTeX low-level graphics primitives. PGF (the Portable Graphics Format) is a LaTeX macro package for generating graphics that are compatible with several TeX back-ends. In particular, PGF is used by the excellent Beamer LaTeX macro package to compose good-looking presentations. As an avid user of both GNU PIC and Beamer, I was frustrated that I was not able to use both at the same time. This patch against the GNU Groff distribution adds a new back-end to GNU PIC that generates PGF graphics primitives. Usage:

          gpic -p < inputfile > outputfile
There is no documentation and no support. It works for me; maybe it will work for you as well.