BIOCHAM 2.3 Reference Manual

May 2005
Nathalie Chabrier-Rivier, François Fages, Sylvain Soliman,
Projet Contraintes, INRIA Rocquencourt, France
http://contraintes.inria.fr/BIOCHAM


Differences with BIOCHAM 2.2

Learning of (boolean) rules.

Export to xppaut's .ode format.

LTL model checking on numerical traces, and learning of numerical parameters.

A new, more versatile parser, replaced the old one, thanks to Daniel de Rauglaudre.

More shortcuts were introduced for NuSMV queries. Fairness (weak) or dynamic reordering BDD's variables can be imposed on the system. NuSMV 2.2.2 and 2.2.3 is supported.

The vim mode that was developped externally is now merged in the distribution, an emacs mode is available too.

Seasonal bugfixes ;)

Differences with BIOCHAM 2.1

BIOCHAM 2.2 brings more flexibility for plotting simulation results. The simulation can use (and that's the default) an implicit method, and thus handles stiff equations.

Any biocham command can now appear in a Biocham file.

SBML (resp. BIOCHAM) parameters are now imported (resp. exported) to BIOCHAM (resp. SBML) parameters.

Lots of minor bugs were fixed.

Differences with BIOCHAM 2.0

BIOCHAM 2.1 implements the adaptive step size method of step doubling for Runge-Kutta integration method.

Bug fixed on the parsing of arithmetic expressions (implies to add parentheses around complexes given with stoichiometric coefficients).

Differences with BIOCHAM 1

BIOCHAM 2 introduces the ability to use quantitative models, with stoichiometric coefficients and kinetic laws (examples are available in the EXAMPLES/kinetics directory). These models can be numerically simulated, or used as before for model-checking via a boolean abstraction. Parameters and macros can also be defined, used in rate laws and saved in a separate file.

An import/export function to/from SBML has also been introduced.

Some shortcuts for common CTL queries are now available.

Differences with BIOCHAM 0

BIOCHAM 1 introduces a rich pattern language for defining reaction rules in a concise manner. The complexation operator is now supposed to be associative and commutative. The operators Ei and Ai have been introduced in the query language to quantify over the initial states.

BIOCHAM 1.1 adds the possibility to export the interaction map to Graphviz dot format.


1. Installation

BIOCHAM  is precompiled for x486 machines under Linux and under Windows with Cygwin. BIOCHAM can be installed in any directory. The command biocham starts the BIOCHAM interpreter.

The commands ./configure && make recompile BIOCHAM if necessary. If a recompilation is needed on your machine, you need to install first:

make install can then be used to complete the installation of the program.

See the file INSTALL of the distribution for more details.

BIOCHAM is a free software protected by the GNU General Public License GPL version 2. This is an Open Source license that allows free usage of this software.


2. Syntax of BIOCHAM

Biochemical objects:

BIOCHAM manipulates formal objects which represent chemical or biochemical compounds, ranging from ions, to small molecules, macromolecules and genes. BIOCHAM objects can be used also to represent control variables and abstract processes. They are written with the following syntax:

object = molecule | abstract

name = word of alphanumerical and '_' characters beginning with a letter.

molecule =
name
| molecule::name
| molecule-molecule
| molecule~{name,...,name}
| gene
| ( molecule )

gene = #name

abstract = @name

A BIOCHAM object is either a "molecule" or an abstract object. The syntax of a molecule has five forms. In the first form, the simplest and the most flexible one, a molecule is simply given a name. Lower and upper case letters are distinguished. The words not, in, and, xor, declare, where, A, E, U, AF, EF, AG, EG, AX, EX cannot be used as names as they are logical connectives or operators for temporal queries (see section below).

The second form serves to denote multimolecular complexes with the linking operator -. This binary operator is assumed to be associative and commutative, hence the order of the elements in a complex does not matter. In the cases where one would like to distinguish between different orders of association, one can denote the different complexes with specific names.

The third form serves to write modified forms of molecules, like attaching the set of phosphorylated sites with the operator ~.  Several sets can be attached. The order of the elements is irrelevant.

The fourth form denotes a localised molecule. This allows to consider differently a molecule localised in several compartments, and thus to take into acount transport phenomena.

Example: cdk1, cdk1-cycB, cdk1~{tyr15,thr161}-cycB and cdk1::cytoplasm are valid BIOCHAM notations for, respectively, the cyclin dependent kinase one, the complex cdk1 with cyclin B, the phosphorylated form at sites tyrosine 15 and threonine 161 of cdk1 in the complex cdk1-cycB, and the cycline dependent kinase localised in the cytoplasm. (cdk1-cycB)~{tyr15,thr161} is another notation for the same phosphorylated form of the complex without making precise the constituent which is phosphorylated. Note that in this syntax, the complex is considered as formally different.

The fifth syntactical form is used to denote genes or gene promotors, with a name beginning with #. These objects are assumed to be unique, which has a consequence on the way reactions involving such objects are interpreted by BIOCHAM (see below).

Example: DMP1-#p19ARF can be used to denote the binding of protein DMP1 on the promotor of the gene producing protein p19ARF noted #p19ARF.

The same assumption of uniqueness is made on abstract objects that are noted with a '@'. Abstract objects can be used to represent particular phases of a process, complete subsystems or abstract processes.

Example: @UbPro can be used to denote the Ubiquitine/Proteasome system and write this formal object as a catalyst in degradation reaction rules.


Reaction rules:

BIOCHAM reaction rules are used primarily to represent biochemical reactions. They can be used also to represent state transitions involving control variables or abstract processes, or to represent the main effects of complete subsystems such as protein synthesis by DNA transcription without introducing RNAs in the model. They are written with the following syntax:

reaction =
kinetics for basic_reaction
| basic_reaction
| name : basic_reaction
| name : kinetics for basic_reaction

basic_reaction =
solution => solution.
| solution =[object]=> solution.
| solution =[solution => solution]=> solution.
| solution <=> solution.
| solution <=[object]=> solution.

solution = _ | object | integer*object | solution + solution | ( solution )

kinetics =
simple_kinetics
| (simple_kinetics , simple_kinetics)
| if condition then simple_kinetics
| if condition then simple_kinetics else simple_kinetics
| if condition then simple_kinetics else (kinetics)

simple_kinetics =
[molecule]
| float
| name
| simple_kinetics * simple_kinetics
| simple_kinetics / simple_kinetics
| simple_kinetics + simple_kinetics
| simple_kinetics - simple_kinetics
| simple_kinetics ^ simple_kinetics
| log(simple_kinetics)
| exp(simple_kinetics)
| (simple_kinetics)

condition =
simple_kinetics < simple_kinetics
| simple_kinetics > simple_kinetics
| simple_kinetics = simple_kinetics
| simple_kinetics =< simple_kinetics
| simple_kinetics >= simple_kinetics
| condition and condition

A solution is thus a sum of objects, the character _ denotes the empty solution. The order and multiplicity of molecules in a solution (1 if not explicitly written) will be ignored for all qualitative operations (like model-checking), only the presence or absence of objects will then be considered.

In such a boolean abstraction of stoichiometric models, a reaction transforms one solution matching the left-hand side of the rule, into another solution in which the objects of the right-hand side have been added. The molecules in the left-hand side of the rule which do not appear in the right-hand side may be non-deterministically present or consumed in the resulting solution. This convention reflects the capability of BIOCHAM to reason about all possible behaviors of the system with unknown kinetics parameters [1, 3].

Following the uniqueness assumption, molecules marked as "genes" with the '#' notation, or any compound built on such a molecule (such as DMP1- #p19ARF for instance) are not multiplied. These objects remain unique and are deterministically consumed in the form in which they appear in the left-hand side of the rule. The same goes for control variables, noted with a '@', which are deterministically consumed.

Reactions involving a catalyst molecule (i.e. a molecule appearing in both the left and right hand sides of the rule) are written with the catalyst molecule between square brackets in the arrow. Similarly a catalyst reaction can be written between square brackets in the arrow of the rule.

The kinetic rates will be only used in numerical operations (like simulation), if no rate is provided zero will be assumed. Pairs of kinetic rates are given for reversible reactions. For floats, the exponential notation 1.0e6 or 7.2E-4 is accepted, but the part before the exponent must then contain a dot (.).

When a name is given in a law, a corresponding parameter or macro will be looked for. Conditions, expressed as conjunctions of simple linear (in)equations, can be used to constrain the use of some law.

Example: cdk1 + cycB => cdk1-cycB is a complexation rule.  cdk1-cycB =[Myt1]=> cdk1~thr14-cycB is a phosphorylation rule with catalyst Mytosine 1. This rule is equivalent to cdk1-cycB + Myt1 => Myt1 + cdk1~{thr14}-cycB. (1*[RAF]*[RAFK],0.4*[RAF-RAFK]) for RAF + RAFK <=> RAF-RAFK. is a reversible complexation rule with two different kinetic rates: the product of the concentrations for the complexation, 0.4 times the complex concentration for the decomplexation.

Temporal properties:

The temporal properties of a BIOCHAM model (or of its boolean abstraction if the original model is quantitative) can be queried in Computation Tree Logic (CTL). CTL is an extension of propositional logic with two path quantifiers for non-determinism: E, A, and several operators for time: F, G, X, U. The path quantifier E expresses the existence of a path, A means for all paths, F means at some time point (on the path), G means at all time points, X means at the next time point, U is a binary operator meaning that a formula is true until a second formula becomes true. Furthermore, since it is possible to have an initial state only partially defined (some molecules are present, some others are absent, but some are left unspecified), a BIOCHAM query prepends an operator about initial state to a CTL query. There are two such operators: Ei and Ai, meaning respectively, "there exists an initial state such that..." and "for all initial states it is true that...".

biocham-query = Ei(query) | Ai(query)
query =


object

| (query)

| EF(query)

| AF(query)

| EG(query)

| AG(query)

| E((query)U(query))

| A((query)U(query))

| EX(query)

| AX(query)

| !(query) negation

| query & query conjunction

| query '|' query disjunction

| query xor query exclusive or

| query -> query implication

| query <-> query equivalence

| reachable(query) same as EF(query), i.e. a state where query is true is reachable

| stable(query) same as AG(query), i.e. query is always true

| steady(query) same as EG(query), i.e. query can remain always true

checkpoint(query,query) same as !(E(!(query1) U query2)), i.e. there is no path leading to the second query without making true the first

| loop(query,query) same as AG((query1 -> EF(query2)) & (query2 -> EF(query1))), i.e. it is always true that if query1 is true query2 can become true and if query2 is true query1 can become true, the system thus oscillates between query1 and query2

| oscil(query) same as loop(query,!(query)), i.e. the system oscillates between states where query is true and false

Example: Ei(EF(cycB)) expresses the existence of an initial state such that there exists a path on which at some time point cyclin B is present. Ai(AF(cycB)) expresses that for all initial states and on all paths, cycB is finally present at some time point. Ai(!(E(!(cdc25) U cdk1-cycB))) checks that cdc25 is a checkpoint for the complexation of cdk1 and cycB for all initial states, that is there does not exist a path on which cdc25 is always absent until the complex cdk1-cycB is present.

A query partially describes the state of the system. For instance: cdk1 and MPF and !(cdk7-cycH) represent states of the system where cdk1 and MPF are present and cdk7-cycH is absent, the rest being undetermined.

Patterns

Patterns are used in the commands described in the next section for searching molecules or reaction rules of a particular kind in a model. They can be used also with some restrictions to define the initial state or reaction rules by writing sets of similar reaction rules in a concise manner with a single reaction rule pattern. Finally they are used to specify a range of useful rules during the learning process.

Patterns introduce the special character ? and variables noted with a name beginning with '$', to denote unspecified parts of a molecule. The parts of the molecules matched by ? or a variable can be empty. 

object_pattern = molecule_pattern | abstract

variable = ? | $name

simple_pattern = name | variable

molecule_pattern =
simple_pattern
| molecule_pattern-molecule_pattern
| molecule_pattern~{simple_pattern,...,simple_pattern}
| molecule_pattern~variable
| gene
| ( molecule_pattern )

Example: cdk1~? is a pattern representing cdk1 itself and any phosphorylated form of it. cdk1~{tyr15,?} or equivalently cdk1~{tyr15}~? represents any  form of cdk1 phosphorylated on site tyrosine 15 at least. cdk1-? represents cdk1 itself and any complex containing cdk1. cdk1~?-? is a pattern representing all forms of cdk1, phosphorylated and/or complexed.

Reaction patterns have the same syntax as reaction rules except that object patterns are allowed in place of objects and there is an optional where construct which is used to constrain the variables of the pattern. In a solution pattern, the character ? can match an empty solution.

When used to define the initial state or reaction rules, like in a file or in the command add_rule (see below), variables that appear in a pattern have to be given a range of possibles. This is done through the use of constraints in the rule and of declarations.

reaction_pattern =
kinetics_pattern for basic_reaction_pattern where constraints
| basic_reaction_pattern where constraints
| reaction_shortcut

basic_reaction_pattern =
name : basic_reaction_pattern
| solution_pattern => solution_pattern.
| solution_pattern =[object_pattern]=> solution_pattern.
| solution_pattern =[solution_pattern => solution_pattern]=> solution_pattern.
| solution_pattern <=> solution_pattern.
| solution_pattern <=[object_pattern]=> solution_pattern.

solution_pattern =
variable | object_pattern | integer * object_pattern | solution_pattern + solution_pattern | (solution_pattern)

constraints =
variable in {object_pattern,...,object_pattern}
| variable not in {object_pattern,..., object_pattern}
| variable in all
| variable in all_simple
| variable in parts_of{ name, name,... }
| name not in variable
| variable diff object_pattern
| variable phos_form object_pattern
| variable not phos_form object_pattern
| variable more_phos_than object_pattern
| variable not more_phos_than object_pattern
| variable submol object_pattern
| variable not submol object_pattern
| variable has_simple_mol_in_common object_pattern
| variable has_no_simple_mol_in_common object_pattern

kinetics_pattern = same as 'kinetics' but with 'molecule_pattern' instead of 'molecule'

reaction_shortcut =
complexation ($A + $B => $A-$B where $A in all and $B in all and $A diff $B)
| decomplexation ($A-$B => $A+$B where $A in all and $B in all and $A diff $B)
| de_complexation ({complexation, decomplexation})
| phosphorylation ($A =[$C]=> $B where $A in all and $B in all and $C in all and $B more_phos_than $A and $A diff $B)
| dephosphorylation ($A =[$C]=> $B where $A in all and $B in all and $C in all and $A more_phos_than $B and $B diff $A)
| de_phosphorylation ({phosphorylation, dephosphorylation})
| synthesis (_=[$G]=>$A where $A in all_simple and $G in all and $A diff $G)
| degradation ($A =[$D]=>_ where $A in all and $D in all and $A diff $D)
| elementary_interaction_pattern ({de_complexation, de_phosphorylation, synthesis, degradation})
| decomplexation_phosphorylation ($A-$C => $C+ $B where $A in all and $B in all and $C in all and $A more_phos_than $B and $A diff $B)
| decomplexation_dephosphorylation ($A-$C => $C+ $B where $A in all and $B in all and $C in all and $B more_phos_than $A and $A diff $B)
| complexation_phosphorylation ($A-$C => $C+ $B where $A in all and $B in all and $C in all and $A more_phos_than $B and $A diff $B)
| re_complexation_phosphorylation (same as above but with <=>)
| complexation_dephosphorylation ($A-$C => $C+ $B where $A in all and $B in all and $C in all and $B more_phos_than $A and $A diff $B)
| re_complexation_dephosphorylation (same as above but with <=>)
| other_elementary_interaction_pattern (de_complexation, re_complexation_phosphorylation, re_complexation_dephosphorylation, synthesis, degradation)
| elementary_interaction_pattern_more (other_elementary_interaction_pattern, de_phoshorylation)

The in constraint has its natural meaning. In the constraints the values all and all_simple refer to all the molecules already known by the system (i.e. used before), limited to non-complexed and non-phosphorylated forms in the second case. This means that using these constructs makes the semantics of rule dependant on the context (order of rules, models parts loaded before, etc.), and is thus recommended only to advanced users.

The diff and not in have the opposite meaning (the variable cannot take the value or a value in the set. The second case of not in corresponds to the absence of some phosphorylation site (the name) in a set of phosphorylated sites (the variable).

The phos_form constraint, forces the variable and the object_pattern to differ only by some phosphorylations. more_phos_than forces the variable to be more phosphorylated than the object_pattern.

The submol constraint forces the variable to be a sub-molecule of the pattern, the has_simple_mol_in_common constraint imposing a common sub-molecule.

Note:
When using constraints relating a variable to an object_pattern, variables appearing in that object_pattern have to be constrained beforehand. Moreover, to define a variable range (for instance to add a rule), one has to use for each variable at least one positive constraint: in or sub_mol, and if the adequate declaraion exists, phos_form or more_phos_than.

Example: (cdk1~?-? + ? => ?) is a pattern for all rules reacting with any form (phosphorylated or complexed) of cdk1. (? =[Myt1]=> ?) is a pattern for all rules involving the catalyst Myt1. This pattern will match all the rules having Myt1 in their left and right-hand sides, even if they were not written with the catalyst notation.
This pattern cannot be used to define reaction rules as it can match unspecified (unconstrained) molecules.

Example:
cdk46~$P + $cycD => cdk46~$P-$cycD
where
$cycD in {DMP1~?-cycD~?, cycD~?}.

is a reaction pattern which will match all complexation rules of all phosphorylated forms of cdk46 with all phosphorylated forms of cycD or DMP1-cycD.
This pattern can be used to define reaction rules.  All possible phosphorylated forms of molecules cdk46, DMP1 and cycD have to be declared however (see section below) in order to constrain the variable $P and the different occurrences of ?.
Note that if the three molecules in this pattern had three phosphorylation sites each, they would have 2³=8 forms each, thus the pattern would specify 8x(8x8+8)=576 reaction rules! Reaction patterns must thus be used with care for specifying reaction rules and only the relevant phosphorylation sites should be declared for a molecule.

Example:
cdk1~$P-$cyc =[Wee1]=> cdk1~$P~p2-$cyc
where p2 not in $P
and   $cycA in {cycA, cks1-cycA}
and   $cyc in { $cycA, cycB, cycB-cks1}.
is a reaction pattern which specifies the phosphorylation on site p2 of several cdk1 complexes not already phosphorylated on p2.
This pattern can be used to define reaction. If cdk1 is declared with 3 phosphorylation sites {p1,p2,p3}, there are 4 forms not containing p2 to combine with the 4 possibilities for the variable $cyc. This reaction pattern thus expands into 16 reaction rules.

Declarations

The set of all possible phosphorylated forms of a given molecule can be declared by associating to the molecule the set of sets of sites which can be phosphorylated. In these declarations, the function parts_of can be used to denote all subsets of a set.

Example: the declaration
declare cdk2~{{},{p1},{p2},{p1,p2}}.
specifies that cdk2 has two phosphorylation sites p1, p2 and that all combinations are possible. This declaration is equivalent to
declare cdk2~parts_of({p1,p2}).

The purpose of declarations is to constrain implicitly the domain of phosphorylation variables which appear in molecule patterns. Only phosphorylation sites can be declared. In a reaction pattern used for defining reaction rules, the other variables, such as complexation variables, must thus be explicitly constrained in the where part of the pattern.

Declarations can be entered also at top-level with the command declare, can be listed with the command declarations and can be cleared with the command clear_rules.

Note:
Since the aim of declarations is to define the possible phosphorylation sites, any rule where a molecule appears in a form contradictory with its declaration will be rejected. Since most molecules appear also in non-phosphorylated form, a warning will be given when a declaration does not include the empty set (but you can safely ignore it if you know what you are doing).

Location declarations

The set of all possible localisations for a given molecule can be declare by associating to the molecule the set of its possible locations or its single possible localisation.

Example: The location declarations localise p53::nucleus. and localise p53::[nucleus]. denote the same localised molecule.
localise Mdm2::[cytoplasm,nucleus]. denotes that the Mdm2 molecule can be localised in the cytoplasm or in the nucleus, but in no others specified locations. This molecule could appear in the rules in the localised forms Mdm2::cytoplasm or Mdm2::nucleus or in the non-localised form Mdm2.

To use a localised form of a molecule in a rule, this form must have been previously defined with the command localise. Once the locations of a molecule have been specified, you can add new possible locations with the command extend_localise , witch is used with the same nomenclatur as localise .
All the localised molecules can be listed with the command list_locations and cleared with the command clear_rules.

Volume declarations

When several cell compartments are considered, this compartments may have different volumes. In this very common case, transport rules occure between compartments whose volume differ eatch other and volume ratio must be taken into account in all kinetic models.

Example: The volume ratio between nucleus and cytoplasm (nuclear volume/cytoplasmic volume) is currently above 1/15. It can defined with the command volume_ratio (15,nucleus),(1,cytoplasm). The volume ratio will automaticaly be considered during numerical simulations for every rules recognized as tranport rules, i.e. every reactions whose at least one reactant and one product are localised in different compartements. If no volume ratio are defined, the default ratio is set to unity (that means all the compartments have the same volume) and a warning is signaled.
All the volume ratio can be listed with the command list_volumes.


3. BIOCHAM files

The name of a BIOCHAM file must be suffixed by .bc. BIOCHAM files may contain:

Usually, a BIOCHAM model is defined in a single file containing only declarations and reaction rules or rule patterns. Then different initial states, sets of parameters or test commands can be defined in different files.


4. BIOCHAM commands at top-level

The  command biocham starts the BIOCHAM interpreter. Some BIOCHAM files can be passed as arguments to the command to be loaded immediately. Under the BIOCHAM prompt, the following commands are available, they must be terminated by a dot. Several commands can be executed by separating them with a ','. Some commands take a list as argument. Lists are noted between square brackets, e.g. [a,b,c]. The previous typed commands can be retrieved by pressing ctrl-p. The commands can be automatically completed by pressing on the tabulation key. A command can be interrupted by ctrl-c. To quit the interpreter type quit.

Loading and exporting files:

load_biocham(file).

the reaction rules of the BIOCHAM file are loaded and taken as current set of rules. The initial state is initialized according to the declarations in the file.

Example: load_biocham('EXAMPLES/cell_cycle.bc') or  load_biocham('cell_cycle.bc') or load_biocham(cell_cycle). Note that in the last form, the quotes are not necessary and the suffix is automatically added to the name. The commands contained in the file are executed.

add_biocham(file).

the rules of the given file are loaded and added to the current set of rules. The initial state is updated incrementally. The commands contained in the file are executed.

change_directory(directory).

changes the current directory.

export_biocham(file).

saves the current BIOCHAM set of rules and initial state in a BIOCHAM file.

expand_biocham(file).

saves the current BIOCHAM set of rule instances and initial state in a BIOCHAM file (all reaction rule patterns are expanded).

export_param(file).

saves the current initial state, macros and the value of parameters in a BIOCHAM file.

export_dot(file).

exports the current BIOCHAM set of rules and initial state into a .dot file. That file can then be used to generate pictures of the interaction map, with for instance tools of the Graphviz suite.

For instance typing the command
dot -Tps file.dot > file.ps
in a terminal, generates a PNG image of the map, like the one shown below (for EXAMPLES/MAPK/mapk.bc):

export_dot(file, list_of_options).

exports the current BIOCHAM set of rules and initial state into a .dot file (see above). The list of options can contain: init_up, to force the molecules present in the initial state to be shown on the top of the image (see example above); mod_double, to show enzymes catalyzing a reaction or modifiers of some kind (i.e. both input and output) with a double arrow (input/output) instead of a simple dashed arrow; col_path, to color in red the latest pathway returned by a CTL query; double_size, to produce A3 sized graphs instead of the default A4.

Examples of invocation:
export_dot(test,[init_up,mod_double,col_path]). to use all of the options, and save the result in the file test.dot;
export_dot(test,[init_up]). to obtain an image like that of the above example (after using dot).

export_nusmv(file).

exports the current BIOCHAM set of rules and initial state in a SMV file. Notations for molecules are translated by replacing the characters _ ( ) } , and ~{ by respectively __ _L _R _r _l _c. Furthermore the names B, F, G, H, O, S, T, V, W, Y, Z are prefixed by _.

export_lotos(file).

exports the current BIOCHAM set of rules and initial state in a LOTOS file. Molecules are translated with the same convention as for SMV.

export_sbml(file).

exports the current BIOCHAM set of rules (including kinetic laws) and initial state to an SBML file. Molecules are translated by removing all '-' and '~{','}'.

load_sbml(file), add_sbml(file).

Act as the corresponding load_biocham/add_biocham commands but importing from an SBML file.

export_ode(file).

exports the current BIOCHAM set of kinetic laws and initial state to an ASCII file in ODE format (humanly readable but also usable with xppaut). Molecules are translated by removing all '-' and '~{','}' and if necessary shortening the resulting name.


About reaction rules:

list_rules.

lists the current set of rules.

list_rules(reaction_pattern).

lists the current set of rules matching the given pattern.

Example: list_rules(? => cycA~?-? + ?) will list all the rules containing any form of cyclin A in the right hand side.

expand_rules.

lists all the instances of the current set of rules (including for each its kinetic law if there is one).

expand_rules(reaction_pattern).

lists all the instances of the current set of rules matching a given pattern.

add_rule(reaction_pattern), add_rule(set_of_reaction_patterns).

adds reaction rules to the current set of rules.

delete_rules(reaction_pattern), delete_rules(set_of_reaction_patterns).

deletes all reaction rules matching one pattern from the current set of rules.

clear_rules.

clears the current set of rules and all molecule and location declarations.

rule(integer), rule(name).

shows the n-th rule (after expansion), or the rule(s) with the corresponding name (i.e. of the form name :...).

pathway(list_of_integers), pathway.

shows the rules of corresponding number (after expansion). If no list is given and the trace corresponding to a query has been generated, then use that trace as list of rules.

show_kinetics.

list all kinetic information (initial concentration and evolution from rules) for each molecule (one line per molecule).

About the molecules contained in the rules:

declare(molecule_declaration).

declares the set of all phosphorylated forms of a molecule.

declarations.

lists all the declarations of phosphorylated forms of molecules.

list_molecules.

lists the molecules contained in all instances of the current set of rules.

list_molecules(object_pattern), list_molecules(set_of_object_patterns).

lists the objects (appearing in the instances of the current set of rules) and matching one of the given object patterns.

check_molecules.

checks lower/upper case errors in molecule names.

About locations and volumes:

localise(localised_molecule).

defines (for the first time) the set of possible locations of a molecule.

extend_localise(localised_molecule).

defines the set of possible locations of a molecule witch have already been localised.

list_locations.

lists all the localised molecules.

volume_ratio(list_of_volume_ratio).

defines the volume ratio . The list_of_volume_ratio is written as couples (integer,location_name), witch are separed with a comma. If no volume ratio is defined for one or several compartments, the default corresponding ratio is set to unity and a warning is signaled.

list_volumes.

lists all the previously defined volume ratio.


About the initial state:

The initial state can be partially defined by giving the list of objects which are present in the initial state and the list of objects which are absent from the initial state. The other objects can be present or absent. When nothing is precised, present objects are given a default concentration of 1.

initial_state.

lists the objects which are present (including their initial concentration) and absent from the initial state.

clear_initial_state.

makes undefined all objects possibly present or absent in the initial state. Also resets all parameters and macros.

present(object_pattern), present(set_of_object_patterns).

all objects (appearing in the instances of the current set of rules) and matching one of the given object patterns are made present in the initial state.

Example: present({cycA, cdk1, cycE~?}).

present(object_pattern,float), present(object_pattern,name).

gives to the given object the given initial concentration, which might be a parameter name (see below).

Example: present(MAPK,0.3).

absent(object_pattern), absent(set_of_object_patterns).

all objects (appearing in the instances of the current set of rules) and matching one of the given object patterns are made absent from the initial state.

undefined(object_pattern), undefined(set_of_object_patterns).

all objects (appearing in the instances of the current set of rules) and matching one of the given object patterns are made possibly present or absent in the initial state.

make_present_not_absent.

all objects (appearing in the instances of the current set of rules) which are not declared absent are made present in the initial state.

make_absent_not_present.

all objects (appearing in the instances of the current set of rules) which are not declared present are made absent in the initial state.

parameter(name,float).

Sets the value of a given parameter to the corresponding value.

parameter(name).

Shows the value of the given parameter.

list_parameters.

Shows the values of all known parameters.

macro(name,simple_kinetics).

Sets the value of a given macro to the corresponding value, which will be re-evaluated each time a kinetic law is to be computed.

macro(name).

Shows the value of the given macro.

list_macros.

Shows the values of all known macros.

About the simulators

For the boolean simulator, the initial state is completely defined as the set of objects declared as present. All the other objects are considered as absent from the initial state. The simulator prints the objects present in the successive states of the simulation. The set of objects which are printed can be specified with patterns. By default all objects are printed.

boolean_simulation, boolean_simulation(integer).

performs a random simulation up to a given number of transitions (default is 100).

boolean_enumeration.

performs a step by step simulation by enumerating all possible behaviors of the system from the initial state. At each step you can either continue the simulation by typing return, backtrack to another transition by typing <, or stop the simulation by typing 'q'. The depth of the current derivation is printed.

numerical_simulation, numerical_simulation(number).

performs a numerical simulation (using one of the methods detailed below) up to a given number of time units (default is 20).

simulation_method, simulation_method(name).

Shows the current simulation method and the corresponding details (error, step_size, etc.). If a parameter is given, sets the desired simulation method. The currently available choices are: rk and stiff. The first one is a fourth-order Runge-Kutta method, with or without step doubling (see below), the second one a Rosenbrock method (implicit), with variable step size (this is the default choice).

step_size, step_size(float).

sets the initial step size (default is 0.01) for the Runge-Kutta method.

step_doubling, step_doubling(float), no_step_doubling.

uses step-doubling method (default) to adapt the step size to a specified maximum error (default is 0.005).
Step adaptation cannot be turned off for Rosenbrock method.

plot.

plots the result of the last simulation, either boolean or numerical.

Example:

export_plot(file_template).

saves the result of the last simulation into two files: file_template.csv and file_template.plot, so that you can produce afterwards the plot of a given simulation by giving to GNUplot the command 'load "file_template.plot"'.

show_molecules(object_pattern), show_molecules(set_of_object_patterns).

adds the molecules matched by the pattern to the molecules printed by the simulator. Initially all molecules are shown.

hide_molecules(object_pattern), hide_molecules(set_of_object_patterns).

removes the molecules matched by the pattern from the molecules printed by the simulator.


show_macros.

will plot macros values too for the numerical simulator.


hide_macros.

will not plot macros values for the numerical simulator.


show_hide.

lists the current patterns for showing and hiding molecules and the state of macros display.


keep_plot.

keeps the current plot window open and asks future plots to use another one (for comparison).


set_color(object, integer), set_color(name, integer).

sets the color used for the corresponding object/macro when plotting. The colors are those available, as shown with the next command: test_plot.


test_plot.

opens a special plot window executing the test command of GNUplot, and thus showing the available colors.


set_xmin(float), set_xmax(float), set_ymin(float), set_ymax(float).

sets the range shown on a numerical plot. One can use the special value * to let the system set the values. These settings are reset (to *) each time a simulation is run.

Note that because BIOCHAM boolean models are highly non-deterministic, random simulation is often impractical and not representative of all possible behaviors. It is therefore much more interesting to query the temporal properties of BIOCHAM models w.r.t. all possible behaviors in CTL logic.

About temporal queries

Formula in Computation Tree Logic (CTL) express the temporal properties of a model w.r.t. all possible runs. BIOCHAM queries are evaluated with a model-checker.

nusmv(biocham_query).

evaluates a temporal query using the model checker NuSMV. The first call to this command compiles the rules and may take a while.

Examples: nusmv(Ei(EF(cdk1))).

why.

explains the result of the last query by producing a witness pathway when this is possible.

nusmv_why(biocham_query).

like nusmv(biocham_query),why. except that if the query is false, the model checker does not compute the query twice.

fairness_path, no_fairness_path.

When evaluating specifications, forces the model checker to consider path quantifiers to apply only to fair paths (resp. all paths). This is especially useful for loop properties.
Warning: the time to compute queries can be bigger.

nusmv_dynamic_reordering, nusmv_disable_dynamic_reordering

Reorders BDD variables dynamically, using NuSMV's group sift converge method. For a better efficiency in building the BDD, this command should be used before the first NuSMV query, even if it still works afterwards.

Some shortcut for remarquable CTL query


check_reachable(query).

Tests for the reachability of the corresponding the state query. This is the same as asking: nusmv(Ei(EF(query))).

check_checkpoint(query1,query2).

Tests if the first state query1 is a necessary step for obtaining the second state query. This is the same as asking: nusmv(Ai(!(E(!(query1) U query2)))).

check_stable(query).

Tests if the state query is stable. This is the same as asking: nusmv(Ai(AG(query))).

check_steady(query).

Tests if the state query is steady. This is the same as asking: nusmv(Ei(EG(query))).

check_loop(query1,query2).

Tests if the system oscillates between the state query1 and the state query2. This is the same as asking: nusmv(Ai(AG((query1-> EF(query2)) & (query2-> EF(query1)))))).

check_oscil(query).

This is the same as loop(query,!(query)), i.e. nusmv(Ai(AG((query -> EF(!(query))) & (!(query)->EF(query))))).


Other model checkers than NuSMV can be used by exporting BIOCHAM rules and initial state in an appropriate SMV file or LOTOS file using the commands export_nusmv or export_lotos.

Learning rules from a CTL specification

Since the rule language and property language have both been formalized, one can use Machine Learning techniques to try and find completions or modifications of a model such that a given set of CTL properties (the specification) representing the expected behavior of the model (as observed for instance by wet lab experiments) is true.
This is the very place where reaction_shortcuts are useful to define the reaction patterns of interest.

add_spec(biocham_query). add_specs(set_of_biocham_queries).

Add some CTL temporal propertie(s) to the list of specifications of the model begavior.

delete_spec(biocham_query). delete_specs(set_of_biocham_queries).
Delete some CTL temporal propertie(s) from the list of specifications of the model begavior.

clear_specs.
Delete the list of CTL specifications of the model begavior.

check.
Test the adequacy of the model w.r.t. its specification.

check_why.
Test the adequacy of the model w.r.t. its specification, and for each CTL property, compute the result of why.

check_all.
Test the adequacy of the model w.r.t. its specification, but only output "ok" if all spec are verified or "not ok" otherwise.

learn_one_rule(reaction_pattern), learn_one_rule(set_of_reaction_patterns).
Tries to find all the single rules coming from the expansion of the given pattern(s) and such that adding them to the model makes it comply with the current specification.

learn_one_deletion(reaction_pattern), learn_one_deletion(set_of_reaction_patterns).
Tries to find all the single rules coming from the expansion of the given pattern(s) and such that deleting them to the model makes it comply with the majority of current specification (positive specifications false stay false).

learn_one_deletion.
Tries to find all the single rules coming from the path of negative false specifications and such that deleting them to the model makes it comply with the majority of current specification (positive specifications false stay false).

Model-checking numerical models with LTL and constraints over reals

For models with kinetic information, it is also possible to query a trace in LTL. LTL formulae have the following syntax:

ltl_query =


condition (*)

| (ltl_query)

| F(query) finally

| G(query) globally

| X(query) next

| (ltl_query)U(ltl_query) until

| !(ltl_query) negation

| ltl_query & ltl_query conjunction

| ltl_query '|' ltl_query disjunction

| ltl_query xor ltl_query exclusive or

| ltl_query -> ltl_query implication

| ltl_query <-> ltl_query equivalence

| oscil(molecule, int) see below

| cross(molecule, molecule, int) see below

(*): For LTL queries, derivatives of concentrations can also appear in the usual conditions with the syntax: d([Molecule])/dt.

The two special queries oscil(M,n) and cross(M1,M2,n) are abbreviations for: n successive repetitions of, in the first case alternance between positive and negative in the sign of d([X])/dt, in the second case crossings between [M1] and [M2].

Example: G(([RAF-RAFK] >= [RAF~{p1}]) U (d([RAF])/dti < 0.3)) expresses that all along the trace, the concentration of the RAF-RAFK complex is greater than that of phosphorylated RAF, until the derivative of the concentration of RAF becomes lower than 0.3.

Example: oscil(cycB,3) expresses the fact that, along the trace, the concentration of cycB goes up and down thrice.

Note that the notion of next state refers to the following state as computed by the (variable step size) simulation, and thus does not necessarily imply real-time neighborhood.
A trace being seen as a linear succession of states, it is possible to use the following command:

trace_check(ltl_query).

evaluates an LTL query on the latest trace produced (if none exists, one will be generated by numerical_simulation).

Note that since only the states effectively computed during the simulation are checked, a true notion of "equality" would not be meaningful (always false), an approximation is thus used but it is an under-approximation (A = B implies that the two values are really equal at some point, maybe between two simulation points, but the converse does not hold).
The notion of next state, using the X operator can also be a little surprising for the same reasons: it relates to the next calculated state.

In the same spirit as what is done for rule learning w.r.t. CTL specifications, one can use the above LTL model-checking techniques to automatize the search for parameter values satisfying some LTL specifications.

trace_get(list_of_name,list_of_pairs_of_floats,float,ltl_query,number).

Tries to find a value for the parameters of given names, between the min and max values given, with the given number of iterations (for each parameter, i.e. if 10 is the given numer 10 to the power of the numer of parameters iterations might happen!), such that the ltl_query representing a specification of the system is true for siulations up to the given time.

Example: trace_get([k],[(0,10)],100,F(([X] > 0.3) & F(d([X])/dt =< 0)),20).
This command will search for a value of parameter 'k' between 0 and 10 such that, before time 20, [X] gets greater than 0.3 and later [X] decreases. The total number of iterations will not be more than 100 to the power of 1, i.e. 100.

Since initial concentrations can be defined as parameters, it is possible to combine kinetic parameter search and initial concentration search.

Some secondary commands are also supplied to get information out of numerical traces.

get_max_from_trace(molecule), get_min_from_trace(molecule).

Prints out the maximal/minimal value of the concentration of the given molecule for the current trace, and the corresponding time value.

set_init_from_trace(float).

Takes the closest calculated time point in the current simulation trace, and sets the initial state according to values at that time point. This can actually change some parameter values if certain molecules have their initial concentration set to a parameter.


Bibliography


[1] Modeling and querying biomolecular interaction networks by Nathalie Chabrier, Marc Chiaverini, Vincent Danos, François Fages and Vincent Schächter.
To appear in Theoretical Computer Science, 2004. Available as pdf.

[2] The Biochemical Abstract Machine BIOCHAM by Nathalie Chabrier, François Fages and Sylvain Soliman.
Proceedings of the Second International Workshop on Computational Methods in Systems Biology CMSB'04, Springer-Verlag, Paris, France, May 2004. Available as ps.gz.

[3] Symbolic model checking of biochemical networks by Nathalie Chabrier and François Fages.
Proceedings of the First International Workshop on Computational Methods in Systems Biology CMSB'03, Springer-Verlag LNCS 2602, pp.149-162, Rovereto, Italy, March 2003. Available as ps.

[4] Symbolic model-checking for biochemical systems: logic programming steps towards formal biology (invited tutorial) by François Fages.
International Conference on Logic Programming ICLP'03, Mumbai, India. December 2003. MIT Press. Slides available as ppt.

[5] Process Calculi and Biology of Molecular Networks. Cooperative Research Action ARC CPBIO. http://contraintes.inria.fr/cpbio/

[6] CMBSlib: a library for comparing formalisms and models of biological systems. by Sylvain Soliman and François Fages.
Proceedings of the Second International Workshop on Computational Methods in Systems Biology CMSB'04, Springer-Verlag, Paris, France, May 2004. Available as ps.gz.