

The Pxp_document module

Pxp_document:

Object model of the document/element instances

====================================================================== OVERVIEW

class type node ............. The common class type of the nodes of the element tree. Nodes are either elements (inner nodes) or data nodes (leaves).
class type extension ........ The minimal properties of the so-called extensions of the nodes: Nodes can be customized by applying a class parameter that adds methods/values to nodes.
class data_impl : node ...... Implements data nodes.
class element_impl : node ... Implements element nodes.
class document .............. A document is an element with some additional properties.

======================================================================

THE STRUCTURE OF NODE TREES:

Every node except the root node has a parent node. The parent node is always an element, because data nodes never contain other nodes. In the other direction, element nodes may have children; both elements and data nodes are possible as children. Every node knows its parent (if any) and all its children (if any); the linkage is maintained in both directions. A node without a parent is called a root. A node cannot be the child of two nodes (neither of two different nodes nor twice of the same node).

You can break the connection between a node and its parent; the method "delete" performs this operation and removes the node from the parent's list of children. The node is then a root, both for itself and for all subordinate nodes. In this context, the node is also called an orphan, because it has lost its parent (this is a bit misleading because the parent is not always the creator of a node).

In order to simplify complex operations, you can also set the list of children of an element. Nodes that have been children before are unchanged; new nodes are added (and the linkage is set up); nodes no longer occurring in the list are handled as if they had been deleted. If you try to add a node that is not a root (either by an "add" or by a "set" operation), the operation fails.
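The linkage rules above can be illustrated with a small stand-alone model. This is only a sketch using plain records; the names make, is_root, add_node and delete are chosen for illustration and are not the PXP implementation.

```ocaml
(* Toy model of the parent/child linkage; NOT the PXP implementation. *)
type node = {
  name : string;
  mutable parent : node option;
  mutable children : node list;
}

let make name = { name; parent = None; children = [] }

let is_root n = match n.parent with None -> true | Some _ -> false

(* A node may only be added while it is a root (has no parent). *)
let add_node parent child =
  if not (is_root child) then failwith "add_node: node is not a root";
  child.parent <- Some parent;
  parent.children <- parent.children @ [ child ]

(* Detach a node from its parent; does nothing if the node is a root. *)
let delete child =
  match child.parent with
  | None -> ()
  | Some p ->
      p.children <- List.filter (fun c -> c != child) p.children;
      child.parent <- None

let () =
  let a = make "a" and b = make "b" and c = make "c" in
  add_node a b;
  (* b already has a parent, so adding it to c is rejected *)
  (match add_node c b with
   | () -> print_endline "unexpected: add succeeded"
   | exception Failure msg -> print_endline msg);
  delete b;          (* b is an orphan now *)
  add_node c b;      (* so this succeeds *)
  print_endline (String.concat "," (List.map (fun n -> n.name) c.children))
```

Physical equality (!=) is used when removing the child, because the records refer to each other and must be compared by identity.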

CREATION OF NODES

The class interface supports creation of nodes by cloning a so-called exemplar. The idea is that it is sometimes useful to implement different element types by different classes, and to implement this by looking up exemplars. Imagine you have three element types A, B, and C, and three classes a, b, and c implementing the node interface (for example, by providing different extensions, see below). The XML parser can be configured to have a lookup table { A --> a0, B --> b0, C --> c0 } where a0, b0, c0 are exemplars of the classes a, b, and c, i.e. empty objects belonging to these classes. If the parser finds an instance of A, it looks up the exemplar a0 of A and clones it (actually, the method "create_element" performs this for elements, and "create_data" for data nodes). Clones belong to the same class as the original nodes, so the instances of the elements have the same classes as the configured exemplars. Note: This technique assumes that the interface of all exemplars is the same!
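The exemplar idea can be sketched without the parser. In the following toy model, the classes elem, a and b and the function instantiate are hypothetical; only the method name create_element is borrowed from the text, and its signature here is simplified.

```ocaml
(* Toy model of exemplar cloning; NOT the PXP classes. *)
class virtual elem = object
  val name = ""
  method name = name
  method virtual kind : string
  (* Copy this exemplar and initialise the copy with an element name. *)
  method create_element (n : string) = {< name = n >}
end

class a = object inherit elem method kind = "class a" end
class b = object inherit elem method kind = "class b" end

(* Lookup table { A --> a0, B --> b0 } holding one exemplar per element type. *)
let table : (string, elem) Hashtbl.t = Hashtbl.create 3

let () =
  Hashtbl.add table "A" (new a : a :> elem);
  Hashtbl.add table "B" (new b : b :> elem)

(* Find the exemplar for the element type and clone it. *)
let instantiate element_type =
  (Hashtbl.find table element_type)#create_element element_type

let () =
  let n = instantiate "B" in
  print_endline (n#name ^ " is implemented by " ^ n#kind)
```

The functional update {< ... >} yields an object of the same class as the exemplar, which is why all exemplars need a common interface.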

THE EXTENSION

The class type node and all its implementations have a class parameter 'ext which must at least fulfil the properties of the class type "extension". The idea is that you can add properties, for example:

class my_extension =
  object
    (* minimal properties required by class type "extension": *)
    method clone = ...
    method node = ...
    method set_node n = ...
    (* here my own methods: *)
    method do_this_and_that ...
  end

class my_element_impl = [ my_extension ] element_impl
class my_data_impl    = [ my_extension ] data_impl

The whole XML parser is parameterized with 'ext, so your extension is visible everywhere (this is the reason why extensibility is solved by parametric polymorphism and not by inclusive polymorphism (subtyping)).
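Why parametric polymorphism keeps the extension's own methods visible can be shown with a deliberately simplified model. The class toy_node and the method describe are assumptions made for this sketch; the real node classes additionally tie node and extension together recursively.

```ocaml
(* Toy node class, parameterised over the extension type 'ext. *)
class ['ext] toy_node (name : string) (ext : 'ext) = object
  method name = name
  method extension : 'ext = ext
end

(* A hypothetical extension with one custom method. *)
class my_extension = object
  method describe (name : string) = "element <" ^ name ^ ">"
end

(* The custom method is reachable through the node without any downcast,
   because the node type carries the full extension type. *)
let describe (n : my_extension toy_node) = n#extension#describe n#name

let () =
  let n = new toy_node "para" (new my_extension) in
  print_endline (describe n)
```

With subtyping alone, the node would only expose the minimal extension interface, and the custom method would be hidden.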

SOME COMPLICATED TYPE EXPRESSIONS

Sometimes the following type expressions turn out to be necessary:

'a node extension as 'a
    This is the type of an extension that belongs to a node that has an extension that is the same as we started with.

'a extension node as 'a
    This is the type of a node that has an extension that belongs to a node of the type we started with.

DOCUMENTS ...

======================================================================

SIMPLE USAGE: ...

open Pxp_dtd
type node_type =

The basic and most important node types:
- T_element element_type is the type of element nodes
- T_data is the type of text data nodes
By design of the parser, neither CDATA sections nor entity references are represented in the node tree; so there are no types for them.

    T_element of string
  | T_data

The following types are extensions to my original design. They have mainly been added to simplify the implementation of standards (such as XPath) that require that nodes of these types are included into the main document tree. There are options (see Pxp_yacc) forcing the parser to insert such nodes; in this case, the nodes are actually element nodes serving as wrappers for the additional data structures. The options are: enable_super_root_node, enable_pinstr_nodes, enable_comment_nodes. By default, such nodes are not created.

  | T_super_root
  | T_pinstr of string                  (* The string is the target of the PI *)
  | T_comment

The following types are fully virtual. This means that it is impossible to make the parser insert such nodes. However, these types might be practical when defining views on the tree. Note that the list of virtual node types will be extended if necessary.

  | T_none
  | T_attribute of string          (* The string is the name of the attribute *)
  | T_namespace of string               (* The string is the namespace prefix *)
;;

class type [ 'node ] extension =
  object ('self)
    method clone : 'self

"clone" should return an exact deep copy of the object.

    method node : 'node

"node" returns the corresponding node of this extension. This method is intended to return exactly what has previously been set by "set_node".

    method set_node : 'node -> unit

"set_node" is invoked once the extension is associated to a new node object.

  end
;;

class type [ 'ext ] node =
  object ('self)
    constraint 'ext = 'ext node #extension

    method extension : 'ext

Return the extension of this node:

    method delete : unit

Delete this node from the parent's list of sub nodes. This node gets orphaned. 'delete' does nothing if this node does not have a parent.

    method parent : 'ext node

Get the parent, or raise Not_found if this node is an orphan.

    method root : 'ext node

Get the direct or indirect parent that does not have a parent itself, i.e. the root of the tree.

    method orphaned_clone : 'self

Return an exact clone of this element and all sub nodes (deep copy), except for string values, which are shared by this node and the clone. The other exception is that the clone has no parent (i.e. it is a root).

    method orphaned_flat_clone : 'self

Return a clone of this element where all sub nodes are omitted. The type of the node and the attributes are the same as in the original node. The clone has no parent.

    method add_node : ?force:bool -> 'ext node -> unit

Append new sub nodes -- mainly used by the parser itself, but of course open for everybody. If an element is added, it must be an orphan (i.e. it must not have a parent node); after the addition, this node is the new parent. The method performs some basic validation checks if the current node has a regular expression as content model, or is EMPTY. You can turn these checks off by passing ~force:true to the method.

    method add_pinstr : proc_instruction -> unit

Add a processing instruction to the set of processing instructions of this node. Usually only elements contain processing instructions.

    method pinstr : string -> proc_instruction list

Get all processing instructions with the passed name

    method pinstr_names : string list

Get a list of all names of processing instructions

    method node_position : int

Returns the position of this node among all children of the parent node. Positions are counted from 0. Raises Not_found if the node is the root node.

    method node_path : int list

Returns the list of node positions of the ancestors of this node, including this node. The first list element is the node position of the ancestor that is a child of the root, and the last list element is the node position of this node. Returns [] if the node is the root node.
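How a node position and a node path relate can be sketched on a toy tree. The record type and the functions below are a stand-alone illustration, not the PXP implementation; the sketch assumes that the path of a root is the empty list.

```ocaml
(* Toy tree with parent links; NOT the PXP node class. *)
type node = { mutable parent : node option; mutable children : node list }

let make () = { parent = None; children = [] }

let add p c =
  c.parent <- Some p;
  p.children <- p.children @ [ c ]

(* Index of the node among its siblings, counted from 0. *)
let node_position n =
  match n.parent with
  | None -> raise Not_found
  | Some p ->
      let rec index i = function
        | [] -> raise Not_found
        | c :: rest -> if c == n then i else index (i + 1) rest
      in
      index 0 p.children

(* Positions of all ancestors below the root, ending with the node itself. *)
let rec node_path n =
  match n.parent with
  | None -> []
  | Some p -> node_path p @ [ node_position n ]

let () =
  let root = make () and a = make () and b = make () and c = make () in
  add root a; add root b; add b c;
  (* c is the first child of b, and b is the second child of the root *)
  print_endline (String.concat "." (List.map string_of_int (node_path c)))
```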

    method sub_nodes : 'ext node list

Get the list of sub nodes

    method iter_nodes : ('ext node -> unit) -> unit

iterate over the sub nodes

    method iter_nodes_sibl :
      ('ext node option -> 'ext node -> 'ext node option -> unit) -> unit

Here every iteration step can also access the previous and the following node, if present.
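The calling convention with optional neighbours can be sketched on an ordinary list. The function iter_sibl below is a hypothetical stand-in that only mirrors the argument order (previous, current, next); it does not operate on PXP nodes.

```ocaml
(* Call f prev cur next for every element; prev and next are None at the ends. *)
let iter_sibl f items =
  let rec loop prev = function
    | [] -> ()
    | [ x ] -> f prev x None
    | x :: (y :: _ as rest) ->
        f prev x (Some y);
        loop (Some x) rest
  in
  loop None items

let () =
  let show = function None -> "-" | Some s -> s in
  iter_sibl
    (fun l n r -> Printf.printf "%s [%s] %s\n" (show l) n (show r))
    [ "a"; "b"; "c" ]
```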

    method nth_node : int -> 'ext node

Returns the n-th sub node of this node, n >= 0. Raises Not_found if the index is out of the valid range. Note that the first invocation of this method requires additional overhead.

    method previous_node : 'ext node
    method next_node : 'ext node

Return the previous and next nodes, respectively. These methods are equivalent to
- parent # nth_node (self # node_position - 1) and
- parent # nth_node (self # node_position + 1), respectively.

    method set_nodes : 'ext node list -> unit

Set the list of sub nodes. Elements that are no longer sub nodes get orphaned, and all new elements that previously were not sub nodes must be orphans.

    method data : string

Get the data string of this node. For data nodes, this string is just the content. For elements, this string is the concatenation of all subordinate data nodes.

    method node_type : node_type

Get the node type. For elements, this is T_element n, where n is the name of the element type.

    method position : (string * int * int)

Return the name of the entity, the line number, and the column position (byte offset) of the beginning of the element. Only available if the element has been created with position information. Returns "?",0,0 if not available. (Note: Line number 0 is not possible otherwise.)

    method attribute : string -> Pxp_types.att_value
    method attribute_names : string list
    method attribute_type : string -> Pxp_types.att_type
    method attributes : (string * Pxp_types.att_value) list

Get a specific attribute; get the names of all attributes; get the type of a specific attribute; get names and values of all attributes. Only elements have attributes. Note: If the DTD allows arbitrary attributes for this element, "attribute_type" raises Undeclared.

    method required_string_attribute : string -> string
    method required_list_attribute : string -> string list

Return the attribute or fail if the attribute is not present: The first version always returns the value as a string; the second version always returns it as a list.

    method optional_string_attribute : string -> string option
    method optional_list_attribute : string -> string list

Return the attribute value if the attribute is present: The first version returns the value as Some string, or None if the attribute is not present; the second version always returns a list.

    method id_attribute_name : string
    method id_attribute_value : string

Return the name and value of the ID attribute. The methods may raise Not_found if there is no ID attribute in the DTD, or no ID attribute in the element, respectively.

    method idref_attribute_names : string list

Returns the list of attribute names of IDREF or IDREFS type.

    method quick_set_attributes : (string * Pxp_types.att_value) list -> unit

Sets the attributes but does not check whether they match the DTD.

    method attributes_as_nodes : 'ext node list

Experimental feature: Return the attributes as node list. Every node has type T_attribute n, and contains only the single attribute n. This node list is computed on demand, so the first invocation of this method will create the list, and following invocations will only return the existing list.

    method set_comment : string option -> unit

Sets the comment string; only applicable for T_comment nodes

    method comment : string option

Get the comment string. Always returns None for nodes with a type other than T_comment.

    method dtd : dtd

Get the DTD. Fails if no DTD is specified (which is impossible if 'create_element' or 'create_data' have been used to create this object)

    method encoding : Pxp_types.rep_encoding

Get the encoding which is always the same as the encoding of the DTD. See also method 'dtd' (Note: This method fails, too, if no DTD is present.)

    method create_element :
             ?position:(string * int * int) ->
             dtd -> node_type -> (string * string) list -> 'ext node

Create an "empty copy" of this element:
- new DTD
- new node type (which must not be T_data)
- new attribute list
- empty list of nodes

    method create_data : dtd -> string -> 'ext node

create an "empty copy" of this data node:

    method local_validate :
             ?use_dfa:bool ->
             unit -> unit

Check that this element conforms to the DTD. Option ~use_dfa: If true, the deterministic finite automaton of regexp content models is used for validation, if available. Defaults to false.

    method keep_always_whitespace_mode : unit

Normally, add_node does not accept data nodes when the DTD does not allow data nodes or only whitespace ("ignorable whitespace"). Once you have invoked this method, ignorable whitespace is forced to be included into the document.

    method write : Pxp_types.output_stream -> Pxp_types.encoding -> unit

Write the contents of this node and the subtrees to the passed output stream; the passed encoding is used. The format is compact (the opposite of "pretty printing").

    method write_compact_as_latin1 : Pxp_types.output_stream -> unit

DEPRECATED METHOD; included only to keep compatibility with older versions of the parser

----------------------------------------

The methods 'find' and 'reset_finder' are no longer supported. The functionality is provided by the configurable index object (see Pxp_yacc).

----------------------------------------

internal methods:

    method internal_adopt : 'ext node option -> int -> unit
    method internal_set_pos : int -> unit
    method internal_delete : 'ext node -> unit
    method internal_init : (string * int * int) ->
                           dtd -> string -> (string * string) list -> unit
    method internal_init_other : (string * int * int) ->
                                 dtd -> node_type -> unit
  end
;;

class [ 'ext ] data_impl : 'ext -> [ 'ext ] node

Creation:
    new data_impl an_extension
creates a new data node with the given extension and the empty string as content.

;;

class [ 'ext ] element_impl : 'ext -> [ 'ext ] node

Creation:
    new element_impl an_extension
creates a new empty element node with the given extension.

;;

Attribute and namespace nodes are experimental:

class [ 'ext ] attribute_impl :
  element:string -> name:string -> Pxp_types.att_value -> dtd -> [ 'ext ] node

Creation:
    new attribute_impl element_name attribute_name attribute_value dtd
Note that attribute nodes intentionally do not have extensions.

Once namespaces get implemented: class 'ext namespace_impl : prefix:string -> name:string -> dtd -> 'ext node

spec

type 'ext spec
constraint 'ext = 'ext node #extension

Contains the exemplars used for the creation of new nodes

val make_spec_from_mapping :
      ?super_root_exemplar : 'ext node ->
      ?comment_exemplar : 'ext node ->
      ?default_pinstr_exemplar : 'ext node ->
      ?pinstr_mapping : (string, 'ext node) Hashtbl.t ->
      data_exemplar: 'ext node ->
      default_element_exemplar: 'ext node ->
      element_mapping: (string, 'ext node) Hashtbl.t -> 
      unit -> 
        'ext spec

Specifies:
- For new data nodes, the ~data_exemplar must be used
- For new element nodes: If the element type is mentioned in the ~element_mapping hash table, the exemplar found in this table is used. Otherwise, the ~default_element_exemplar is used.
Optionally:
- You may also specify exemplars for super root nodes, for comments and for processing instructions
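The lookup rule for element exemplars (mapping first, default otherwise) can be sketched with a plain record. The type toy_spec and the function element_exemplar are assumptions for illustration; strings stand in for exemplar nodes, and the real spec type is abstract.

```ocaml
(* Toy spec; strings stand in for exemplar nodes. NOT the PXP spec type. *)
type 'ex toy_spec = {
  data_exemplar : 'ex;
  default_element_exemplar : 'ex;
  element_mapping : (string, 'ex) Hashtbl.t;
}

(* Use the exemplar from the mapping if present, else the default one. *)
let element_exemplar spec element_type =
  match Hashtbl.find_opt spec.element_mapping element_type with
  | Some ex -> ex
  | None -> spec.default_element_exemplar

let () =
  let mapping = Hashtbl.create 2 in
  Hashtbl.add mapping "A" "a0";
  let spec =
    { data_exemplar = "d0";
      default_element_exemplar = "e0";
      element_mapping = mapping }
  in
  List.iter
    (fun ty -> Printf.printf "%s --> %s\n" ty (element_exemplar spec ty))
    [ "A"; "Z" ];
  Printf.printf "data --> %s\n" spec.data_exemplar
```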

val make_spec_from_alist :
      ?super_root_exemplar : 'ext node ->
      ?comment_exemplar : 'ext node ->
      ?default_pinstr_exemplar : 'ext node ->
      ?pinstr_alist : (string * 'ext node) list ->
      data_exemplar: 'ext node ->
      default_element_exemplar: 'ext node ->
      element_alist: (string * 'ext node) list -> 
      unit -> 
        'ext spec

This is a convenience function: You can pass the mappings from elements and PIs to exemplars as association lists.

val create_data_node : 
      'ext spec -> dtd -> string -> 'ext node
val create_element_node : 
      ?position:(string * int * int) ->
      'ext spec -> dtd -> string -> (string * string) list -> 'ext node
val create_super_root_node :
      ?position:(string * int * int) ->
      'ext spec -> dtd -> 'ext node
val create_comment_node :
      ?position:(string * int * int) ->
      'ext spec -> dtd -> string -> 'ext node
val create_pinstr_node :
      ?position:(string * int * int) ->
      'ext spec -> dtd -> proc_instruction -> 'ext node

These functions use the exemplars contained in a spec and create fresh node objects from them.

val create_no_node : 
       ?position:(string * int * int) -> 'ext spec -> dtd -> 'ext node

Creates a T_none node with limited functionality

Ordering of nodes

val compare : 'ext node -> 'ext node -> int

Returns -1 if the first node is before the second node, or +1 if the first node is after the second node, or 0 if both nodes are identical. If the nodes are unrelated (do not have a common ancestor), the result is undefined. This test is rather slow.

type 'ext ord_index
constraint 'ext = 'ext node #extension

The type of ordinal indexes

val create_ord_index : 'ext node -> 'ext ord_index

Creates an ordinal index for the subtree starting at the passed node. This index assigns to every node an ordinal number (beginning with 0) such that nodes are numbered upon the order of the first character in the XML representation (document order). Note that the index is not automatically updated when the tree is modified.
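The idea of an ordinal index is a single pre-order walk that records a number per node. The following sketch works on a toy tree in which node names are assumed to be unique and serve as keys; the function names follow the ones above, but the code is not the PXP implementation.

```ocaml
(* Toy tree; names are assumed unique so they can serve as hash keys. *)
type node = { name : string; children : node list }

(* Number the nodes of the subtree in document order (pre-order), from 0. *)
let create_ord_index top =
  let index = Hashtbl.create 16 in
  let counter = ref 0 in
  let rec visit n =
    Hashtbl.add index n.name !counter;
    incr counter;
    List.iter visit n.children
  in
  visit top;
  index

(* Raises Not_found if the node is not covered by the index. *)
let ord_number index n = Hashtbl.find index n.name

(* Comparing two nodes is now just a comparison of two integers. *)
let ord_compare index a b = compare (ord_number index a) (ord_number index b)

let () =
  let c = { name = "c"; children = [] } in
  let b = { name = "b"; children = [ c ] } in
  let d = { name = "d"; children = [] } in
  let a = { name = "a"; children = [ b; d ] } in
  let idx = create_ord_index a in
  List.iter
    (fun n -> Printf.printf "%s: %d\n" n.name (ord_number idx n))
    [ a; b; c; d ]
```

Because the numbers are computed once, the index goes stale as soon as the tree changes, which matches the note above.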

val ord_number : 'ext ord_index -> 'ext node -> int

Returns the ordinal number of the node, or raises Not_found

val ord_compare : 'ext ord_index -> 'ext node -> 'ext node -> int

Compares two nodes like 'compare': Returns -1 if the first node is before the second node, or +1 if the first node is after the second node, or 0 if both nodes are identical. If one of the nodes does not occur in the ordinal index, Not_found is raised. This test is much faster than 'compare'.

Iterators

val find : ?deeply:bool -> 
           f:('ext node -> bool) -> 'ext node -> 'ext node

Searches the first node for which the predicate f is true, and returns it. Raises Not_found if there is no such node. By default, ~deeply=false. In this case, only the children of the passed node are searched. If passing ~deeply=true, the children are searched recursively (depth-first search).
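The difference between the shallow and the deep search can be sketched on a toy tree. The function below reuses the name and labels of find for readability, but it is an independent illustration, not the library code.

```ocaml
(* Toy tree; NOT the PXP node class. *)
type node = { name : string; children : node list }

(* Search the children of n; with ~deeply:true also search their
   descendants, depth-first. Raises Not_found if nothing matches. *)
let rec find ?(deeply = false) ~f n =
  let rec search = function
    | [] -> raise Not_found
    | c :: rest ->
        if f c then c
        else if deeply then
          (try find ~deeply ~f c with Not_found -> search rest)
        else search rest
  in
  search n.children

let () =
  let leaf name = { name; children = [] } in
  let tree =
    { name = "root";
      children = [ { name = "x"; children = [ leaf "y" ] }; leaf "z" ] }
  in
  let is name n = n.name = name in
  print_endline (find ~f:(is "z") tree).name;
  print_endline (find ~deeply:true ~f:(is "y") tree).name
```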

val find_all : ?deeply:bool ->
               f:('ext node -> bool) -> 'ext node -> 'ext node list

Searches all nodes for which the predicate f is true, and returns them. By default, ~deeply=false. In this case, only the children of the passed node are searched. If passing ~deeply=true, the children are searched recursively (depth-first search).

val find_element : ?deeply:bool ->
                   string -> 'ext node -> 'ext node

Searches the first element with the passed element type. By default, ~deeply=false. In this case, only the children of the passed node are searched. If passing ~deeply=true, the children are searched recursively (depth-first search).

val find_all_elements : ?deeply:bool ->
                        string -> 'ext node -> 'ext node list

Searches all elements with the passed element type. By default, ~deeply=false. In this case, only the children of the passed node are searched. If passing ~deeply=true, the children are searched recursively (depth-first search).

exception Skip
val map_tree :  pre:('exta node -> 'extb node) ->
               ?post:('extb node -> 'extb node) ->
               'exta node -> 
                   'extb node

Traverses the passed node and all children recursively. After entering a node, the function ~pre is called. The result of this function must be a new node; it must have neither children nor a parent (you can simply pass (fun n -> n # orphaned_flat_clone) as ~pre).

After that, the children are processed in the same way (from left to right); the results of the transformation are added to the new node as new children. Then the ~post function is invoked with this node as argument, and its result is the result of the whole function (~post should return a root node, too; if not specified, the identity is used as ~post).

Both ~pre and ~post may raise Skip, which causes the node to be left out. If the top node is skipped, the exception Not_found is raised.
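The traversal order and the effect of Skip can be sketched on an immutable toy tree. In this simplified version, ~pre maps only the label of a node rather than producing a fresh node object, so it is an analogy and not the library function; List.filter_map requires OCaml 4.08 or later.

```ocaml
(* Immutable toy tree; NOT the PXP node class. *)
type 'a tree = Node of 'a * 'a tree list

exception Skip

(* pre maps the label of a node, post sees the rebuilt node.
   Either may raise Skip to leave the node (and its subtree) out. *)
let map_tree ~pre ?(post = fun n -> n) top =
  let rec map (Node (x, children)) =
    let y = pre x in
    let mapped =
      List.filter_map
        (fun c -> try Some (map c) with Skip -> None)
        children
    in
    post (Node (y, mapped))
  in
  (* Skipping the top node leaves nothing to return. *)
  try map top with Skip -> raise Not_found

let () =
  let t = Node (1, [ Node (2, []); Node (3, [ Node (4, []) ]) ]) in
  let pre x = if x = 3 then raise Skip else string_of_int x in
  let (Node (label, kids)) = map_tree ~pre t in
  Printf.printf "%s has %d child(ren)\n" label (List.length kids)
```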

val map_tree_sibl : 
        pre: ('exta node option -> 'exta node -> 'exta node option -> 
                  'extb node) ->
       ?post:('extb node option -> 'extb node -> 'extb node option -> 
                  'extb node) ->
       'exta node -> 
           'extb node

Works like map_tree, but the functions ~pre and ~post have additional arguments:
- ~pre l n r: The node n is the node to map, and l is the previous node, and r is the next node (both None if not present). l and r are both nodes before the transformation.
- ~post l n r: The node n is the node which is the result of ~pre plus adding children. l and r are again the previous and the next node, respectively, but after being transformed.

val iter_tree : ?pre:('ext node -> unit) ->
                ?post:('ext node -> unit) ->
                'ext node -> 
                    unit

Iterates only instead of mapping the nodes.

val iter_tree_sibl :
       ?pre: ('ext node option -> 'ext node -> 'ext node option -> unit) ->
       ?post:('ext node option -> 'ext node -> 'ext node option -> unit) ->
       'ext node -> 
           unit

Iterates only instead of mapping the nodes.

document

class [ 'ext ] document :
  Pxp_types.collect_warnings -> 
  object

Documents: These are containers for root elements and for DTDs.

Important invariant: A document is either empty (no root element, no DTD), or it has both a root element and a DTD.

A fresh document created by 'new' is empty.

    method init_xml_version : string -> unit

Set the XML version string of the XML declaration.

    method init_root : 'ext node -> unit

Set the root element. It is expected that the root element has a DTD. Note that 'init_root' checks whether the passed root element has the type expected by the DTD. The check takes into account that the root element might be a virtual root node.

    method xml_version : string

Returns the XML version from the XML declaration. Returns "1.0" if the declaration is missing.

    method xml_standalone : bool

Returns whether this document is declared as being standalone. This method returns the same value as 'standalone_declaration' of the DTD (if there is a DTD). Returns 'false' if there is no DTD.

    method dtd : dtd

Returns the DTD of the root element. Fails if there is no root element.

    method encoding : Pxp_types.rep_encoding

Returns the string encoding of the document = the encoding of the root element = the encoding of the element tree = the encoding of the DTD. Fails if there is no root element.

    method root : 'ext node

Returns the root element, or fails if there is not any.

    method add_pinstr : proc_instruction -> unit

Adds a processing instruction to the document container. The parser does this for PIs occurring outside the DTD and outside the root element.

    method pinstr : string -> proc_instruction list

Return all PIs for a passed target string.

    method pinstr_names : string list

Return all target strings of all PIs.

    method write : Pxp_types.output_stream -> Pxp_types.encoding -> unit

Write the document to the passed output stream; the passed encoding is used. The format is compact (the opposite of "pretty printing"). If a DTD is present, the DTD is included into the internal subset.

    method write_compact_as_latin1 : Pxp_types.output_stream -> unit

DEPRECATED METHOD; included only to keep compatibility with older versions of the parser

  end
;;

