design/architecture.html (281 lines of code) (raw):

<!-- $Id$ --> <html> <head> <title>Xerces 2 | Architecture</title> <link rel='stylesheet' type='text/css' href='css/site.css'> <link rel='stylesheet' type='text/css' href='css/diagram.css'> <style type='text/css'> .note { font-size: smaller } .pipeline { color: black; background: white; border-style: solid; border-color: black; border-width: 1; font-weight: normal } </style> </head> <body> <span class='netscape'> <a name='TOP'></a> <h1>Xerces2 Architecture</h1> <h2>Table of Contents</h2> <p> <ul> <li><a href='#Overview'>Overview</a></li> <li><a href='#DocumentInformation'>Document Information</a></li> <li> <a href='#ParserConfiguration'>Parser Configuration</a> <ul> <li><a href='#Configuration.FeaturesAndProperties'>Features &amp; Properties</a></li> <li><a href='#Configuration.SettingsManagement'>Settings Management</a></li> </ul> </li> </ul> </p> <hr> <a name='Overview'></a> <h2>Overview</h2> <p> The Xerces Native Interface (XNI) is a framework for communicating a "streaming" document information set and constructing generic parser configurations. XNI is part of the Xerces2 development but it is important to note that the Xerces2 parser is just a standards compliant reference implementation of the Xerces Native Interface. Other parsers can be written that conform to XNI without conforming to any particular standards. </p> <a name='DocumentInformation'></a> <h2>Document Information</h2> <p> An XML parser can be viewed as a pipeline in which information flows from a scanner to a validator to the parser. In this pipeline, one component (the scanner) acts as a source of events; the final component (the parser) is the final target of the events; and any components between the source and target are known as filters. Filter components are both targets for the information sent by the previous component in the pipeline and sources for the information that the filter chooses to propagate to the next component in the pipeline. The following diagram illustrates the layout of the pipeline in this kind of parser. </p> <p> <table border='2' cellpadding='10' cellspacing='0'> <tr class='diagram'> <td> <table cellpadding='7' cellspacing='0'> <tr class='diagram'> <td class='diagram'>XML<br>Document</td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='component'>Scanner</td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='component'>Validator</td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='component'>Parser</td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='diagram'>Application<br>API</td> </tr> </table> </td> </tr> </table> </p> <p> Parsing of DTDs can also be viewed as a pipeline. Since the DTD is referenced in the document instance by XML syntax (the DOCTYPE declaration), the DTD pipeline is triggered by the document scanner. This contrasts with XML Schema because there is no XML syntax that associates a Schema grammar with a document; a special attribute in the document instance is used as a <em>hint</em> to the location of the grammar. The following diagram illustrates the layout of the DTD pipeline. </p> <p> <table border='2' cellpadding='10' cellspacing='0'> <tr class='diagram'> <td> <table cellpadding='7' cellspacing='0'> <tr class='diagram'> <td class='diagram'>DTD<br>Document</td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='component'>DTD<br>Scanner</td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='component' rowspan='3'>Validator</td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='component'>Parser</td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='diagram'>Application<br>API</td> </tr> <tr><td>&nbsp;</td></tr> <tr class='diagram'> <td></td> <td></td> <td></td> <td></td> <td><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='component'>DTD<br>Grammar</td> </tr> </table> </td> </tr> </table> </p> <p> Note that the DTD scanner communicates directly with the validator. The validator receives the callbacks from the DTD scanner in order to create and populate the DTD grammar object. In this way, the validator acts as a "tee", propogating the DTD events to both the next stage in the pipeline and the DTD grammar object. This allows the validation stage in the pipeline to be completely removed from the parser configuration, if needed. </p> <p> The XML document information is defined by the <code><a href='design.html#XMLDocumentHandler'>XMLDocumentHandler</a></code> interface and the DTD information is defined by the <code><a href='design.html#XMLDTDHandler'>XMLDTDHandler</a></code> and <code><a href='design.html#XMLDTDContentModelHandler'>XMLDTDContentModelHandler</a></code> interfaces. (Note: As of 10 Apr 2001, the DTD interfaces are subject to change based on user feedback.) This set of interfaces and supporting interfaces and classes comprise the XNI Core. However, whereas the XNI Core defines what information document and DTD is communicated but does not define the semantics for configuring the parser pipeline. </p> <a name='ParserConfiguration'></a> <h2>Parser Configuration</h2> <p> In the XNI world, a parser object used by an application is merely an API generator (e.g. building DOM trees or calling SAX handlers). The components and configuration information for that parser is defined within a parser configuration object. With this approach, different parser configurations can be used with the existing parser instances without duplicating code. </p> <p> The parser configuration object, defined by the <code><a href='design.html#XMLParserConfiguration'>XMLParserConfiguration</a></code> interface, that is used by the application is comprised of a series of components. The parser configuration assembles the parsing pipeline components, transmits settings to each component, and controls their actions. The following diagram shows a general parser configuration and its components. (No ordering or direct connection between components should be implied.) </p> <p> <table border='2' cellspacing='0' cellpadding='7'> <tr class='diagram'> <td> <table border='0' cellspacing='5' cellpadding='5'> <tr align='center' valign='middle'> <th class='manager' colspan='9'>Parser Configuration</th> </tr> <tr align='center' valign='middle'> <td class='non-config-component'>Symbol<br>Table</td> <td class='non-config-component'>Grammar<br>Pool</td> <td class='non-config-component'>Datatype<br>Validator<br>Factory</td> <td class='config-component'>Error<br>Reporter</td> <td class='config-component'>Entity<br>Manager</td> <td class='config-component'>Document<br>Scanner</td> <td class='config-component'>DTD<br>Scanner</td> <td class='config-component'>Validator</td> </tr> </table> </td> </tr> </table> </p> <p> The workings of the parser configuration object are unknown to the parser. The parser is only able to set features and properties on the configuration, set the XNI handlers to receive the document information, and initiate a parse. Typically the parser object itself will be registered as the target of XNI events produced from the parser configuration when a document is parsed, but it doesn't have to be. The following diagram illustrates this situation. </p> <p> <table border='2' cellspacing='0' cellpadding='7'> <tr class='diagram'> <td> <table border='0' cellspacing='5' cellpadding='5'> <tr align='center' valign='top'> <th class='parser' colspan='9' rowspan='2'> Parser <table border='0' cellspacing='5' cellpadding='5'> <th class='pipeline'> <em>Parser Configuration Pipeline</em> <table border='0' cellspacing='0' cellpadding='5'> <tr> <td class='config-component'>Scanner</td> <td valign='center'><img alt='--&gt;' src='images/arrow-right.gif'></td> <td class='config-component'>Validator</td> <td valign='center'><img alt='--&gt;' src='images/arrow-right.gif'></td> </tr> </table> </th> </table> </th> <td valign='center'><img alt='--&gt;' src='images/arrow-right.gif'></td> <th class='parser'>DOM<br>Parser</th> </tr> <tr align='center' valign='top'> <td valign='center'><img alt='--&gt;' src='images/arrow-right.gif'></td> <th class='parser'>SAX<br>Parser</th> </tr> </table> </td> </tr> </table> <a name='Configuration.FeaturesAndProperties'></a> <h3>Features &amp; Properties</h3> <p> Features and properties are provided via the extensible mechanism found in SAX2. Features are boolean settings on the parser configuration while properties are object settings. There are a number of SAX2 core features and properties but XNI parser components are free to define new ones. All of the features and properties are managed by the parser configuration, though. </p> <p> <em>TODO:</em> Expand on how features and properties are set, when, and by who. </p> <a name='Configuration.SettingsManagement'></a> <h3>Settings Management</h3> <p> The parser configuration implements the <code><a href='design.html#XMLComponentManager'>XMLComponentManager</a></code> interface and each component implements the <code><a href='design.html#XMLComponent'>XMLComponent</a></code> interface. For this configuration system to work, the parser configuration must adhere to the following guidelines: <span class='netscape'> <ul> <li> Before each parse, the parser configuration <strong>must</strong> call the <code>reset</code> method on each configurable component. This call allows each component to query the state of only those features and properties that are important to the operation of the component. </li> <li> Any time that the application sets a feature or property on the parser <em>during a parse</em>, the parser configuration <strong>must</strong> pass those settings to each configurable component. This is important because configuration settings can change while parsing an XML document and those settings may directly affect the operation of components. But this does <em>not</em> need to be done before or after a parse because each component will query settings during the call to <code>reset</code>. </li> </ul> </span> </p> </span> <a name='BOTTOM'></a> <hr> <span class='netscape'> Author: Andy Clark <br> Last modified: $Date$ </span> </body> </html>