
The Qt SAX2 implementation


Introduction to SAX2

The SAX2 interface is an event-driven mechanism to provide the user with document information. "Event" in this context has nothing to do with the term "event" you probably know from windowing systems; it means that the parser reports certain document information while parsing the document. Each piece of reported information is referred to as an "event".

To make it less abstract consider the following example:

<quote>To make it less abstract consider the following example:</quote>

Whilst reading the above document (a SAX2 parser is usually referred to as a "reader"), three events would be triggered:

  1. A start tag occurs (<quote>).
  2. Character data (i.e. text) is found.
  3. An end tag is parsed (</quote>).

Each time such an event occurs the parser reports it so that a suitable event handling routine can be invoked.
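The event sequence described above can be sketched with a toy reader written in plain standard C++. This is not the Qt implementation: the names EventHandler, RecordingHandler and parse() are hypothetical, and the reader only understands start tags, end tags and plain text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface: one callback per kind of event.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Toy reader: understands only <name>, </name> and plain text.
// No attributes, comments, entities or error reporting.
inline void parse(const std::string& xml, EventHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return;  // malformed input: stop
            const std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/')
                handler.endElement(tag.substr(1));
            else
                handler.startElement(tag);
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Handler that records each event as a line of text, in arrival order.
struct RecordingHandler : EventHandler {
    std::vector<std::string> events;
    void startElement(const std::string& name) override { events.push_back("start:" + name); }
    void characters(const std::string& text) override { events.push_back("text:" + text); }
    void endElement(const std::string& name) override { events.push_back("end:" + name); }
};
```

Calling parse() on a one-element document such as `<quote>A quotation.</quote>` with a RecordingHandler leaves three entries in events, one per event, in document order. Nothing else about the document is kept, which is why manipulating it afterwards is awkward with this style of interface.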

Whilst this is a fast and simple approach to reading XML documents, manipulation is difficult because data are not stored; they are simply handled and discarded serially. This is where the DOM interface comes in handy.

The Qt XML module provides an abstract class, QXmlReader, that defines the interface for potential SAX2 readers. At the moment Qt ships with one reader implementation, QXmlSimpleReader.

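The reader reports parsing events through special handler classes. In Qt the following ones are available: QXmlContentHandler, QXmlDTDHandler, QXmlErrorHandler, QXmlEntityResolver, QXmlDeclHandler and QXmlLexicalHandler.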

These classes are abstract classes describing the interface. The QXmlDefaultHandler class provides a "do nothing" default implementation for all of them. Users therefore need to reimplement only the QXmlDefaultHandler functions they are interested in.
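The "do nothing" default pattern can be illustrated in plain standard C++. The classes DefaultHandler and ElementCounter below are hypothetical stand-ins, not the Qt classes, and the convention that a callback returns true to mean "continue parsing" is an assumption of this sketch.

```cpp
#include <string>

// Hypothetical stand-in for a "do nothing" default handler: every callback
// has a trivial body and returns true, meaning "continue parsing".
struct DefaultHandler {
    virtual ~DefaultHandler() = default;
    virtual bool startElement(const std::string&) { return true; }
    virtual bool characters(const std::string&) { return true; }
    virtual bool endElement(const std::string&) { return true; }
};

// A handler that only cares about start tags reimplements just that function;
// the other callbacks fall back to the inherited defaults.
struct ElementCounter : DefaultHandler {
    int count = 0;
    bool startElement(const std::string&) override {
        ++count;
        return true;
    }
};
```

A reader that holds a reference to DefaultHandler can call all three callbacks on an ElementCounter; only startElement() has any effect.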

To read input XML data a special class QXmlInputSource is used.

Apart from the classes already mentioned, the following SAX2 support classes provide the user with useful functionality: QXmlAttributes, QXmlLocator and QXmlNamespaceSupport.

Features

The behaviour of an XML reader depends on whether it supports certain optional features or not. As an example a reader can have the feature "report attributes used for namespace declarations and prefixes along with the local name of a tag". Like every other feature this has a unique name represented by a URI: it is called http://xml.org/sax/features/namespace-prefixes.

The Qt SAX2 implementation allows you to find out whether the reader has this ability using QXmlReader::hasFeature(). If the return value is TRUE it is possible to turn the relevant feature on and off. To do this use QXmlReader::setFeature(). Whether a supported feature is on or off (TRUE or FALSE) can be queried using QXmlReader::feature().
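The query, set and read cycle for features can be modelled with a small lookup table in plain standard C++. FeatureTable and declare() are hypothetical names for illustration; the member functions only mirror the roles of hasFeature(), setFeature() and feature() and do not reproduce the QXmlReader signatures.

```cpp
#include <map>
#include <string>

// Hypothetical model of a reader's feature table; not the QXmlReader class.
class FeatureTable {
public:
    // Declare a feature as supported and give it an initial value.
    void declare(const std::string& uri, bool initial) { values_[uri] = initial; }

    // True if the feature is known to this reader at all.
    bool hasFeature(const std::string& uri) const { return values_.count(uri) != 0; }

    // Switch a supported feature on or off; returns false for unknown features.
    bool setFeature(const std::string& uri, bool value) {
        auto it = values_.find(uri);
        if (it == values_.end()) return false;
        it->second = value;
        return true;
    }

    // Current value; *ok (if given) tells whether the feature is supported.
    bool feature(const std::string& uri, bool* ok = nullptr) const {
        auto it = values_.find(uri);
        if (ok) *ok = (it != values_.end());
        return it != values_.end() && it->second;
    }

private:
    std::map<std::string, bool> values_;
};
```

The intended order of use is the one the text describes: check hasFeature() first, then call setFeature(), then read the value back with feature(). A feature that was never declared can be neither set nor read meaningfully.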

Consider the example

<document xmlns:book = 'http://trolltech.com/fnord/book/'
          xmlns      = 'http://trolltech.com/fnord/' >

A reader not supporting the http://xml.org/sax/features/namespace-prefixes feature would clearly report the element name document but not its attributes xmlns:book and xmlns with their values. A reader with the feature http://xml.org/sax/features/namespace-prefixes reports the namespace attributes if QXmlReader::feature() is TRUE and disregards them if the feature is FALSE.

Other features include http://xml.org/sax/features/namespaces (namespace processing, implies http://xml.org/sax/features/namespace-prefixes) and http://xml.org/sax/features/validation (the ability to report validation errors).

Whilst SAX2 leaves it to the user to define and implement whatever features are required, support for http://xml.org/sax/features/namespaces (and thus http://xml.org/sax/features/namespace-prefixes) is mandatory. Accordingly QXmlSimpleReader, the implementation of QXmlReader that comes with the Qt XML module, supports both of them, and therefore can do namespace processing.

Being a non-validating parser, QXmlSimpleReader does not support http://xml.org/sax/features/validation or other features.

Namespace support via features

As we have seen in the previous section we can configure the behavior of the reader when it comes to namespace processing. This is done by setting and unsetting the http://xml.org/sax/features/namespaces and http://xml.org/sax/features/namespace-prefixes features.

They influence the reporting behavior in the following way:

  1. Namespace prefixes and local parts of elements and attributes can be reported.
  2. The qualified names of elements and attributes are reported.
  3. QXmlContentHandler::startPrefixMapping() and QXmlContentHandler::endPrefixMapping() are called by the reader.
  4. Attributes that declare namespaces (i.e. the attribute xmlns and attributes starting with xmlns: ) are reported.

Consider the following element:

<author xmlns:fnord = 'http://trolltech.com/fnord/'
             title="Ms" 
             fnord:title="Goddess" 
             name="Eris Kallisti"/>

With http://xml.org/sax/features/namespace-prefixes set to TRUE the reader will report four attributes, with the namespace-prefixes feature set to FALSE only three: the xmlns:fnord attribute defining a namespace is then "invisible" to the reader.

The http://xml.org/sax/features/namespaces feature on the other hand is responsible for reporting local names, namespace prefixes and namespace URIs. With http://xml.org/sax/features/namespaces set to TRUE the parser will report title as the local name of the fnord:title attribute, fnord as the namespace prefix and http://trolltech.com/fnord/ as the namespace URI. When http://xml.org/sax/features/namespaces is FALSE none of them are reported.

In the current implementation the Qt XML classes follow the definition that the prefix xmlns itself isn't associated with any namespace at all (see http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using). Therefore, even with http://xml.org/sax/features/namespaces and http://xml.org/sax/features/namespace-prefixes both set to TRUE, the reader won't return a local name, a namespace prefix or a namespace URI for xmlns:fnord.

This might be changed in the future following the W3C suggestion http://www.w3.org/2000/xmlns/ to associate xmlns with the namespace http://www.w3.org/2000/xmlns.

As the SAX2 standard suggests, QXmlSimpleReader by default has http://xml.org/sax/features/namespaces set to TRUE and http://xml.org/sax/features/namespace-prefixes set to FALSE. When changing this behavior using QXmlSimpleReader::setFeature(), note that the combination of both features set to FALSE is illegal.
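The effect of the two features on attribute reporting can be modelled in plain standard C++. The sketch below is not the QXmlSimpleReader code: reportAttributes(), ReportedAttribute and RawAttributes are hypothetical names. It assumes that an unprefixed attribute gets its own name as local name and an empty URI, and it only looks at namespace declarations made on the element itself.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// What the model "reports" for one attribute.
struct ReportedAttribute {
    std::string qName;      // qualified name as written in the document
    std::string prefix;     // filled in only when the namespaces feature is on
    std::string localName;  // filled in only when the namespaces feature is on
    std::string uri;        // filled in only when the namespaces feature is on
};

using RawAttributes = std::vector<std::pair<std::string, std::string>>;

// True for the attribute xmlns and for attributes starting with xmlns:
inline bool declaresNamespace(const std::string& qName) {
    return qName == "xmlns" || qName.compare(0, 6, "xmlns:") == 0;
}

// Hypothetical model of attribute reporting under the two features.
inline std::vector<ReportedAttribute> reportAttributes(const RawAttributes& attrs,
                                                       bool namespaces,
                                                       bool namespacePrefixes) {
    if (!namespaces && !namespacePrefixes)
        throw std::invalid_argument("both features FALSE is an illegal combination");

    // First pass: collect prefix -> URI mappings declared on this element.
    std::map<std::string, std::string> prefixToUri;
    for (const auto& a : attrs)
        if (a.first.compare(0, 6, "xmlns:") == 0)
            prefixToUri[a.first.substr(6)] = a.second;

    std::vector<ReportedAttribute> out;
    for (const auto& a : attrs) {
        const bool isDecl = declaresNamespace(a.first);
        if (isDecl && !namespacePrefixes) continue;  // xmlns attributes stay hidden
        ReportedAttribute r;
        r.qName = a.first;
        if (namespaces && !isDecl) {  // xmlns itself belongs to no namespace
            const std::size_t colon = a.first.find(':');
            if (colon == std::string::npos) {
                r.localName = a.first;  // assumption: unprefixed means no namespace
            } else {
                r.prefix = a.first.substr(0, colon);
                r.localName = a.first.substr(colon + 1);
                auto it = prefixToUri.find(r.prefix);
                if (it != prefixToUri.end()) r.uri = it->second;
            }
        }
        out.push_back(r);
    }
    return out;
}
```

Fed the four attributes of the author element above, the model yields three entries with the default settings (namespaces TRUE, namespace-prefixes FALSE) and four when namespace-prefixes is TRUE. The xmlns:fnord entry never receives a prefix, local name or URI, and passing FALSE for both flags throws.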

For a practical demonstration of how the two features affect the output of the reader run the tagreader with features example.

Summary

QXmlSimpleReader implements the following behavior (a value in parentheses gives the SAX2 requirement where it differs from the Qt implementation):

namespaces  namespace-prefixes  Namespace prefix and local part  Qualified names  Prefix mapping  xmlns attributes
TRUE        FALSE               Yes                              Yes (Unknown)    Yes             No
TRUE        TRUE                Yes                              Yes              Yes             Yes
FALSE       TRUE                No (Unknown)                     Yes              No (Unknown)    Yes
FALSE       FALSE               (Illegal)

Properties

Properties are a more general concept. They also have a unique name, represented as a URI, but their value is void*. Thus nearly everything can be used as a property value. This concept involves some danger, though: there are no means to ensure type-safety; the user must take care to pass the correct type. Properties are useful if a reader supports special handler classes.

The URIs used for features and properties often look like URLs, e.g. http://xml.org/sax/features/namespaces. This does not mean that any data needs to be available at this address. It is simply a way to define unique names.

Anybody can define and use new SAX2 properties for their own readers. Property support is, however, not required.

To set or query properties the following functions are provided: QXmlReader::setProperty(), QXmlReader::property() and QXmlReader::hasProperty().
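The type-safety caveat of void* property values can be made concrete with a small model in plain standard C++. PropertyTable, declare() and LexicalHandler are hypothetical names, and the URIs under http://example.org/ are made up for illustration; none of this is the QXmlReader implementation.

```cpp
#include <map>
#include <string>

// Hypothetical model of a void*-valued property table; not the QXmlReader class.
class PropertyTable {
public:
    // Declare a property as supported, initially holding a null pointer.
    void declare(const std::string& uri) { values_[uri] = nullptr; }

    // True if the property is known to this reader at all.
    bool hasProperty(const std::string& uri) const { return values_.count(uri) != 0; }

    // Store an untyped pointer; returns false for unknown properties.
    bool setProperty(const std::string& uri, void* value) {
        auto it = values_.find(uri);
        if (it == values_.end()) return false;
        it->second = value;
        return true;
    }

    // Return the stored pointer; *ok (if given) tells whether the property exists.
    void* property(const std::string& uri, bool* ok = nullptr) const {
        auto it = values_.find(uri);
        if (ok) *ok = (it != values_.end());
        return it != values_.end() ? it->second : nullptr;
    }

private:
    std::map<std::string, void*> values_;
};

// Hypothetical handler type that a reader might accept through a property.
struct LexicalHandler {
    int commentsSeen = 0;
};
```

Whoever reads the property back has to convert it with something like `static_cast<LexicalHandler*>(table.property(uri))`. The compiler cannot check that cast: if a pointer of a different type was stored under the same URI, the code still compiles and the behaviour is undefined, which is exactly the danger described above.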

Further reading

For a practical example of how to use the Qt SAX2 classes see the tagreader walkthrough.

More information about XML (e.g. namespaces) can be found in the introduction to the Qt XML module.


Copyright © 2000 Trolltech
Qt version 2.3.1